Better parsing

Right now, each of the *Probe classes defines its own ad-hoc parser using a complicated regular expression. This results in a lot of duplicated code, not only in the parsers themselves but also in the disambiguation logic: every probe class carries its own copy of the code that disambiguates amino acid sequences, gene names, globbed exons, etc.

It would be a lot better if we had a centralized, EBNF-style parser, *Probe classes that expect to be fed a parse tree, and a centralized disambiguator. These would be used something like this:

class SomeKindOfProbe(AbstractProbe):
    def __init__(self, statement):
        # 'statement' is a data structure containing all the information in a
        # probe statement, including the comments
        self.statement = statement
        self.variant = self._make_variant()

    @classmethod
    def explode(cls, statement, annotation):
        # One statement may be ambiguous (e.g. a globbed exon), so yield one
        # probe per disambiguated statement rather than returning the first.
        parse_tree = Parser.parse(statement)
        for parsed_statement in Disambiguator.disambiguate(
                parse_tree, annotation):
            yield cls(parsed_statement)

    def _make_variant(self):
        # This method contains the logic which produces a Variant given the
        # information in the disambiguated statement (self.statement).
        raise NotImplementedError
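
To make the proposal a bit more concrete, here is a rough sketch of what the centralized Parser and Disambiguator could look like. Everything in it is an assumption for illustration: the toy grammar (a gene name, "#exon[N]" or "#exon[*]", and an optional "--" comment) is not the real probe statement syntax, the Statement tuple is a stand-in for whatever parse tree we settle on, and the annotation is simplified to a dict mapping gene names to exon counts.

```python
import re
from collections import namedtuple

# Hypothetical toy grammar (NOT the real probe statement syntax):
#   statement = gene "#" "exon" "[" ( number | "*" ) "]" [ "--" comment ]
Statement = namedtuple("Statement", ["gene", "exon", "comment"])

_TOKEN = re.compile(r"\s*(?:(?P<word>[A-Za-z0-9_]+)|(?P<sym>[#\[\]*]))")
_SYMBOLS = ("#", "[", "]", "*")


def _tokenize(body):
    """Split the non-comment part of a statement into words and symbols."""
    tokens, pos = [], 0
    body = body.strip()
    while pos < len(body):
        match = _TOKEN.match(body, pos)
        if match is None:
            raise ValueError("unexpected character %r at %d" % (body[pos], pos))
        tokens.append(match.group("word") or match.group("sym"))
        pos = match.end()
    return tokens


class Parser:
    @staticmethod
    def parse(text):
        """Turn one statement into a Statement tuple (the 'parse tree')."""
        body, _, comment = text.partition("--")
        tokens = _tokenize(body)
        well_formed = (
            len(tokens) == 6
            and tokens[0] not in _SYMBOLS
            and tokens[1:4] == ["#", "exon", "["]
            and (tokens[4] == "*" or tokens[4].isdigit())
            and tokens[5] == "]"
        )
        if not well_formed:
            raise ValueError("cannot parse statement: %r" % text)
        exon = tokens[4] if tokens[4] == "*" else int(tokens[4])
        return Statement(tokens[0], exon, comment.strip())


class Disambiguator:
    @staticmethod
    def disambiguate(parse_tree, annotation):
        """Yield one fully specified Statement per exon matched.

        'annotation' is assumed to map gene name -> number of exons.
        """
        if parse_tree.exon == "*":
            exons = range(1, annotation[parse_tree.gene] + 1)
        else:
            exons = [parse_tree.exon]
        for exon in exons:
            yield parse_tree._replace(exon=exon)


if __name__ == "__main__":
    tree = Parser.parse("ABC#exon[*] -- every exon of ABC")
    for statement in Disambiguator.disambiguate(tree, {"ABC": 3}):
        print(statement)
```

The point of the split is that a probe class never sees raw text or a glob: the parser owns the syntax, the disambiguator owns the expansion against the annotation, and each probe only has to turn one fully specified statement into a Variant.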
bcgsc / ProbeGenerator

Better parsing #1