Open ahammel opened 9 years ago
I've made an EBNF parser using the SimpleParse library. It's in the 'parser' branch.
SimpleParse doesn't support Python 3 directly, so before we can integrate it we'll need to update the build process to do automatic dependency resolution and 2to3 conversion.
Right now, each of the *Probe classes defines its own ad-hoc parser using a complicated regular expression. This results in a lot of duplicated code, not only in the parsers themselves but also in the logic every probe class carries for disambiguating amino acid sequences, gene names, globbed exons, etc.
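To make the duplication concrete, here is a rough sketch of the pattern described above. The class names, the statement syntax, and the regexes are all made up for illustration and are not the project's actual code; the point is only that each class ends up with its own regex and its own copy of the gene-name handling.

```python
import re

# Hypothetical probe classes: names, statement syntax, and regexes are
# illustrative assumptions, not taken from the real code base.


class ExonProbe:
    # Each class carries its own ad-hoc regex...
    _PATTERN = re.compile(
        r"\s*(?P<gene>[A-Za-z0-9_.-]+)\s*#\s*exon\[(?P<exon>\d+|\*)\]\s*$"
    )

    @classmethod
    def parse(cls, statement):
        match = cls._PATTERN.match(statement)
        if match is None:
            raise ValueError("could not parse exon probe: %r" % statement)
        return {
            "gene": cls._normalize_gene(match.group("gene")),
            "exon": match.group("exon"),  # "*" stands for a globbed exon
        }

    @staticmethod
    def _normalize_gene(name):
        # ...and its own copy of the gene-name handling.
        return name.strip().upper()


class SnpProbe:
    # A second, slightly different regex that repeats the gene-name part.
    _PATTERN = re.compile(
        r"\s*(?P<gene>[A-Za-z0-9_.-]+)\s*:\s*c\.(?P<pos>\d+)"
        r"(?P<ref>[ACGT])>(?P<alt>[ACGT])\s*$"
    )

    @classmethod
    def parse(cls, statement):
        match = cls._PATTERN.match(statement)
        if match is None:
            raise ValueError("could not parse SNP probe: %r" % statement)
        return {
            "gene": cls._normalize_gene(match.group("gene")),
            "pos": int(match.group("pos")),
            "ref": match.group("ref"),
            "alt": match.group("alt"),
        }

    @staticmethod
    def _normalize_gene(name):
        # Duplicated verbatim from ExonProbe.
        return name.strip().upper()
```

Any change to how gene names are recognized or normalized has to be made once per class in a layout like this, which is the maintenance cost at issue.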
It would be a lot better if we had a centralized, EBNF-style parser, *Probe classes that expect to be fed a parse tree, and a centralized disambiguator. These would be used something like this: