This PR updates the alignment to match on unique IDs (specifically, Entrezgene) instead of fuzzy string matching. This dramatically improves the quality of the alignments.
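For reviewers, here is a minimal sketch of the idea behind ID-based alignment. It is not the code in this PR: the `Annotation` class, its field names, and `align_by_id` are hypothetical names used only for illustration. The assumption is that each annotation carries an Entrezgene ID and that a document is dropped when any expected ID has no matching annotation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record; field names are illustrative only."""
    text: str
    start: int
    end: int
    entrez_id: str


def align_by_id(
    target_ids: Iterable[str], annotations: Iterable[Annotation]
) -> Optional[Dict[str, List[Annotation]]]:
    """Align each target ID to the annotations that share that exact ID.

    Returns None if any target ID has no annotation, i.e. a missed alignment.
    """
    # Group annotations by ID so each lookup is an exact match, not a string comparison.
    by_id: Dict[str, List[Annotation]] = {}
    for ann in annotations:
        by_id.setdefault(ann.entrez_id, []).append(ann)

    aligned: Dict[str, List[Annotation]] = {}
    for target in target_ids:
        if target not in by_id:
            return None  # missed alignment; the caller would drop this document
        aligned[target] = by_id[target]
    return aligned


# Illustrative input: two surface forms of the same gene share one ID.
annotations = [
    Annotation("p53", 0, 3, "7157"),
    Annotation("TP53", 20, 24, "7157"),
    Annotation("MDM2", 40, 44, "4193"),
]
aligned = align_by_id(["7157", "4193"], annotations)
```

Because the match is on the ID rather than the surface string, different mentions of the same gene ("p53", "TP53") align to the same target without any similarity threshold to tune.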
A lot of functionality was removed, most notably extending annotations with scispacy. I will add the scispacy extension back in a subsequent PR, because it reduces the number of PMIDs we have to drop due to a missed alignment.