JohnGiorgi / seq2rel-ds

This is a companion repository to seq2rel (https://github.com/JohnGiorgi/seq2rel) that aims to make it easy to generate training data.

Switch to entity linking #5

Closed JohnGiorgi closed 3 years ago

JohnGiorgi commented 3 years ago

This PR updates the alignment to rely on matching unique IDs (specifically, Entrezgene) rather than fuzzy string matching. This dramatically improves the quality of the alignments.
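
To illustrate the idea, here is a minimal sketch of ID-based alignment. It is not the code in this PR: the function name `align_by_id`, the data layout, and the IDs are all hypothetical placeholders. It assumes each related pair is given as two unique IDs, and that the text annotations map each ID to the mentions found in the document.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical types: an aligned pair holds the mentions for each of the two IDs.
AlignedPair = Tuple[List[str], List[str]]


def align_by_id(
    relations: List[Tuple[str, str]],
    annotations: Dict[str, List[str]],
) -> Optional[List[AlignedPair]]:
    """Align each pair of IDs in `relations` to the mentions annotated with those IDs.

    Returns None if any ID has no annotated mention (a missed alignment),
    signalling that the document should be dropped.
    """
    aligned: List[AlignedPair] = []
    for id_a, id_b in relations:
        mentions_a = annotations.get(id_a)
        mentions_b = annotations.get(id_b)
        # An exact ID lookup either succeeds or fails; there is no similarity threshold to tune.
        if not mentions_a or not mentions_b:
            return None
        aligned.append((mentions_a, mentions_b))
    return aligned


# Made-up placeholder IDs and mentions, for illustration only.
example_annotations = {"101": ["geneA"], "202": ["geneB", "GENE-B"]}
example_aligned = align_by_id([("101", "202")], example_annotations)
example_missed = align_by_id([("101", "303")], example_annotations)
```

In this sketch, a pair whose IDs are both annotated aligns exactly, while a pair with an unannotated ID causes the whole document to be dropped. Unlike fuzzy string matching, a surface-form variant such as "GENE-B" cannot be aligned to the wrong entity, because only the ID is compared.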

A lot of functionality was removed, most notably extending annotations with scispacy. I will add the scispacy functionality back in a subsequent PR, because it reduces the number of PMIDs we have to drop due to a missed alignment.