BaderLab / saber

Saber is a deep-learning based tool for information extraction in the biomedical domain. Pull requests are welcome! Note: this is a work in progress. Many things are broken, and the codebase is not stable.
https://baderlab.github.io/saber/
MIT License
102 stars 17 forks

Switch token alignment to SpaCy #152

Open JohnGiorgi opened 5 years ago

JohnGiorgi commented 5 years ago

Currently, to align BERT tokens with the original tokens (before BERT tokenization), we use some code I grabbed from the official BERT repo.
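
For context, the approach described in the BERT repo README amounts to recording, for each original token, the index of its first wordpiece. Below is a minimal sketch of that idea, not the code currently in Saber. `toy_wordpiece` is a hypothetical stand-in for the real BERT tokenizer (it only chops tokens into four-character pieces), and `align_to_wordpieces` is an illustrative name.

```python
def toy_wordpiece(token, size=4):
    """Hypothetical stand-in for a BERT tokenizer: split a token into fixed-size pieces."""
    pieces = [token[i:i + size] for i in range(0, len(token), size)]
    # Mark continuation pieces with "##", mimicking the WordPiece convention.
    return [pieces[0]] + ["##" + piece for piece in pieces[1:]]


def align_to_wordpieces(orig_tokens, tokenize=toy_wordpiece):
    """Return the wordpiece sequence and, per original token, the index of its first piece."""
    bert_tokens = ["[CLS]"]
    orig_to_tok_map = []
    for token in orig_tokens:
        # The next piece appended is the first piece of this original token.
        orig_to_tok_map.append(len(bert_tokens))
        bert_tokens.extend(tokenize(token))
    bert_tokens.append("[SEP]")
    return bert_tokens, orig_to_tok_map


bert_tokens, orig_to_tok_map = align_to_wordpieces(["John", "Johanson", "'s", "house"])
print(bert_tokens)
print(orig_to_tok_map)
```

Labels for the original tokens can then be projected onto the wordpiece sequence by indexing with `orig_to_tok_map`.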

spaCy has introduced functions specifically for aligning two texts tokenized with different tokenizers. Switch to this!
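
To illustrate what such an alignment computes, here is a pure-Python sketch based on character offsets. It is not spaCy's implementation or API; as far as I know the spaCy helper is `spacy.gold.align` in the 2.x series (and `spacy.training.Alignment` in 3.x), so check the spaCy docs for the exact signature. The function name `align_tokens` is hypothetical, and the sketch assumes both tokenizations concatenate to the same string.

```python
def align_tokens(tokens_a, tokens_b):
    """Map each token in tokens_a to the indices of the tokens in tokens_b it overlaps.

    Assumption: both token lists cover the same characters in the same order.
    """
    if "".join(tokens_a) != "".join(tokens_b):
        raise ValueError("tokenizations do not cover the same text")

    def char_owners(tokens):
        # For every character position, record which token it belongs to.
        owners = []
        for index, token in enumerate(tokens):
            owners.extend([index] * len(token))
        return owners

    a2b = [[] for _ in tokens_a]
    for i, j in zip(char_owners(tokens_a), char_owners(tokens_b)):
        # Add j once per token in tokens_a; indices arrive in increasing order.
        if not a2b[i] or a2b[i][-1] != j:
            a2b[i].append(j)
    return a2b


original = ["John", "Johanson", "'s", "house"]
retokenized = ["John", "Joha", "nson", "'", "s", "hou", "se"]
a2b = align_tokens(original, retokenized)
print(a2b)
```

With real BERT output, the `##` continuation prefix and any special tokens such as `[CLS]` and `[SEP]` would have to be stripped before comparing characters, which is part of what a library-provided alignment function would save us from maintaining.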