erre-quadro / spikex

SpikeX - SpaCy Pipes for Knowledge Extraction
Apache License 2.0

abbreviation difference from scispacy #2

Closed dakinggg closed 3 years ago

dakinggg commented 3 years ago

Hi! scispacy developer here. Could you share what changes you made to our abbreviation detector? I am curious what issues you encountered/fixed (obviously not bothered at all that you based yours off of ours).

paoloq commented 3 years ago

Hi! Thanks for asking. I really love your work on scispacy; it's impressive.

Actually, I didn't fix any issues in your abbreviation detector, at least not on purpose. Erre Quadro works on patents, which are very technical and heterogeneous documents, and I needed our own abbreviation detector so that I could fix special cases as soon as we encountered them.

As main differences, I added:

There may be others, but they would be minor.

In future updates, I would like to better handle:

Of course I welcome any contribution or feedback, and thanks again for being inspiring.

dakinggg commented 3 years ago

Thanks for the info, and glad you like scispacy! If you do make changes that you think would be general improvements, we'd love to have a look at a PR and incorporate them! (I'll close this since it isn't really an issue per se.)