allenai / scispacy

A full spaCy pipeline and models for scientific/biomedical documents.
https://allenai.github.io/scispacy/
Apache License 2.0

Entity linker sets redundant extensions on Span objects #488

Closed JohnGiorgi closed 1 year ago

JohnGiorgi commented 1 year ago

Hi!

I noticed that the current entity linking code sets two extensions on span objects:

https://github.com/allenai/scispacy/blob/a5276f1829cf716362b1f151f2b513b9a00bc01f/scispacy/linking.py#L83-L85

but they appear to store exactly the same KB IDs:

https://github.com/allenai/scispacy/blob/a5276f1829cf716362b1f151f2b513b9a00bc01f/scispacy/linking.py#L134-L135

Is this a workaround for something, or can one of them be removed? I need to serialize a lot of docs to disk, so I'm looking for anything I can drop to make them smaller. Happy to open a PR if one of these extensions on the Span can in fact be dropped.

dakinggg commented 1 year ago

This is not needed for anything. It is only there for backwards compatibility, since we changed the name at some point. It should be OK to remove now, I think.
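For anyone who needs smaller serialized docs before the duplicate is removed upstream, here is a minimal sketch of one possible workaround. It relies on spaCy storing custom extension values in `doc.user_data` under keys of the form `("._.", name, start_char, end_char)`. The extension names `kb_ents` and `umls_ents` are assumptions (the thread only links to the lines that define them), as is the guess that `umls_ents` is the legacy name. The helper `drop_extension` and the toy data are hypothetical. The helper works on a plain dict, so the sketch runs without spaCy installed.

```python
from typing import Any, Dict, Hashable


def drop_extension(user_data: Dict[Hashable, Any], name: str) -> Dict[Hashable, Any]:
    """Return a copy of a spaCy-style user_data dict without one custom extension.

    Hypothetical helper, not part of spaCy or scispacy.
    """

    def is_target(key: Hashable) -> bool:
        # Custom extension entries are 4-tuples: ("._.", name, start, end).
        return (
            isinstance(key, tuple)
            and len(key) == 4
            and key[0] == "._."
            and key[1] == name
        )

    return {key: value for key, value in user_data.items() if not is_target(key)}


# Toy stand-in for doc.user_data after entity linking.
# The extension names and the (KB ID, score) pair are assumed, not taken from the thread.
linked = [("C0000001", 0.97)]
user_data = {
    ("._.", "kb_ents", 0, 7): linked,
    ("._.", "umls_ents", 0, 7): linked,  # duplicate of the entry above
    "title": "example",  # unrelated user data is left untouched
}

slimmed = drop_extension(user_data, "umls_ents")
```

With a real `Doc`, the same idea would be to assign the filtered dict back to `doc.user_data` before calling `doc.to_bytes()`. Treat that as untested against scispacy, and check which of the two names your downstream code actually reads before dropping one.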

JohnGiorgi commented 1 year ago

Cool, PR'd it!