allenai / scispacy

A full spaCy pipeline and models for scientific/biomedical documents.
https://allenai.github.io/scispacy/
Apache License 2.0

Conjunctions in noun phrases #451

Closed (chrishmorris closed this issue 1 year ago)

chrishmorris commented 1 year ago

The sentence "I buy hamster and gerbil food" is parsed as shown below. Actually, food is the correct direct object. The conj link should instead connect gerbil to its correct parent, hamster, and hamster should have an amod link to food.

[Image: scispacy dependency parse of the example sentence]

Assigning conjunctions correctly is hard. But it may be useful to upscore the options in which the two coordinated terms are of similar weight. It seems possible to recognise that "hamster and gerbil" is more likely than "hamster and [gerbil food]".
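The balancing idea above can be sketched as a small rescoring heuristic that runs outside the parser. This is a minimal sketch, not part of scispaCy or spaCy: the function names are hypothetical, it assumes coarse POS tags (`NOUN`, `PROPN`, `CCONJ`, ...) are already available for each token, it only considers contiguous runs of nouns on either side of the coordinator, and it approximates a conjunct's "weight" by its token count. Preferring the longer pair when two options are equally balanced is also an assumption.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end), end exclusive
NOMINAL = {"NOUN", "PROPN"}


def candidate_conjuncts(pos: List[str], cc: int) -> Tuple[List[Span], List[Span]]:
    """Candidate left/right conjunct spans around the coordinator at index cc,
    restricted to contiguous runs of nominal tokens (hypothetical helper)."""
    left: List[Span] = []
    start = cc - 1
    while start >= 0 and pos[start] in NOMINAL:
        left.append((start, cc))  # grow the left conjunct leftwards
        start -= 1
    right: List[Span] = []
    end = cc + 2
    while end <= len(pos) and pos[end - 1] in NOMINAL:
        right.append((cc + 1, end))  # grow the right conjunct rightwards
        end += 1
    return left, right


def span_len(span: Span) -> int:
    return span[1] - span[0]


def balance_score(left: Span, right: Span) -> int:
    # Assumed "weight" = token count; 0 means both conjuncts weigh the same.
    return -abs(span_len(left) - span_len(right))


def best_coordination(
    words: List[str], pos: List[str], cc: int
) -> Optional[Tuple[List[str], List[str]]]:
    """Pick the most balanced pair of conjuncts; ties go to the longer pair."""
    left, right = candidate_conjuncts(pos, cc)
    if not left or not right:
        return None
    l, r = max(
        ((l, r) for l in left for r in right),
        key=lambda pair: (balance_score(*pair), span_len(pair[0]) + span_len(pair[1])),
    )
    return words[l[0]:l[1]], words[r[0]:r[1]]


words = ["I", "buy", "hamster", "and", "gerbil", "food"]
pos = ["PRON", "VERB", "NOUN", "CCONJ", "NOUN", "NOUN"]
print(best_coordination(words, pos, cc=3))
```

For the sentence in this issue the heuristic prefers "hamster" and "gerbil" over "hamster" and "gerbil food", leaving "food" as the shared head. For a phrase such as "cell lines and tissue samples" the tie-break keeps both two-word conjuncts. A score like this could only rerank candidate attachments; it does not change how the parser itself predicts them.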

Scispacy is amazing and gets the correct parse with impressive frequency. This case is one of the more common errors I encounter; such constructions are common in the corpus.

Environment

spaCy version: 3.0.7
Platform: Linux-5.15.0-48-generic-x86_64-with-glibc2.31
Python version: 3.9.5
Pipelines: en_core_sci_sm (0.4.0)

dakinggg commented 1 year ago

Thanks for the suggestion @chrishmorris! While your idea is reasonable, incorporating that human intuition into the dependency parsing model is quite difficult. See https://spacy.io/api/dependencyparser for more details on the dependency parsing model. A simpler way to incorporate this idea would be to add lots of examples of the form you describe to the training corpus. I will likely not be doing this for scispacy, but if you were to create your own corpus, I'd be happy to help you figure out how to use it in our training scripts to train your own model! And feel free to open another issue if you end up going down that route and would like some help.
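Following the suggestion to add such examples to a training corpus, the corrected parse requested in this issue can be written down as gold-standard heads and dependency labels. This is a minimal sketch under several assumptions: it uses spaCy's convention that the root token is its own head, it attaches `cc` to the first conjunct as spaCy's English models do, and the helper names `validate_tree` and `make_annotation` are hypothetical. The `amod` label is taken from the issue text; the model's label scheme may use a different label for a noun modifying a noun, and the available labels can be listed with `nlp.get_pipe("parser").labels`.

```python
from typing import Dict, List


def validate_tree(heads: List[int]) -> None:
    """Raise ValueError unless heads form a single-rooted tree
    (spaCy convention: the root token's head is its own index)."""
    n = len(heads)
    for i, h in enumerate(heads):
        if not 0 <= h < n:
            raise ValueError(f"head of token {i} is out of range")
    roots = [i for i, h in enumerate(heads) if h == i]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    for i in range(n):
        seen = set()
        node = i
        while heads[node] != node:  # walk up towards the root
            if node in seen:
                raise ValueError(f"cycle reached from token {i}")
            seen.add(node)
            node = heads[node]


def make_annotation(words: List[str], heads: List[int], deps: List[str]) -> Dict[str, List]:
    """Bundle a checked parse into a dict with words/heads/deps keys."""
    if not (len(words) == len(heads) == len(deps)):
        raise ValueError("words, heads and deps must have the same length")
    validate_tree(heads)
    return {"words": list(words), "heads": list(heads), "deps": list(deps)}


# Parse requested in the issue: "food" is the direct object of "buy",
# "hamster" modifies "food", and "gerbil" is coordinated with "hamster".
WORDS = ["I", "buy", "hamster", "and", "gerbil", "food"]
HEADS = [1, 1, 5, 2, 2, 1]  # absolute token indices
DEPS = ["nsubj", "ROOT", "amod", "cc", "conj", "dobj"]

annotation = make_annotation(WORDS, HEADS, DEPS)
print(annotation)
```

The resulting dict uses the `words`, `heads` and `deps` keys accepted by `spacy.training.Example.from_dict`, so it could be paired with a `Doc` from `nlp.make_doc(...)` to build a training example. Checking the tree first catches typos in hand-written head indices before they reach training.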