dpalmasan / TRUNAJOD2.0

An easy-to-use library to extract indices from texts.
https://trunajod20.readthedocs.io/en/latest/
MIT License

Migrate models built in Spacy to use Stanford models #4

Open dpalmasan opened 4 years ago

dpalmasan commented 4 years ago

With the new release of stanza:

https://stanfordnlp.github.io/stanza/

Maybe it is a good opportunity to improve accuracy. This issue is about investigating whether migrating could improve our accuracy, and about estimating the cost of the migration.

brucewlee commented 3 years ago

Thank you for open-sourcing this repo! It's helping a lot with my research.

Regarding the Stanza migration, I could help, unless you have a tight deadline. However, I doubt the accuracy would improve by much. spaCy had a major improvement quite recently: https://spacy.io/usage/v3. But, of course, Stanza would look much better for research papers.

dpalmasan commented 3 years ago

Hello Bruce! Sure, I don't have a tight deadline, so your contribution is more than welcome! There are some differences between the stanza pre-trained models and the spacy ones, so I am not sure about migrating completely, but having the option of using stanza models instead of spacy might improve performance in some cases!

brucewlee commented 3 years ago

Oh, so do you mean adding an option to use Stanza? Hmm, I'm familiar with both Stanza and spaCy, but the biggest trouble for me would be dealing with Spanish texts. I only know Spanish at a very introductory level.

Anyways, I looked through Entity Grid and TTR features, which both seem to require minimal Spanish skills. I'll first create a pull request (in a few days) for these files. I'll try to add options to use Stanza rather than fully migrate to Stanza. One could then choose which to use.
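A rough sketch of what such an option could look like (this is not the library's actual API). The `Token` type, `make_analyzer`, and `type_token_ratio` are hypothetical names made up for illustration, and the model names assume the Spanish models have already been downloaded. The idea is to flatten both spaCy and Stanza output into one common token shape, so a feature like TTR does not need to know which backend produced the tokens:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Token:
    """Backend-independent view of a token (hypothetical helper type)."""
    text: str
    lemma: str
    pos: str  # Universal POS tag, which both libraries expose


def tokens_from_spacy(doc) -> List[Token]:
    """Flatten a spaCy Doc into backend-independent tokens."""
    return [Token(t.text, t.lemma_, t.pos_) for t in doc]


def tokens_from_stanza(doc) -> List[Token]:
    """Flatten a Stanza Document into backend-independent tokens."""
    return [
        # Fall back to the surface form if no lemma was assigned.
        Token(w.text, w.lemma or w.text, w.upos)
        for sentence in doc.sentences
        for w in sentence.words
    ]


def make_analyzer(backend: str = "spacy") -> Callable[[str], List[Token]]:
    """Return a text -> tokens function for the chosen backend.

    Imports are deferred so only the selected library has to be installed.
    The model names below are assumptions, not project defaults.
    """
    if backend == "spacy":
        import spacy
        nlp = spacy.load("es_core_news_sm")
        return lambda text: tokens_from_spacy(nlp(text))
    if backend == "stanza":
        import stanza
        nlp = stanza.Pipeline(lang="es", processors="tokenize,mwt,pos,lemma")
        return lambda text: tokens_from_stanza(nlp(text))
    raise ValueError(f"Unknown backend: {backend!r}")


def type_token_ratio(tokens: List[Token]) -> float:
    """Distinct lowercase lemmas over token count, ignoring punctuation."""
    lemmas = [t.lemma.lower() for t in tokens if t.pos != "PUNCT"]
    return len(set(lemmas)) / len(lemmas) if lemmas else 0.0
```

With something like this, `type_token_ratio(make_analyzer("stanza")(text))` and `type_token_ratio(make_analyzer("spacy")(text))` would go through the same feature code, so the two backends could be compared side by side before deciding on a full migration.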

dpalmasan commented 3 years ago

I mean, initially I wanted to completely replace spacy, but as you mentioned, spacy has improved over time, so removing all the spacy references may not be as good as having options for both stanza and spacy. No worries regarding Spanish-related features. I can update them. BTW thanks for your desire to contribute!

brucewlee commented 3 years ago

No worries. I'm also working on a similar project so it'll help me too anyways :)