CambridgeMolecularEngineering / chemdataextractor2

ChemDataExtractor Version 2.0

Stanza models cannot download #52

Closed by OscartGiles 4 months ago

OscartGiles commented 4 months ago

chemdataextractor2 cannot download this Stanza model:

https://github.com/CambridgeMolecularEngineering/chemdataextractor2/blob/c466967d34f5a4dcaf8c6b5a36ed1a117a399c7a/chemdataextractor/nlp/dependency.py#L31

This appears to be an issue with the pinned version of Stanza, which is now out of date and has stopped working: connections to nlp.stanford.edu time out. It looks like they have had similar issues before.
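
A minimal sketch for checking whether the failure is on the server side, assuming that opening a plain TCP connection on port 443 is a fair proxy for "the model server is reachable". `can_connect` is a hypothetical helper written for illustration; it is not part of chemdataextractor2 or Stanza.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper: it only tests reachability, not whether the
    model files themselves can be downloaded.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS lookup failures.
        return False


# Example call (requires network access):
# can_connect("nlp.stanford.edu")
```

If this returns `False` while other hosts are reachable, the problem is likely the remote server rather than the local installation.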

However, I haven't raised an issue with Stanza, as newer versions of Stanza pull the model from Hugging Face, which works fine:

https://github.com/CambridgeMolecularEngineering/chemdataextractor2/blob/c466967d34f5a4dcaf8c6b5a36ed1a117a399c7a/requirements/production.txt#L26

OscartGiles commented 4 months ago

I will try to update the dependency and submit a PR if the tests pass.

OscartGiles commented 4 months ago

Unfortunately, it is not possible to update to the latest stanza version, because it only supports Python 3.8 and above and so drops Python 3.6. I have a fork and I'll run the tests with just 3.8.

OscartGiles commented 4 months ago

Some more context on stanza: https://github.com/stanfordnlp/stanza/issues/426#issuecomment-1025192535

OscartGiles commented 4 months ago

It looks like stanza==1.6.1 may support Python 3.6 and be able to download the models.
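
A rough sketch of how that split could be expressed, assuming (as suggested in this thread, not verified here) that stanza 1.6.1 installs on Python 3.6 and that the latest release needs Python 3.8 or above. `stanza_requirement` is a hypothetical helper, not something in chemdataextractor2.

```python
import sys


def stanza_requirement(version_info=sys.version_info) -> str:
    """Pick a stanza requirement string for the given Python version.

    Assumption from this thread: the latest stanza needs Python >= 3.8,
    while stanza==1.6.1 may still work on Python 3.6.
    """
    if tuple(version_info[:2]) >= (3, 8):
        # Unpinned: let the installer resolve the latest release.
        return "stanza"
    # Older interpreters fall back to the pin discussed above.
    return "stanza==1.6.1"
```

For example, `stanza_requirement((3, 6, 0))` selects the pinned release, and the result could be passed to `install_requires` in a `setup.py`.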

OscartGiles commented 4 months ago

Their server is back online, so https://github.com/CambridgeMolecularEngineering/chemdataextractor2/pull/53 is not required to fix the package. But it might still be worth considering the upgrade in case their server goes down again.

Dingyun-Huang commented 4 months ago

Hi Oscart, we are aware of this. Yes, nlp.stanford.edu was down yesterday and has come back online. Upgrading from Python 3.6/3.8 to a higher version of Python is a massive task, because the NER system was built with AllenNLP, which has reached its EOL. We are planning it.

OscartGiles commented 4 months ago

Ah yes, I was taking a look because the lack of wheels for Python 3.8 makes installation quite slow. I noticed the next upgrade bumps a major version, so I didn't go any further.

I did submit https://github.com/CambridgeMolecularEngineering/chemdataextractor2/pull/53, which would still support Python 3.6 but moves to a later version of Stanza without a major version increase.

But fair enough to ignore it if you are planning a bigger upgrade. Good luck!