Closed david-waterworth closed 3 months ago
Hi @david-waterworth, generally agree that we have too many dependencies.
@helpmefindaname did an analysis of all dependencies and identified a core set of dependencies plus optional ones, also wanting to get rid of gensim. He proposed a solution essentially exactly like you, e.g. using something like pip install flair[gensim]
to install bigger libraries.
My main problem is that gensim is used in some of our main taggers (for instance 'ner') and then this starter example would not work anymore: https://flairnlp.github.io/docs/intro#requirements-and-installation (installation plus example 1).
The example appears to work for me without using gensim features? My assumption is gensim is used by the WordEmbeddings
class in flair/embeddings/token.py
(specifically gensim.models.KeyedVectors
to load word2vec serialized files). My testing was a bit ad-hoc though so I could very well have missed something.
But tagger = Classifier.load('ner')
appears to load a serialised pytorch state dict so (based on stepping through the code and putting breakpoints on lines that appear to use gensim) it seemed two work.
PS as an aside a transformer encoder fined tuned on my corpus using FLERT works extremely well!
I had the same error with gradio, I downgraded the versionn of gradio but still it was using urllib >= 2.0
.
The conflict is caused by:
gradio 4.37.2 depends on urllib3~=2.0
flair 0.13.1 depends on urllib3<2.0.0 and >=1.0.0
At the end my solution was this:
pip install --use-deprecated=legacy-resolver -r requirements.txt
But it's deprecated and soon pip can remove this feature without notifying it and I'm not sure how can I install both of them together.
@helpmefindaname I don't think those issues are really problems that should be fixed this way - my understanding of the issue is for packages that don't include a manifest as a separate file, poetry has to download and extract the wheel, and it does it once only and for many packages (unless you clear the cache), not just urllib. It's annoying the first time but users could always pin urllib3 in their project. From memory the main issue is boto3 doesn't support the manifest format and there's tons of different wheels which initially triggered poetry to download every boto3 wheel to check dependencies when urllib3 > 2.0.0 was released.
Problem statement
flair requires
urllib3<2.0.0
which is quite old (latest is 2.2, the 2.0 alpha was released in late 2022)https://github.com/flairNLP/flair/blob/59bd7053a1a73da03293b0bdb68113cf383d6b9e/requirements.txt#L26C1-L26C75
I have other dependencies that require urllib3 >= 2.0 (i.e.
tritonclient[http] (>=2.47.0,<3.0.0) requires urllib3 (>=2.0.7)
)Solution
I'm not sure that flair directly uses
urllib3
, I suspect the require was added as per the comment due to 3rd party libraries not requiring urllib3 < 2 so resulting in a broken installation when urllib3 v2 was released. I would hope this is no longer a constraint and it can be removed?Additional Context
In general I think flair imports to much by default, i.e. I have to downgrade
scipy
because even though I'm not using anygensim
based features the fact that gensim uses deprecated scipy features that are now removed (triu) I get import errors (gensim has fixed this but not released the fix yet). Also note a recent "fix" added by gensim pinsnumpy < 2
which flair will inherit which could cause issues down the line.I think most 3rd party (gensim, spacy, huggingface) functionality should be "optional" - i.e.
pip install flair[gensim]
to enable gensim features. I'm not a big fan of getting errors due to flair importing gensim when I'm training a tagger with transformer embeddings.