dianna-ai / dianna

Deep Insight And Neural Network Analysis
https://dianna.readthedocs.io
Apache License 2.0

torchtext end-of-life and broken #829

Closed. egpbos closed this 1 month ago

egpbos commented 4 months ago

As noted in #827 and other recent PRs with breaking CI workflows, the torchtext package seems to be breaking down. It is no longer being developed (see https://github.com/pytorch/text/issues/2250).

Some options:

  1. Find a workaround ourselves.
  2. Look for a fork that is still maintained and switch to that.
  3. Replace torchtext as a dependency.

Option 3 seems the most attractive to me, naively, but I haven't looked deeply into how unique the functionality is that we use. We only use torchtext in two ways:

Can these easily be replaced? If not, a fourth option presents itself:

  4. Cannibalize torchtext for these parts only.

loostrum commented 4 months ago

Yikes, I guess we should have followed up on the deprecation warning earlier ;) I'll see if we can replace torchtext. In the linked PyTorch issue they also suggest taking what you need from the torchtext source, so option 4 may not be the worst idea; we can just put those parts in a DIANNA text utils file.
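
A rough sketch of what such a utils file could look like, assuming a simple tokenizer is one of the parts we need. The file name and the `tokenize` function are hypothetical, and this is a simplified stand-in, not torchtext's actual implementation:

```python
# Hypothetical dianna/utils/text.py; names are illustrative, not existing DIANNA code.
import re

# Matches common punctuation so it can be split off into separate tokens.
_PUNCTUATION = re.compile(r"([.,!?()])")


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    # Surround each punctuation mark with spaces, then split on whitespace.
    spaced = _PUNCTUATION.sub(r" \1 ", text.lower())
    return spaced.split()
```

For example, `tokenize("Such a bad movie.")` returns `['such', 'a', 'bad', 'movie', '.']`. Whether this is close enough to the tokenization our models were trained with would still need to be checked.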

cwmeijer commented 3 months ago

spaCy, already a dependency of DIANNA, may provide both functionalities. Looking at their code, it seems so: I came across a vocab.Vectors and a get_tokenizer. That would be a nice solution, I think (that's an option 3 for ya 😉).

SarahAlidoost commented 3 months ago

As a temporary fix to get CI green again, the torch version is pinned in #841. Once this issue is resolved, the pin can be removed.