Closed by egpbos 1 month ago
Yikes, I guess we should have followed up on the deprecation warning earlier ;) I'll see if we can replace torchtext. In the linked PyTorch issue they also suggest copying what you need from the torchtext source, so option 4 may not be the worst idea: we can just put those pieces in a dianna text utils file.
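To make the "text utils file" idea concrete, here is a minimal sketch of what such a module might contain, using only the standard library. The names `basic_english_tokenize` and `load_vectors` are hypothetical, the normalisation rules are only loosely modelled on torchtext's basic English tokenizer (not a verbatim copy), and the loader assumes a plain-text vector file with one `token v1 v2 ...` entry per line and no header. Whether this covers what DIANNA actually needs is untested.

```python
import io
import re

# Normalisation rules loosely modelled on torchtext's basic English tokenizer.
# The exact rule set is an assumption made for this sketch.
_PATTERNS = [
    (re.compile(r"'"), " ' "),
    (re.compile(r'"'), ""),
    (re.compile(r"<br />"), " "),
    (re.compile(r"([.,()!?])"), r" \1 "),
    (re.compile(r"[;:]"), " "),
    (re.compile(r"\s+"), " "),
]


def basic_english_tokenize(text):
    """Lowercase the text, pad punctuation with spaces, split on whitespace."""
    text = text.lower()
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text.split()


def load_vectors(handle):
    """Read 'token v1 v2 ...' lines from an open text file into a dict.

    Assumes a header-less, space-separated format; raises ValueError if
    the vectors do not all have the same dimension.
    """
    vectors = {}
    dim = None
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        token, values = parts[0], [float(v) for v in parts[1:]]
        if dim is None:
            dim = len(values)
        elif len(values) != dim:
            raise ValueError(f"inconsistent vector dimension for {token!r}")
        vectors[token] = values
    return vectors


tokens = basic_english_tokenize("A great movie, isn't it?")
embeddings = load_vectors(io.StringIO("good 0.1 0.2\nbad -0.3 0.4\n"))
```

Keeping both helpers as plain functions means the call sites in the tokenizer module and the test utilities would only need their imports changed, assuming they rely on nothing beyond tokenizing strings and looking up vectors by token.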
spaCy, already a dependency of DIANNA, may provide both functionalities. Looking at their code, it seems so: I came across a vocab.Vectors and a get_tokenizer. That would be a nice solution I think (that's an option 3 for ya 😉).
As a temporary fix to get CI green again, the torch version is pinned in #841. Once this issue is resolved, the temporary fix can be removed.
As noted in #827 and other recent PRs with breaking CI workflows, the torchtext package seems to be breaking down. It is no longer being developed (see https://github.com/pytorch/text/issues/2250).
Some options:
Option 3 seems the most attractive to me, naively, but I haven't looked deeply into how unique the functionality is that we use. We only use torchtext in two ways:
- from torchtext.data import get_tokenizer in utils/tokenizer.py
- from torchtext.vocab import Vectors in test/utils.py, in a couple of notebooks and in the dashboard.

Can these easily be replaced? If not, a fourth option presents itself: