explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

TextCatCNN.v2 doesn't work with transformers #11968

Closed polm closed 1 year ago

polm commented 1 year ago

How to reproduce the behaviour

As brought up in #11925, if you use TextCatCNN.v2 with transformers you get an error like this:

ValueError: Cannot get dimension 'nI' for model 'linear': value unset

The issue is that when the textcat is initialized, the linear layer in the model is resized once for each label added. On a resize, the linear layer is detected as already initialized and is therefore re-initialized. However, it isn't actually initialized at that point: because of how transformers initialization works, it is still missing dimension information (the 'nI' in the error above), so the re-initialization fails.
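To make the failure mode concrete, here is a toy sketch in plain Python. It is not spaCy or Thinc code: `ToyLinear`, `toy_resize` and `add_labels` are hypothetical names made up for illustration, and the logic only mirrors the description above (one resize per label, with the second resize treating the layer as initialized and asking for an input width that was never set).

```python
class ToyLinear:
    """Hypothetical stand-in for a linear layer; not the real Thinc layer."""

    def __init__(self):
        self.name = "linear"
        # Both dimensions start unset, as they would before the
        # transformer's output width is known.
        self.dims = {"nI": None, "nO": None}

    def get_dim(self, name):
        value = self.dims[name]
        if value is None:
            raise ValueError(
                f"Cannot get dimension '{name}' for model '{self.name}': value unset"
            )
        return value


def toy_resize(layer, new_nO):
    """Resize the output of a toy layer (assumed logic, for illustration only)."""
    if layer.dims["nO"] is None:
        # First label: nothing to rebuild, just record the output size.
        layer.dims["nO"] = new_nO
        return layer
    # Later labels: nO is set, so the layer looks initialized and is rebuilt.
    # Rebuilding needs the input width, which may still be unset.
    nI = layer.get_dim("nI")
    resized = ToyLinear()
    resized.dims = {"nI": nI, "nO": new_nO}
    return resized


def add_labels(layer, labels):
    """Resize once per added label, as described in the issue."""
    for count, _label in enumerate(labels, start=1):
        layer = toy_resize(layer, count)
    return layer


# With nI unset, the second label triggers the failure.
try:
    add_labels(ToyLinear(), ["POSITIVE", "NEGATIVE"])
    message = None
except ValueError as err:
    message = str(err)

# With nI known up front, the same sequence of resizes succeeds.
known_width = ToyLinear()
known_width.dims["nI"] = 768  # arbitrary example width
resized = add_labels(known_width, ["POSITIVE", "NEGATIVE"])
```

In this sketch the first resize only records the output size, which is what makes the layer look initialized on the second call.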

For the time being, the workaround is to use TextCatCNN.v1 or another architecture. The main difference in v2 is that it's resizable, so if you aren't using that feature, performance shouldn't differ significantly.
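As a rough sketch of that workaround, the snippet below swaps the architecture name in a config held as a Python string. The config fragment is an assumption, modelled on a typical transformer-based textcat setup rather than taken from this issue, and `use_textcat_cnn_v1` is a hypothetical helper. It relies on v1 and v2 accepting the same settings, which is worth checking against your own config.

```python
# Illustrative config fragment (assumed; adapt the section names and values
# to your own config.cfg).
V2_FRAGMENT = """
[components.textcat.model]
@architectures = "spacy.TextCatCNN.v2"
exclusive_classes = true
nO = null

[components.textcat.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0

[components.textcat.model.tok2vec.pooling]
@layers = "reduce_mean.v1"
"""


def use_textcat_cnn_v1(config_text: str) -> str:
    """Return the config text with TextCatCNN.v2 replaced by TextCatCNN.v1."""
    return config_text.replace("spacy.TextCatCNN.v2", "spacy.TextCatCNN.v1")


V1_FRAGMENT = use_textcat_cnn_v1(V2_FRAGMENT)
```

Only the textcat architecture line changes; the transformer listener and pooling sections are left as they were.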

