explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Cannot use BiLSTM encoder with transition-based NER parser. [spacy-nightly] #6318

Closed AndriyMulyar closed 3 years ago

AndriyMulyar commented 3 years ago

I get the following error when attempting to use the BiLSTM encoder with the transition-based NER parser:

ℹ Using CPU

=========================== Initializing pipeline ===========================
✘ Can't construct config: calling registry function
(build_Tok2Vec_model) failed
spacy.Tok2Vec.v1   "Cannot get dimension 'nO' for model 'with_padded(with_padded(pytorch))'"

{'model': {'@architectures': 'spacy.Tok2Vec.v1', 'embed': {'@architectures': 'spacy.MultiHashEmbed.v1', 'width': 100, 'attrs': ['ORTH', 'SHAPE'], 'rows': [5000, 2500], 'include_static_vectors': 'True'}, 'encode': {'@architectures': 'spacy.TorchBiLSTMEncoder.v1', 'width': 100, 'depth': 2, 'dropout': 0.30000000000000004}}}

Could the PyTorch BiLSTM wrapper encoder be missing a line like this one, which appears in the CNN encoders: https://github.com/explosion/spaCy/blob/dc816bba9d564ae572af28a17cbf0580ba11db5e/spacy/ml/models/tok2vec.py#L274

https://github.com/explosion/spaCy/blob/dc816bba9d564ae572af28a17cbf0580ba11db5e/spacy/ml/models/tok2vec.py#L302
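The suggestion above can be sketched in miniature. The `MiniModel` class below is a hypothetical stand-in for a Thinc-style model with named dimensions (it is not Thinc's real `Model` class); it only illustrates why downstream construction fails when the wrapped PyTorch encoder never declares `nO`, and how an explicit `set_dim("nO", width)`, as in the CNN encoders, would resolve it:

```python
# Hypothetical stand-in for a Thinc-style model with named dimensions.
# This is NOT Thinc's real Model class -- just a sketch of why the
# downstream layer fails when the wrapped encoder never declares "nO".

class MiniModel:
    def __init__(self, name, dims=None):
        self.name = name
        self.dims = dims or {}

    def get_dim(self, name):
        if name not in self.dims or self.dims[name] is None:
            # Mirrors the shape of the reported error message
            raise ValueError(
                f"Cannot get dimension '{name}' for model '{self.name}'"
            )
        return self.dims[name]

    def set_dim(self, name, value):
        self.dims[name] = value


# A wrapped PyTorch encoder that never declares its output width:
encoder = MiniModel("with_padded(with_padded(pytorch))")
try:
    encoder.get_dim("nO")
except ValueError as err:
    print(err)  # Cannot get dimension 'nO' for model '...'

# The CNN encoders avoid this by fixing nO to the configured width:
width = 100
encoder.set_dim("nO", width)
```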

Also, the docstring is inaccurate (it seems to have been copied from the CNN encoders).

My model components look like this:

[components]

[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v1"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v1"
width = ${components.tok2vec.model.encode.width}
attrs = ["ORTH", "SHAPE"]
rows = [5000, 2500]
include_static_vectors = true

[components.tok2vec.model.encode]
@architectures = "spacy.TorchBiLSTMEncoder.v1"
width = 100
depth = 2
dropout = ${training.dropout}

[components.ner]
factory = "ner"

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v1"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = true
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}

svlandeg commented 3 years ago

Thanks for the report! That does look suspicious. Usually the nO dimension is set by initializing the model with example input and output so that it can infer the dimensions, but this may not work with the Torch encoder. You could be right that we need to set it explicitly. I'll have a look!
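For readers unfamiliar with the inference step mentioned above: Thinc layers can leave a dimension unset (`None`) and fill it in from sample data at initialization time. The `TinyLinear` class below is a hypothetical NumPy illustration of that pattern (not spaCy's or Thinc's actual code, where this happens via `Model.initialize(X=..., Y=...)`):

```python
import numpy as np

# Hypothetical layer illustrating Thinc-style shape inference:
# nO and nI may be left as None and are filled in from sample
# input/output arrays when initialize() is called.

class TinyLinear:
    def __init__(self, nO=None, nI=None):
        self.nO, self.nI = nO, nI
        self.W = None

    def initialize(self, X=None, Y=None):
        if self.nI is None and X is not None:
            self.nI = X.shape[-1]  # infer input width from sample X
        if self.nO is None and Y is not None:
            self.nO = Y.shape[-1]  # infer output width from sample Y
        if self.nO is None or self.nI is None:
            raise ValueError("Cannot infer dimensions 'nO'/'nI'")
        self.W = np.zeros((self.nO, self.nI))

    def __call__(self, X):
        return X @ self.W.T


layer = TinyLinear()       # both dimensions left unset
X = np.zeros((8, 100))     # 8 tokens, tok2vec width 100
Y = np.zeros((8, 64))      # e.g. parser hidden width 64
layer.initialize(X=X, Y=Y)
```

If a wrapped encoder cannot participate in this inference, the dimension has to be declared up front, which is consistent with the reporter's suggestion.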

svlandeg commented 3 years ago

This issue was a bit more involved than I had initially hoped, but I think https://github.com/explosion/spaCy/pull/6442 and https://github.com/explosion/thinc/pull/432 together should fix this. At least the training now runs for me when I replicate your config.

Thanks again for the detailed report!

[EDIT: it works when I set include_static_vectors to False and fails with a different error otherwise, but that's another issue that I'm looking into]
[EDIT 2: Never mind the above, it works with static vectors as well when setting e.g. vectors = "en_core_web_lg"]

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.