I'm not entirely sure, but I think this may be a bug related to using `-` as a label, because the parser (which is used underneath for both `parser` and `ner`) has some special cases related to `-` in order to process labels like `B-ORG` for NER. Do you have the same results if you replace `-` with a different dummy label like `none`?
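Since the issue description mentions converting the data from CoNLL format, a rough sketch of that swap could look like the following; the column index, the dummy label, and the file names are assumptions on my part, not something from this thread:

```python
# Rough sketch: replace "-" in the DEPREL column of CoNLL-U-style data with a
# dummy label such as "none" before converting it for spaCy.
# Assumes 10 tab-separated columns with DEPREL as column 8;
# "train.conllu" / "train_fixed.conllu" are placeholder file names.
with open("train.conllu", encoding="utf8") as f_in, \
        open("train_fixed.conllu", "w", encoding="utf8") as f_out:
    for line in f_in:
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 10 and cols[7] == "-":
            cols[7] = "none"  # any dummy string that doesn't contain "-"
        f_out.write("\t".join(cols) + "\n")
```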
Thanks! I replaced `-` with `other` and haven't observed the issue again, so I think your intuition was correct.
Glad to hear it! I have to look into the details of how complicated it might be to fix the parser directly (since you really should be able to use any string as a label), but at least for now we should show an error when training data is loaded with `-` in a dependency label, to avoid this problem.

A side note: unless you are only processing single sentences with your model when it's in use, I would recommend against using `--gold-preproc`. If you use this option, the parser won't learn to split sentences, because the option also splits the training data up into individual sentences during training, so the parser never sees any sentence boundaries.

If you want to train with gold tokenization, just remove the `"raw"` texts from your training data (if you have them) and the parser will learn from the gold tokens without splitting up documents. (Gold tokenization and single-sentence training ended up grouped together in this option for specific kinds of parser evaluations, when separate options would have been better.)
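A minimal sketch of removing those texts, assuming the spaCy v2 JSON training format where each document has a `"paragraphs"` list and each paragraph may carry a `"raw"` field (the file names are placeholders):

```python
import json

# Strip the "raw" texts so training uses the gold tokens without --gold-preproc.
with open("train.json", encoding="utf8") as f:
    docs = json.load(f)

for doc in docs:
    for paragraph in doc.get("paragraphs", []):
        paragraph.pop("raw", None)  # drop the raw text if present

with open("train_no_raw.json", "w", encoding="utf8") as f:
    json.dump(docs, f, ensure_ascii=False, indent=2)
```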
Thanks for the pointer regarding `--gold-preproc`. I don't have the `raw` string in my data, and the inputs are relatively short, non-grammatical fragments (search queries, in particular), so I'm not concerned about sentence splitting at the moment. But I'll do some experiments to see how omitting `--gold-preproc` affects LAS.
The original issue just re-emerged for me. A freshly trained version of my model is predicting `dep` as a label. I verified that no `-` labels are in my training or dev dataset, so it appears that the root cause is unrelated to the special handling of that character.
Hmm, what is `nlp.get_pipe("parser").labels` for your model? Are there any warnings/errors for your data with `spacy debug-data`?
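For reference, both checks are quick to run; the model path, language code, and data file names below are placeholders:

```python
import spacy

# Which dependency labels did the trained parser actually end up with?
nlp = spacy.load("/path/to/model")
print(nlp.get_pipe("parser").labels)

# For the data warnings, debug-data can be run on the command line, roughly:
#   python -m spacy debug-data en train.json dev.json
```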
This is what I get from `nlp.get_pipe("parser").labels`:
('ROOT',
'compound',
'containment',
'dep',
'feature',
'modification',
'other',
'proximity',
'quality')
which is a subset of all training labels (with the exception of `'dep'`, which is not in my training data).
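One quick way I checked this, as a sketch: it assumes the spaCy v2 JSON training format (docs → `"paragraphs"` → `"sentences"` → `"tokens"`, each token with a `"dep"` field), hard-codes the label tuple from above, and uses a placeholder path:

```python
import json

# Labels reported by nlp.get_pipe("parser").labels above.
model_labels = {
    "ROOT", "compound", "containment", "dep", "feature",
    "modification", "other", "proximity", "quality",
}

# Collect every dep label that occurs in the JSON training data.
with open("train.json", encoding="utf8") as f:
    docs = json.load(f)

train_labels = {
    token["dep"]
    for doc in docs
    for paragraph in doc.get("paragraphs", [])
    for sentence in paragraph.get("sentences", [])
    for token in sentence.get("tokens", [])
}

print("in the training data but missing from the model:", train_labels - model_labels)
print("in the model but not in the training data:", model_labels - train_labels)
```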
Here's the output from debugging the training data:
============================= Dependency Parsing =============================
ℹ Found 260 sentences with an average length of 6.5 words.
⚠ The training data contains 1.06 sentences per document. When there
are very few documents containing more than one sentence, the parser will not
learn how to segment longer texts into sentences.
ℹ Found 3 nonprojective train sentences
ℹ 21 labels in train data
ℹ 27 labels in projectivized train data
'other' (567), 'ROOT' (256), 'compound' (209), 'modification' (171), 'feature'
(111), 'containment' (100), 'proximity' (97), 'quality' (56), 'destination'
(24), 'possession' (24), 'cuisine' (16), 'timing' (14), 'availability' (9),
'directional' (9), 'negation' (9), 'quantification' (7), 'origin' (7), 'pricing'
(6), 'tmode' (6), 'attachment' (2), 'distance' (1)
⚠ Low number of examples for label 'quantification' (7)
⚠ Low number of examples for label 'origin' (7)
⚠ Low number of examples for label 'availability' (9)
⚠ Low number of examples for label 'cuisine' (16)
⚠ Low number of examples for label 'pricing' (6)
⚠ Low number of examples for label 'timing' (14)
⚠ Low number of examples for label 'directional' (9)
⚠ Low number of examples for label 'negation' (9)
⚠ Low number of examples for label 'tmode' (6)
⚠ Low number of examples for label 'attachment' (2)
⚠ Low number of examples for label 'distance' (1)
⚠ Low number of examples for 6 labels in the projectivized dependency
trees used for training. You may want to projectivize labels such as punct
before training in order to improve parser performance.
⚠ Projectivized labels with low numbers of examples:
other||containment: 2 feature||containment: 1 containment||containment: 1
containment||compound: 1 other||other: 1 modification||other: 1
⚠ The following labels were found only in the train data:
feature||containment, timing, containment||containment, containment||compound,
modification||other, quantification, other||containment, other||other,
distance
To train a parser, your data should include at least 20 instances of each label.
⚠ Multiple root labels (ROOT, containment) found in training data.
spaCy's parser uses a single root label ROOT so this distinction will not be
available.
I noticed that one of our example scripts uses `-` as a label for a similar case without issues, so it must be something else.
I don't understand why `debug-data` would show 21 labels but you don't end up with all of them in the model's labels. How many training docs do you have? There was a minor issue where the parser peeked at only the first 1000 examples instead of examining all of them when adding labels. This peeking is still in v2.2.4, but it will be removed in v2.3.0 (to be released soon; change in #5456).
My sense is that the number of training examples is related to this issue, but probably not to the one you're referencing. The `debug-data` output above is for a training data set with 260 examples. I've since added some examples and re-trained the model with 344 training examples, which is still well below 1,000.
I noticed that there were previously two labels (`destination` and `possession`) which didn't make it into the model but also did not receive a "low number of examples" warning. These two labels are now in the model.
Unfortunately, adding more training data didn't prevent `dep` from showing up in the model. Looking at specific examples where the model actually predicts `dep`, I noticed that it occurs in cases where either the correct label would have been one of those with a "low number of examples" warning, or in edge cases where even a human expert can't confidently assign the correct label.
I think I figured out what's going on. There's a minimum label frequency parameter with a default value of 30, which explains why some labels are missing from the model. You can lower it by passing the parameter `min_action_freq` to `Parser.begin_training`. `debug-data` uses a cutoff of 20 instead of 30, which is confusing here.
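A rough sketch of lowering that cutoff when training programmatically rather than via the CLI; the label set, the value `1`, and the exact keyword handling are assumptions and may vary across spaCy v2.x versions:

```python
import spacy

nlp = spacy.blank("en")
parser = nlp.create_pipe("parser")
for label in ("other", "compound", "feature"):  # placeholder label set
    parser.add_label(label)
nlp.add_pipe(parser)

# Forward min_action_freq to the parser's begin_training so that labels with
# fewer than 30 examples are not dropped (30 being the default cutoff).
optimizer = nlp.begin_training(component_cfg={"parser": {"min_action_freq": 1}})
```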
The `dep` label is coming from a backoff in the parser that is used when there's no other good move.
spaCy v2 has a number of parameters and defaults that are spread across the code and hard to track down. The rewrite of Thinc for spaCy v3 uses a much better config system, where models can be saved with a complete config file, so there shouldn't be as many frustrating issues with hidden defaults.
Thanks for sticking with this. I'm closing this issue as resolved.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
I am using the CLI training interface to train a custom tagger and parser. The dependency labels are a custom set of semantic labels. The training data is converted from CoNLL format. I am not using the `--base-model` argument, so I believe I'm starting from a blank model. Also, the output directory does not exist prior to training. After training, the model sometimes outputs an unexpected dependency tag (`dep`) which is not part of my training data.
This issue links to this stackoverflow question.