explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

No entities retrieved by Custom Hindi NER model #4562

Closed adridjs closed 4 years ago

adridjs commented 5 years ago

Hello, I'm trying to detect entities with a custom spaCy NER model. I trained it on Wikipedia articles via the CLI and it gave an F-score of around 75%. Now I'm trying to test it on articles from the web, even ones containing entities it has seen during training, but it isn't detecting any, that is, doc.ents is an empty tuple. I packaged it as the docs say and it loads without problems. I also trained a POS model with a different dataset and it is working properly.

I also checked that the labels are there (LOC, PER and ORG), and it is a new blank model too, so "catastrophic forgetting" doesn't apply. What am I missing in the NER model?
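
Roughly, what I'm doing looks like this (the package name and the sentence are just placeholders for my actual model and test text):

```python
import spacy

# placeholder for my actual packaged model name
nlp = spacy.load("hi_ner_wiki")

# a sentence containing entities the model has seen during training
doc = nlp("दिल्ली भारत की राजधानी है।")
print(doc.ents)  # () -- always an empty tuple
```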

ines commented 5 years ago

Are you using spacy train on the command line? And if so, can you run the debug-data command and share the result here? (You can use the gold.docs_to_json helper to convert your annotations to spaCy's JSON format if needed.)
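
For reference, a minimal sketch of that workflow (file names, the toy sentence and label are just illustrative) could look like this:

```python
import spacy
import srsly
from spacy.gold import docs_to_json
from spacy.tokens import Span

# blank Hindi pipeline with a sentencizer so docs_to_json can iterate sentences
nlp = spacy.blank("hi")
nlp.add_pipe(nlp.create_pipe("sentencizer"))

# one toy Doc with a gold entity span
doc = nlp("दिल्ली भारत की राजधानी है।")
doc.ents = [Span(doc, 0, 1, label="LOC")]

# write it out in spaCy's JSON training format
srsly.write_json("train.json", [docs_to_json([doc])])

# then, on the command line (file names are placeholders):
#   python -m spacy debug-data hi train.json dev.json --pipeline ner
```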

adridjs commented 5 years ago

Yes, here is the output of debug-data:

66893 training docs
8362 evaluation docs
⚠ 336 training examples also in evaluation data

============================== Vocab & Vectors ==============================
ℹ 8203231 total words in the data (338330 unique)
⚠ 217831 misaligned tokens in the training data
⚠ 27619 misaligned tokens in the dev data
ℹ No word vectors present in the model

========================== Named Entity Recognition ==========================
ℹ 3 new labels, 0 existing labels
0 missing values (tokens with '-' label)
✔ Good amount of examples for all labels
✔ Examples without occurrences available for all labels
✔ No entities consisting of or starting/ending with whitespace

=========================== Part-of-speech Tagging ===========================
ℹ 1 label in data (80 labels in tag map)
✘ Label '-' not found in tag map for language 'hi'

============================= Dependency Parsing =============================
ℹ Found 8010157 sentences with an average length of 1.0 words.
ℹ 2 labels in train data
ℹ 2 labels in projectivized train data

================================== Summary ==================================
✔ 4 checks passed
⚠ 3 warnings
✘ 1 error
adridjs commented 5 years ago

BUMP: I'm trying to merge models that have been trained on different data (NER and POS). I'm not aware of a way to do this via the CLI, and I'm still having trouble with the NER part (it's still not detecting any entities, even though the labels are there).

If I could at least make it work separately, I wouldn't mind having two models instead of a merged one.
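
For the two-model route, I'd basically be doing something like this (package names and the sentence are placeholders):

```python
import spacy

# placeholders for the two packaged models
ner_nlp = spacy.load("hi_ner_only")
pos_nlp = spacy.load("hi_pos_only")

text = "दिल्ली भारत की राजधानी है।"
ner_doc = ner_nlp(text)
pos_doc = pos_nlp(text)

print(ner_doc.ents)                         # entities from the NER-only model
print([(t.text, t.tag_) for t in pos_doc])  # tags from the POS-only model
```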

adrianeboyd commented 5 years ago

There's no way to combine models with the CLI yet. If the individual NER model is not finding any entities, it won't perform any differently when combined with the POS model, since the models work independently.

The output from debug-data looks fine and your description of the f-score while training sounds fine, so it's really hard to tell what might be going wrong when you're loading the trained model later. Are you sure that you are loading the correct NER model? Can you inspect meta.json in the directory you're loading to see if it contains the NER labels and evaluation results that you expect?
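
Something along these lines, with the model directory as a placeholder, would show what you're actually loading:

```python
import spacy
import srsly

model_dir = "/path/to/hi_ner_model"  # placeholder for the directory/package you load

# check what the saved meta.json actually contains
meta = srsly.read_json(model_dir + "/meta.json")
print(meta["pipeline"], meta["labels"], meta.get("accuracy"))

# and double-check the loaded pipeline itself
nlp = spacy.load(model_dir)
print(nlp.pipe_names)               # should include 'ner'
print(nlp.get_pipe("ner").labels)   # should include LOC, ORG, PER
```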

adridjs commented 5 years ago

I've been stuck on this problem for a week now and I can't see where I'm failing. Something weird is happening, because the same training command (only changing the data and -p tagger,parser) works fine for POS... I attach here what the model instance looks like, in case you can spot anything weird. I've checked labels, morphology, class names, etc. and it all looks OK, so I'm a bit desperate about what is happening.

'labels': OrderedDict([('ner', ['LOC', 'ORG', 'PER'])])
'pipeline': <class 'list'>: [('ner', <spacy.pipeline.pipes.EntityRecognizer object at 0x7f5ef7a62348>)]
adrianeboyd commented 5 years ago

Hmm, that's not really much info. Can you show all of nlp.meta?

adridjs commented 5 years ago

Sure, here you have:

{
    'lang': 'hi', 
    'pipeline': ['ner'], 
    'spacy_version': '>=2.2.0', 
    'speed': {
        'nwords': 1024799,
        'cpu': 38024.8528217147, 
        'gpu': None
    }, 
    'accuracy': {
        'uas': 0.0, 
        'las': 0.0, 
        'ents_p': 67.2770456635, 
        'ents_r': 51.1058084041, 
        'ents_f': 58.0869220083, 
        'ents_per_type': {
            'LOC': {
                'p': 61.893349311, 
                'r': 54.6055239857,
                'f': 58.0214842379
            }, 
            'PER': {
                'p': 75.1215862327, 
                'r': 46.8447451301, 
                'f': 57.7052949206
            }, 
            'ORG': {
                'p': 61.4563106796, 
                'r': 60.7485604607, 
                'f': 61.1003861004
            }
        },
        'tags_acc': 0.0,
        'token_acc': 92.3854336314,
        'textcat_score': 0.0,
        'textcats_per_cat': {}
    },
    'vectors': {
        'width': 0, 
        'vectors': 0, 
        'keys': 0, 
        'name': 'spacy_pretrained_vectors'
    }, 
    'name': 'ner_only', 
    'version': '1.0.0', 
    'author': 'Adrián', 
    'email': '', 
    'url': '', 
    'license': 'CC-BY', 
    'description': 'Hindi NER', 
    'labels': OrderedDict([('ner', ['LOC', 'ORG', 'PER'])]),
    'factories': {'ner': 'ner'}
}
svlandeg commented 4 years ago

Apologies for the late follow-up. I don't really see anything weird in your meta.json, other than the fact that you mentioned a 75% F-score and the file mentions 58%.

I wonder whether some sort of preprocessing differs between your training data (Wikipedia articles) and your test data (articles from the web). To rule that out, I would suggest training your NER model on a very small piece of your training data - run it for some iterations and the model will start "overfitting": it will pretty much memorize the small training set. Then store it to disk, load it back in, and test the model on the exact same sentences from the training set. If everything went fine, it should be able to predict these sentences with near-100% accuracy.
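
A minimal sketch of that sanity check with the spaCy v2 API could look like this (the training sentence and entity offsets are just placeholders for a handful of sentences from your own data):

```python
import random
import spacy

# a handful of annotated sentences from your own training data (placeholder here)
TRAIN_DATA = [
    ("दिल्ली भारत की राजधानी है।", {"entities": [(0, 6, "LOC"), (7, 11, "LOC")]}),
]

nlp = spacy.blank("hi")
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
for _, annotations in TRAIN_DATA:
    for start, end, label in annotations["entities"]:
        ner.add_label(label)

optimizer = nlp.begin_training()
for i in range(50):                      # enough iterations to overfit
    random.shuffle(TRAIN_DATA)
    losses = {}
    for text, annotations in TRAIN_DATA:
        nlp.update([text], [annotations], sgd=optimizer, losses=losses)
    print(i, losses)

# store to disk, load back in, and test on the exact same sentences
nlp.to_disk("/tmp/hi_ner_overfit")
nlp_loaded = spacy.load("/tmp/hi_ner_overfit")
for text, _ in TRAIN_DATA:
    print(text, nlp_loaded(text).ents)   # should reproduce the training entities
```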

This is just a test to make sure all code is set up correctly. If this works, you can of course extend your training set again. Then test on instances similar to some of your training data, just to see if the model works correctly. If all of that works, then perhaps try again with the test set from Wikipedia.

If in the meantime you've been able to solve this any other way - please feel free to let us know as well.

no-response[bot] commented 4 years ago

This issue has been automatically closed because there has been no response to a request for more information from the original author. With only the information that is currently in the issue, there's not enough information to take action. If you're the original author, feel free to reopen the issue if you have or find the answers needed to investigate further.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.