explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

nlp.rehearse is failing with error `unhashable type: 'numpy.ndarray'` #7823

Closed pradeeptimeinc closed 3 years ago

pradeeptimeinc commented 3 years ago

Hi, I'm using spaCy to recognize custom named entities, and I'm trying to train the model incrementally: train on dataset x (a list of Examples), save the model, then load that model and train it on dataset y, and so on.

When we tried this, we ran into catastrophic forgetting.

So I looked into nlp.rehearse and nlp.resume_training and tried to use them, but nlp.rehearse fails with the error unhashable type: 'numpy.ndarray'.
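The incremental workflow described above (train on x, save, then train on y) is where catastrophic forgetting typically shows up. A generic mitigation that does not depend on nlp.rehearse is to replay part of the earlier annotated data alongside each new batch. The sketch below only illustrates that idea: `mix_with_replay`, `replay_ratio` and the sample data are hypothetical names, not part of spaCy, and it assumes the earlier dataset is still available in the same `(text, entities)` shape as `annotated_items`.

```python
import random
from typing import List, Sequence, Tuple

# Same shape as the report's annotated_items:
# (text, [(start_char, end_char, label), ...])
Item = Tuple[str, List[Tuple[int, int, str]]]


def mix_with_replay(
    new_items: Sequence[Item],
    old_items: Sequence[Item],
    replay_ratio: float = 0.5,
    seed: int = 0,
) -> List[Item]:
    """Return new_items plus a random sample of old_items, shuffled together.

    replay_ratio is the number of replayed old items per new item
    (hypothetical knob, capped by how many old items exist).
    """
    rng = random.Random(seed)
    n_replay = min(len(old_items), int(len(new_items) * replay_ratio))
    replayed = rng.sample(list(old_items), n_replay)
    mixed = list(new_items) + replayed
    rng.shuffle(mixed)
    return mixed


# Made-up data purely for illustration.
old_items = [(f"old text {i}", [(0, 3, "OLD")]) for i in range(10)]
new_items = [(f"new text {i}", [(0, 3, "NEW")]) for i in range(4)]

mixed = mix_with_replay(new_items, old_items, replay_ratio=0.5)
```

The mixed list could then be converted to Example objects and passed to a normal update step, so the model keeps seeing the earlier labels while it learns the new ones.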

How to reproduce the behaviour

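    import random

    from spacy.training import Example, offsets_to_biluo_tags

    # nlp, ner, other_pipes, annotated_items and is_model_exists are assumed
    # to be defined earlier in the handler (not shown in this report).
    with nlp.disable_pipes(*other_pipes):
        # Resume from the saved model if there is one, otherwise start fresh.
        optimizer = nlp.resume_training() if is_model_exists else nlp.initialize()
        random.shuffle(annotated_items)
        losses = {}
        count = 0  # number of items skipped because of an exception
        if annotated_items:
            examples = []

            for text, entity_annotations in annotated_items:
                try:
                    # Skip items whose entity offsets do not align with
                    # token boundaries (these produce '-' tags).
                    checkup_entity = offsets_to_biluo_tags(
                        nlp.make_doc(text.lower()), entity_annotations, "O")
                    if '-' not in checkup_entity:
                        examples.append(Example.from_dict(
                            nlp.make_doc(text), {'entities': entity_annotations}))
                except Exception:
                    count += 1
            # This is the line that raises when is_model_exists is True.
            losses = nlp.rehearse(examples, sgd=optimizer, losses=losses) if is_model_exists else ner.update(examples, sgd=optimizer, losses=losses)
            print(losses)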

Error

is_model_exists True
unhashable type: 'numpy.ndarray'
Traceback (most recent call last):
  File "/Users/user/Documents/project/ner/custom-ner/src/handler2.py", line 138, in handler
    losses =  nlp.rehearse(examples, sgd=optimizer, losses=losses) if is_model_exists else ner.update(examples, sgd=optimizer, losses=losses)
  File "/Users/user/.local/share/virtualenvs/custom-ner-hlJnNjaT/lib/python3.7/site-packages/spacy/language.py", line 1174, in rehearse
    examples, sgd=get_grads, losses=losses, **component_cfg.get(name, {})
  File "spacy/pipeline/transition_parser.pyx", line 430, in spacy.pipeline.transition_parser.Parser.rehearse
  File "spacy/pipeline/trainable_pipe.pyx", line 246, in spacy.pipeline.trainable_pipe.TrainablePipe.finish_update
  File "/Users/user/.local/share/virtualenvs/custom-ner-hlJnNjaT/lib/python3.7/site-packages/thinc/model.py", line 325, in finish_update
    (node.id, name), node.get_param(name), node.get_grad(name)
  File "/Users/user/.local/share/virtualenvs/custom-ner-hlJnNjaT/lib/python3.7/site-packages/spacy/language.py", line 1164, in get_grads
    grads[key] = (W, dW)
TypeError: unhashable
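The traceback ends at `grads[key] = (W, dW)`, reached from a caller that passes `(node.id, name)` as the first positional argument. One way this kind of error can arise, offered here as an assumption and not a diagnosis confirmed in this thread, is a callback whose parameter order does not match the caller's, so that an array ends up being used as the dictionary key. The sketch below reproduces that pattern in plain Python. All names are hypothetical, and lists stand in for NumPy arrays because both are unhashable.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, str]
Params = Dict[Key, Tuple[List[float], List[float]]]


def finish_update(params: Params, sgd: Callable) -> None:
    """Hypothetical caller: passes (key, weights, gradient) positionally."""
    for key, (weights, gradient) in params.items():
        sgd(key, weights, gradient)


def collect_mismatched(params: Params) -> dict:
    grads = {}

    def get_grads(W, dW, key=None):  # callback expects the key LAST
        # Here W is the key tuple, dW the weights, and key the gradient list,
        # so this assignment tries to hash a list and raises TypeError.
        grads[key] = (W, dW)

    finish_update(params, get_grads)
    return grads


def collect_matched(params: Params) -> dict:
    grads = {}

    def get_grads(key, W, dW):  # same order as the caller uses
        grads[key] = (W, dW)

    finish_update(params, get_grads)
    return grads


params = {(1, "W"): ([0.1, 0.2], [0.01, 0.02])}

try:
    collect_mismatched(params)
    mismatch_error = None
except TypeError as err:
    mismatch_error = err

matched = collect_matched(params)
```

With matching argument order the gradients are stored under the `(id, name)` tuple. With the mismatched order the gradient itself becomes the key and the assignment raises a TypeError about an unhashable type.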

Your Environment

I know nlp.rehearse and nlp.resume_training are still experimental but any help/solution is appreciated.

Thanks

polm commented 3 years ago

I think this is a duplicate of https://github.com/explosion/spaCy/issues/7161. Sounds like it's a real bug, but please note that rehearse / resume_training are currently marked experimental.

pradeeptimeinc commented 3 years ago

Yes, they look identical with the same error.

polm commented 3 years ago

OK, I'll close this issue so we can keep discussion in the same place. Please follow that issue for updates.
