explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Text classification now failing spacy 2.1.3 Windows10 #3727

Closed pythonBerg closed 5 years ago

pythonBerg commented 5 years ago

This classifier code has been in use for 2+ years and ran just last week on v2.0.12. Now, when I run it with the newly updated modules, I get the error below. I don't see anything obvious in the change guide for 2.1. I'm not attempting to use the GPU because of ongoing issues.

Using 2500 examples (1492 training, 373 evaluation)
Training the model...
LOSS      P       R       F
Traceback (most recent call last):
  File ".\docClassTrainer.py", line 116, in <module>
    nlp.update(texts, annotations, sgd=optimizer, drop=0.2,
  File "C:\Users\MPC-LAPTOP01\AppData\Local\Programs\Python\Python36\lib\site-packages\spacy\language.py", line 452, in update
    proc.update(docs, golds, sgd=get_grads, losses=losses, **kwargs)
  File "pipes.pyx", line 931, in spacy.pipeline.pipes.TextCategorizer.update
  File "C:\Users\MPC-LAPTOP01\AppData\Local\Programs\Python\Python36\lib\site-packages\thinc\neural\_classes\feed_forward.py", line 46, in begin_update
    X, inc_layer_grad = layer.begin_update(X, drop=drop)
  File "C:\Users\MPC-LAPTOP01\AppData\Local\Programs\Python\Python36\lib\site-packages\thinc\api.py", line 132, in begin_update
    values = [fwd(X, *a, **k) for fwd in forward]
  File "C:\Users\MPC-LAPTOP01\AppData\Local\Programs\Python\Python36\lib\site-packages\thinc\api.py", line 132, in <listcomp>
    values = [fwd(X, *a, **k) for fwd in forward]
  File "C:\Users\MPC-LAPTOP01\AppData\Local\Programs\Python\Python36\lib\site-packages\thinc\api.py", line 225, in wrap
    output = func(*args, **kwargs)
  File "C:\Users\MPC-LAPTOP01\AppData\Local\Programs\Python\Python36\lib\site-packages\thinc\neural\_classes\feed_forward.py", line 46, in begin_update
    X, inc_layer_grad = layer.begin_update(X, drop=drop)
  File "C:\Users\MPC-LAPTOP01\AppData\Local\Programs\Python\Python36\lib\site-packages\thinc\api.py", line 275, in begin_update
    return layer.ops.unflatten(X, lengths, pad=pad), finish_update
  File "ops.pyx", line 138, in thinc.neural.ops.Ops.unflatten
AssertionError
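
The traceback ends in a bare assertion inside `thinc.neural.ops.Ops.unflatten`. As a rough illustration of what an unflatten step does, and why a mismatch between the flattened rows and the per-document lengths would trip an assertion, here is a simplified pure-Python sketch. It is not Thinc's code, and the exact condition asserted at `ops.pyx` line 138 is not shown in this report; the function below and its error message are hypothetical.

```python
def unflatten(rows, lengths):
    """Split a flat sequence of rows back into one chunk per document.

    Simplified illustration only; this is NOT Thinc's implementation.
    """
    chunks = []
    start = 0
    for length in lengths:
        chunks.append(rows[start:start + length])
        start += length
    # Every row must be claimed by exactly one document.
    assert start == len(rows), "lengths do not add up to the number of rows"
    return chunks


flat = ["t0", "t1", "t2", "t3", "t4"]

# Two documents with 2 and 3 tokens: all 5 rows are accounted for.
print(unflatten(flat, [2, 3]))

# Lengths that cover only 4 of the 5 rows fail the consistency check.
try:
    unflatten(flat, [2, 2])
except AssertionError as err:
    print("AssertionError:", err)
```

If something like this is what fails here, the problem would sit in how the batch was flattened upstream rather than in the training script itself, which would fit with the script being unchanged since v2.0.12.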

Nothing new about code:

useCats=set(traincats)
random.shuffle(train_data)
train_data = train_data[-limit:]
texts, labels = zip(*train_data)
cats=[]
for lab in labels:
    cd = {}
    for cat in useCats:
        cd[cat] = bool(cat==lab)
    cats.append(cd)

split = int(len(train_data) * split)
train_texts=texts[:split]
train_cats = cats[:split]
dev_texts=texts[split:]
dev_cats=cats[split:]
for lab in useCats:
    textcat.add_label(lab)

print("Using {} examples ({} training, {} evaluation)"
      .format(n_texts, len(train_texts), len(dev_texts)))
train_data = list(zip(train_texts,
                      [{'cats': cats} for cats in train_cats]))

other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'textcat']
with nlp.disable_pipes(*other_pipes):  # only train textcat
    optimizer = nlp.begin_training()
    print("Training the model...")
    print('{:^5}\t{:^5}\t{:^5}\t{:^5}'.format('LOSS', 'P', 'R', 'F'))
    for i in range(n_iter):
        losses = {}
        batches = minibatch(train_data, size=compounding(4., 32., 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, drop=0.2,
                       losses=losses)
        with textcat.model.use_params(optimizer.averages):
            scores = evaluate(nlp.tokenizer, textcat, dev_texts, dev_cats)
        print('{0:2d}{1:.3f}\t{2:.3f}\t{3:.3f}\t{4:.3f}'  # print a simple table
              .format(i,losses['textcat'], scores['textcat_p'],
                      scores['textcat_r'], scores['textcat_f']))
        if model is not None and i%5 == 0:
            output_dir = Path(model)
            if not output_dir.exists():
                output_dir.mkdir()
            nlp.to_disk(output_dir)
            print("Saved model to", output_dir)
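
For readers adapting the script, the label encoding and train/dev split at the top can be checked in isolation, without spaCy installed. The sketch below is a minimal stand-alone version under stated assumptions: the helper names `one_hot_cats` and `split_examples` and the sample labels are hypothetical and do not appear in the original script.

```python
def one_hot_cats(labels, categories):
    """Return one {category: bool} dict per label (the value stored under 'cats')."""
    return [{cat: cat == lab for cat in categories} for lab in labels]


def split_examples(texts, cats, fraction):
    """Split parallel sequences into (train, dev) pairs at the given fraction."""
    cut = int(len(texts) * fraction)
    return (texts[:cut], cats[:cut]), (texts[cut:], cats[cut:])


# Hypothetical sample data, for illustration only.
texts = ["doc a", "doc b", "doc c", "doc d", "doc e"]
labels = ["invoice", "memo", "invoice", "memo", "memo"]
categories = sorted(set(labels))

cats = one_hot_cats(labels, categories)
(train_texts, train_cats), (dev_texts, dev_cats) = split_examples(texts, cats, 0.8)

print(cats[0])                            # {'invoice': True, 'memo': False}
print(len(train_texts), len(dev_texts))   # 4 1
```

Using a separate name (`cut`) for the split index keeps the fraction available afterwards, whereas the script above rebinds `split` to the index.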

ines commented 5 years ago

Thanks for the report. This looks like the same issue with Thinc that was reported in #3607, so I'm merging the two issues.
