explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

add_label doesn't add extra_label to ner model in Spacy 2.1.9 #4751

Closed ghost closed 4 years ago

ghost commented 4 years ago

When I add labels to an existing custom-trained named entity model, the labels are not added.


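import random
from pathlib import Path

import spacy
from spacy.util import minibatch, compounding

# TRAIN_DATA (a list of (text, annotations) tuples) is defined elsewhere.


def main(model=<model_path>,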
         output_dir=Path(<output_path>), n_iter=100):
    spacy.util.use_gpu(0)
    if model is not None:
        nlp = spacy.load(model)
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank("tr")
        print("Created blank 'tr' model")
    if "ner" not in nlp.pipe_names:
        ner = nlp.create_pipe("ner")
        nlp.add_pipe(ner, last=True)

    else:
        ner = nlp.get_pipe("ner")

    ner.add_label("LOCATION")
    ner.add_label("MISC")
    ner.add_label("PERSON")
    ner.add_label("ORGANIZATION")

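    # A blank model needs freshly initialized weights;
    # resume_training() is only appropriate for a loaded, already trained model.
    if model is None:
        optimizer = nlp.begin_training()
    else:
        optimizer = nlp.resume_training()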
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
    with nlp.disable_pipes(*other_pipes):
        for itn in range(n_iter):

            random.shuffle(TRAIN_DATA)
            losses = {}
            batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(
                    texts,
                    annotations,
                    drop=0.25,
                    losses=losses,
                    sgd=optimizer
                )
            print("Losses", losses)

I also trained the model with the spaCy CLI:

python3 -m spacy train tr [output_path] [train_path] [dev_path] --n=200

After training, when I looked at the ner/cfg file:

{
  "beam_width": 1,
  "beam_density": 0.0,
  "cnn_maxout_pieces": 3,
  "deprecation_fixes": {
    "vectors_name": "spacy_pretrained_vectors"
  },
  "beam_update_prob": 1.0,
  "nr_class": 17,
  "hidden_depth": 1,
  "token_vector_width": 96,
  "hidden_width": 64,
  "maxout_pieces": 2,
  "pretrained_vectors": null,
  "bilstm_depth": 0
}

The first model was trained with spaCy 2.0.16 and it works well. After upgrading to 2.1.9, I have this problem:

{
  "beam_width": 1,
  "beam_density": 0.0,
  "cnn_maxout_pieces": 3,
  "deprecation_fixes": {
    "vectors_name": "spacy_pretrained_vectors"
  },
  "nr_class": 1,
  "hidden_depth": 1,
  "token_vector_width": 128,
  "hidden_width": 200,
  "maxout_pieces": 2,
  "pretrained_vectors": null,
  "hist_size": 0,
  "hist_width": 0,
  "extra_labels": [
    "PERSON",
    "LOCATION",
    "MISC",
    "ORGANIZATION"
  ]
}

Thanks.
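To see at a glance how the two ner/cfg files quoted in this comment differ, a small standard-library sketch like the following may help. The helper name diff_cfg and the "<absent>" marker are made up for illustration; the two JSON documents are copied from the post, and nothing here touches spaCy itself.

```python
import json

# First ner/cfg quoted in the post (contains "nr_class": 17).
CFG_A = json.loads("""
{ "beam_width":1, "beam_density":0.0, "cnn_maxout_pieces":3,
  "deprecation_fixes":{ "vectors_name":"spacy_pretrained_vectors" },
  "beam_update_prob":1.0, "nr_class":17, "hidden_depth":1,
  "token_vector_width":96, "hidden_width":64, "maxout_pieces":2,
  "pretrained_vectors":null, "bilstm_depth":0 }
""")

# Second ner/cfg quoted in the post (contains "nr_class": 1 and "extra_labels").
CFG_B = json.loads("""
{ "beam_width":1, "beam_density":0.0, "cnn_maxout_pieces":3,
  "deprecation_fixes":{ "vectors_name":"spacy_pretrained_vectors" },
  "nr_class":1, "hidden_depth":1, "token_vector_width":128,
  "hidden_width":200, "maxout_pieces":2, "pretrained_vectors":null,
  "hist_size":0, "hist_width":0,
  "extra_labels":[ "PERSON", "LOCATION", "MISC", "ORGANIZATION" ] }
""")

ABSENT = "<absent>"  # hypothetical marker for a key missing from one file


def diff_cfg(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    out = {}
    for key in sorted(set(a) | set(b)):
        va = a[key] if key in a else ABSENT
        vb = b[key] if key in b else ABSENT
        if va != vb:
            out[key] = (va, vb)
    return out


differences = diff_cfg(CFG_A, CFG_B)
for key, (va, vb) in differences.items():
    print(key, ":", va, "->", vb)
```

The output lists only the keys that differ, so nr_class and extra_labels stand out from the shared settings.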

adrianeboyd commented 4 years ago

Hi, I think extra_labels is an old setting that isn't used anymore.

spacy train should automatically add any new labels it sees in the training data. If you load your trained model, you can see the labels with nlp.entity.labels and you can also find a list in meta.json in the saved model.
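As a rough way to check the list in meta.json without loading the model, something like the sketch below could be used. It assumes that meta.json stores labels under a top-level "labels" key that maps component names to label lists; that layout, and the function name ner_labels_from_meta, are assumptions for illustration, so adjust them to whatever your saved file actually contains.

```python
import json
from pathlib import Path


def ner_labels_from_meta(model_dir):
    """Return the NER labels recorded in <model_dir>/meta.json, or [] if none."""
    meta_path = Path(model_dir) / "meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    # Assumed layout: {"labels": {"ner": ["PERSON", "LOCATION", ...]}}
    labels = meta.get("labels", {})
    if isinstance(labels, dict):
        return list(labels.get("ner", []))
    return []
```

Calling ner_labels_from_meta on the saved model directory returns an empty list when meta.json records no NER labels. In that case, checking the loaded model with nlp.entity.labels, as described above, is the more direct test.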

ghost commented 4 years ago

Hello, thanks for the response.

I thought the problem was in the training, but I searched a bit more. Now I use the CLI train command and it works! I think the problem was my parameters.

With manual training, these labels don't show up in meta.json, but with CLI training it is fine.

lock[bot] commented 4 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.