explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io

No loss decrease in spacy-GPU training #3937

Closed pratapaprasanna closed 5 years ago

pratapaprasanna commented 5 years ago

Hi all, I trained an NER model on CPU and could see the NER loss decreasing. But when I use the same code, add spacy.require_gpu(), and train on my GPU, the NER loss stagnates after a few epochs.

The loss keeps decreasing when I run the same training on CPU.

    import random

    import spacy
    from spacy.util import minibatch

    spacy.require_gpu()
    nlp = spacy.blank('en')
    ner = nlp.create_pipe('ner')
    sbd = nlp.create_pipe('sentencizer')
    nlp.add_pipe(sbd)
    nlp.add_pipe(ner, last=True)
    output_dir = config['TRAIN']['model_save_path']
    # register every entity label that appears in the training data
    for _, annotations in train_data:
        for ent in annotations.get('entities'):
            ner.add_label(ent[2])
    # train only the NER component
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):
        n_iter = int(config['TRAIN']['num_epochs'])
        optimizer = nlp.begin_training(n_threads=8, device=0)
        for itn in range(n_iter):
            if config['TRAIN']["shuffle_data"]:
                random.shuffle(train_data)
            losses = {}
            batches = minibatch(train_data, size=2048)
            # update step (truncated in the original post; completed here with
            # the standard spaCy 2.x training pattern)
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, drop=0.2, losses=losses)
            print('iter:{}\t loss: {}'.format(itn, losses))

Did I miss anything, and is there any approach to speed up my training?

Any help would be of great use.

Thanks in advance.


honnibal commented 5 years ago

Not sure this is what's wrong, but looking at your code this stood out to me:

batches = minibatch(train_data, size=2048)

Is that batch size intended? It's very large.
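For comparison, spaCy's 2.x example training scripts typically use a much smaller batch size that compounds upward as training progresses. A minimal sketch of that pattern, reusing the nlp, optimizer, train_data and n_iter variables from the code above (an illustration, not the poster's exact script):

    import random
    from spacy.util import minibatch, compounding

    for itn in range(n_iter):
        random.shuffle(train_data)
        losses = {}
        # batch size grows from 4 towards 32 instead of a fixed 2048
        batches = minibatch(train_data, size=compounding(4.0, 32.0, 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, drop=0.2, losses=losses)
        print('iter:{}\t loss: {}'.format(itn, losses))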

alejandrojcastaneira commented 5 years ago

I'm seeing similar behavior with the losses not decreasing; in our case it's not a CPU vs. GPU difference but a difference between spaCy versions:

2.0.18:

Iteration 1: eval_f: 63.79 train_f: 65.34 loss: 15934.35
...
Iteration 30: eval_f: 70.02 train_f: 93.26 loss: 3748.60
best accuracy: 70.67 at iteration 19

2.1.4:

Iteration 1: eval_f: 58.36 train_f: 58.98 loss: 181387.64
...
Iteration 30: eval_f: 71.12 train_f: 80.04 loss: 69142.48
best accuracy: 71.12 at iteration 30

(eval_f: F1 score on the evaluation set; train_f: F1 score on the train set; batch_size = 2-8; dropout = 0.2)

The F1 score on the eval set is higher with 2.1.4 than with 2.0.18, but the losses are higher and the score on the train set is lower, so we're afraid it could be underfitting.

I would kindly like to know what the main architecture differences are between the models in the two versions.

Best Regards

pratapaprasanna commented 5 years ago

Don't know why, but in my latest run the loss dropped for some time and then got stuck:

iter:467         loss: {'ner': 93100.5539039111}
iter:468         loss: {'ner': 93206.59834370334}

for the last 100-150 iterations ..

alejandrojcastaneira commented 5 years ago

Sorry to keep going back to this, but after trying multiple experiments, models trained on 2.0.18 keep showing better results than on 2.1.x. On 2.1.x:

Default Model

Iteration 1 - eval_f: 58.93 train_f: 58.56 losses: 168191.75
Iteration 30 - eval_f: 70.48 train_f: 79.51 losses: 70717.94

best accuracy: 70.48 at iteration 30

Model + WordVectors

Iteration 1 - eval_f: 58.36 train_f: 58.98 losses: 181387.64
Iteration 30 - eval_f: 71.12 train_f: 80.046 losses: 69142.48

best accuracy: 71.12 at iteration 30 losses: 69142.48

Model + Pretrained without vectors:

Iteration 1 - eval_f: 62.34 train_f: 63.07 losses: 152334.11
Iteration 30 - eval_f: 71.03 train_f: 79.14 losses: 70682.21

best accuracy: 71.03 at iteration 30 losses: 70682.21

Model + Pretrained with vectors:

Iteration 1 - eval_f: 63.52 train_f: 64.51 losses: 156741.01
Iteration 30 - eval_f: 71.11 train_f: 80.21 losses: 67391.69

best accuracy: 71.43 at iteration 24

Model + WordVectors + BILSTM = 2

Iteration 1 - eval_f: 20.46 train_f: 20.67 losses: 218547.50
Iteration 30 - eval_f: 70.44 train_f: 75.43 losses: 58840.90

best accuracy: 70.44 at iteration 30

Model + Wordvectors + CNN Width 256 + CNN Depth 7 + Embedrows 7500

Iteration 1 - eval_f: 60.76 train_f: 61.71 losses: 175188.69
Iteration 30 - eval_f: 71.45 train_f: 87.96 losses: 53303.67

best accuracy: 71.54 at iteration 27

Whereas on 2.0.18:

Default Model

Iteration 1 - eval_f: 63.27 train_f: 65.29 losses: 16354.67
Iteration 30 - eval_f: 70.25 train_f: 93.36 losses: 3825.24

best accuracy: 70.51 at iteration 17

Model + WordVectors

Iteration 1 - eval_f: 63.79 train_f: 65.34 losses: 15934.35
Iteration 30 - eval_f: 70.02 train_f: 93.26 losses: 3748.6055

best accuracy: 70.67 at iteration 19

Probably I'm missing something big, as usual. After trying many things, I decided it would be better to ask for advice on this.

On 2.0.18, the models get a slightly lower F-score on the evaluation set, but score much higher on the train set and the losses are also much lower. After visually examining the predictions, they also behave better on the previous version.

The other main reason we keep pushing on this is that prodigy>=1.8.0 requires spacy>=2.1; we have been working on 1.7.1 and would like to migrate to the new versions.

alejandrojcastaneira commented 5 years ago

@honnibal Sorry for asking again. I would like to migrate to the 2.1 version, but for my use case the models trained on 2.0.18 keep showing better results. Is there a way to emulate the hyperparameters of the previous architecture on 2.1? Please forgive my insistence on this. Best regards
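For anyone experimenting along these lines: spaCy 2.1's NER model reads several hyperparameters from the component config, so something like the sketch below may approximate the settings mentioned in the experiments above. The config keys are assumptions inferred from the 2.1 model builder and should be verified against the spacy source for your exact version:

    # hedged sketch: config key names assumed, verify against your spaCy 2.1.x install
    import spacy

    nlp = spacy.blank('en')
    ner = nlp.create_pipe('ner', config={
        'conv_depth': 7,            # CNN depth
        'token_vector_width': 256,  # CNN / token vector width
        'embed_size': 7500,         # rows in the hashing embedding tables
        'bilstm_depth': 2,          # BiLSTM layers (requires thinc built with PyTorch support)
    })
    nlp.add_pipe(ner, last=True)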

honnibal commented 5 years ago

@alejandrojcastaneira Could you try again with v2.2? It should be easier to train as well, now that there's the training CLI.
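For reference, the v2.2 CLI workflow looks roughly like the command below; the paths are placeholders, the training and dev data must be in spaCy's JSON training format (python -m spacy convert can produce it from common formats), and python -m spacy train --help lists the full set of options:

    python -m spacy train en ./output ./train.json ./dev.json --pipeline ner --n-iter 30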

alejandrojcastaneira commented 5 years ago

Hello. It shows better results, but unfortunately similar behavior: the F1 score on the evaluation set is better, but on the training set the results seem to be underfitting (<90 F1 after 2.1). Maybe it's specific to my use case, working with long texts and entities that may contain multi-token phrases. Could it be related to the loss function?

no-response[bot] commented 5 years ago

This issue has been automatically closed because there has been no response to a request for more information from the original author. With only the information that is currently in the issue, there's not enough information to take action. If you're the original author, feel free to reopen the issue if you have or find the answers needed to investigate further.

lock[bot] commented 4 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.