Not sure this is what's wrong, but looking at your code this stood out to me:
batches = minibatch(train_data, size=2048)
Is that batch size intended? It's very large.
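For comparison, the training examples in the docs typically use a much smaller, compounding batch size. A minimal sketch (assuming `nlp`, `optimizer`, `losses` and `train_data` come from your existing training loop):

```python
import random
from spacy.util import minibatch, compounding

random.shuffle(train_data)
# Batch size grows gradually from 4 to 32 instead of a fixed 2048.
batches = minibatch(train_data, size=compounding(4.0, 32.0, 1.001))
for batch in batches:
    texts, annotations = zip(*batch)
    nlp.update(texts, annotations, drop=0.2, sgd=optimizer, losses=losses)
```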
I'm seeing similar behavior with the losses not decreasing, though in our case the difference is between spaCy versions rather than between CPU and GPU:
2.0.18:
Iteration 1: eval_f: 63.79 train_f: 65.34 loss: 15934.35
...
Iteration 30: eval_f: 70.02 train_f: 93.26 loss: 3748.60
best accuracy: 70.67 at iteration 19

2.1.4:
Iteration 1: eval_f: 58.36 train_f: 58.98 loss: 181387.64
...
Iteration 30: eval_f: 71.12 train_f: 80.04 loss: 69142.48
best accuracy: 71.12 at iteration 30

eval_f: F1 score on the evaluation set
train_f: F1 score on the training set
batch_size = (2-8), dropout = 0.2
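For context, eval_f and train_f here are entity F1 scores. A minimal sketch of how such a score can be computed in spaCy 2.x (an illustrative helper, not our exact evaluation code):

```python
from spacy.gold import GoldParse
from spacy.scorer import Scorer

def entity_f1(nlp, examples):
    # examples: list of (text, {"entities": [(start, end, label), ...]})
    scorer = Scorer()
    for text, annotations in examples:
        gold = GoldParse(nlp.make_doc(text), entities=annotations["entities"])
        pred = nlp(text)  # predictions from the trained pipeline
        scorer.score(pred, gold)
    return scorer.scores["ents_f"]  # "ents_p" / "ents_r" are also available
```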
The F1 score on the eval set is higher with 2.1.4 than with 2.0.18, although the losses are much higher and the train-set score is lower, so we're worried it might be underfitting.
We would like to know what the main architecture differences are between the models in the two versions.
Best Regards
Not sure why, but in my latest run the loss dropped for a while and then got stuck:
iter:467 loss: {'ner': 93100.5539039111}
iter:468 loss: {'ner': 93206.59834370334}
for the last 100-150 iterations.
Sorry to keep coming back to this, but after multiple experiments, models trained on 2.0.18 keep showing better results than on 2.1.x:
Default Model
Iteration 1: eval_f: 58.93 train_f: 58.56 losses: 168191.75
Iteration 30: eval_f: 70.48 train_f: 79.51 losses: 70717.94
best accuracy: 70.48 at iteration 30

Model + WordVectors
Iteration 1: eval_f: 58.36 train_f: 58.98 losses: 181387.64
Iteration 30: eval_f: 71.12 train_f: 80.046 losses: 69142.48
best accuracy: 71.12 at iteration 30

Model + Pretrained without vectors
Iteration 1: eval_f: 62.34 train_f: 63.07 losses: 152334.11
Iteration 30: eval_f: 71.03 train_f: 79.14 losses: 70682.21
best accuracy: 71.03 at iteration 30

Model + Pretrained with vectors
Iteration 1: eval_f: 63.52 train_f: 64.51 losses: 156741.01
Iteration 30: eval_f: 71.11 train_f: 80.21 losses: 67391.69
best accuracy: 71.43 at iteration 24

Model + WordVectors + BiLSTM depth 2
Iteration 1: eval_f: 20.46 train_f: 20.67 losses: 218547.50
Iteration 30: eval_f: 70.44 train_f: 75.43 losses: 58840.90
best accuracy: 70.44 at iteration 30

Model + WordVectors + CNN width 256 + CNN depth 7 + embed rows 7500
Iteration 1: eval_f: 60.76 train_f: 61.71 losses: 175188.69
Iteration 30: eval_f: 71.45 train_f: 87.96 losses: 53303.67
best accuracy: 71.54 at iteration 27
Whereas on 2.0.18:
Default Model
Iteration 1: eval_f: 63.27 train_f: 65.29 losses: 16354.67
Iteration 30: eval_f: 70.25 train_f: 93.36 losses: 3825.24
best accuracy: 70.51 at iteration 17

Model + WordVectors
Iteration 1: eval_f: 63.79 train_f: 65.34 losses: 15934.35
Iteration 30: eval_f: 70.02 train_f: 93.26 losses: 3748.6055
best accuracy: 70.67 at iteration 19
I'm probably missing something obvious, as usual. After trying many things, I decided it was better to ask for advice here.
On 2.0.18 the models get a lower F-score on the evaluation set, but a much higher one on the train set, and the losses are also much lower. After visually examining the predictions, they also seem to behave better on the previous version.
Another main reason we've been pushing on this is that Prodigy >= 1.8.0 requires spaCy >= 2.1; we have been working with Prodigy 1.7.1 and would like to migrate to the newer versions.
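For reference, this is roughly how the architecture variants above (CNN width/depth, embedding rows, BiLSTM depth) were configured. I'm going by how I read the 2.1 model construction code, where these values seem to be picked up through util.env_opt, so the option names below are my assumption rather than documented API:

```python
import os

# Assumed option names (based on env_opt lookups in the spaCy 2.1 source);
# they must be set before the NER model is first built.
os.environ["token_vector_width"] = "256"  # CNN width
os.environ["conv_depth"] = "7"            # CNN depth
os.environ["embed_size"] = "7500"         # embedding rows
os.environ["bilstm_depth"] = "2"          # BiLSTM layers on top of the CNN

import spacy

nlp = spacy.blank("en")
nlp.add_pipe(nlp.create_pipe("ner"))
optimizer = nlp.begin_training()  # the model is built here and reads the overrides
```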
@honnibal Sorry for asking again. I would like to migrate to 2.1, but for my use case the models trained on 2.0.18 keep showing better results. Is there a way to emulate the hyperparameters of the previous architecture on 2.1? Please forgive my insistence on this. Best regards
@alejandrojcastaneira Could you try again with v2.2? It should be easier to train as well, now that there's the training CLI.
Hello. It shows better results, but unfortunately similar behavior: the F1 score on the evaluation set is better, but on the training set the results seem to be underfitting (< 90 F1 after 2.1). Maybe this is specific to my use case, which involves long texts and entities that may span multi-token phrases. Could it be related to the loss function?
This issue has been automatically closed because there has been no response to a request for more information from the original author. With only the information that is currently in the issue, there's not enough information to take action. If you're the original author, feel free to reopen the issue if you have or find the answers needed to investigate further.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Hi all, I trained an NER model on CPU and could see the NER loss decreasing, but when I use the same code and add
spacy.require_gpu()
to train on my GPU, the NER loss stagnates after a few epochs, whereas the loss kept decreasing when I ran the training on CPU.
Did I miss anything, and is there any approach to speed up my training?
Any help would be of great use.
Thanks in advance.
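For reference, my training setup looks roughly like this (simplified; TRAIN_DATA is a placeholder for my (text, annotations) pairs, and the require_gpu() call is the only difference between the CPU and GPU runs):

```python
import random
import spacy
from spacy.util import minibatch, compounding

spacy.require_gpu()  # only difference vs. the CPU run; called before the model is created

nlp = spacy.blank("en")
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)

# TRAIN_DATA: list of (text, {"entities": [(start, end, label), ...]}) pairs
for _, annotations in TRAIN_DATA:
    for start, end, label in annotations["entities"]:
        ner.add_label(label)

optimizer = nlp.begin_training()
for epoch in range(30):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for batch in minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001)):
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations, drop=0.2, sgd=optimizer, losses=losses)
    print(epoch, losses)
```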
Your Environment