Closed hermanseu closed 4 years ago
One more question:
In the readme, the suggested batch_size is 160. I have two GPU cards with 8G memory each; with batch_size=16, half of the GPU memory is used, but with batch_size=32, the GPU runs out of memory. Is my GPU memory too small?
Can you do a learning-rate warmup? For example, use lr=1e-4 at the beginning, and then raise it back to 1e-3.
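The warmup being suggested could be sketched as a plain schedule function (the function name, epoch counts, and start value here are illustrative assumptions, not the repo's actual implementation; in Keras it could be plugged into a `LearningRateScheduler` callback):

```python
def warmup_lr(epoch, base_lr=1e-3, warmup_start=1e-4, warmup_epochs=5):
    """Linearly ramp the learning rate from warmup_start up to base_lr
    over the first warmup_epochs epochs, then hold it at base_lr."""
    if epoch < warmup_epochs:
        return warmup_start + (base_lr - warmup_start) * epoch / warmup_epochs
    return base_lr
```

For example, `warmup_lr(0)` returns 1e-4 and `warmup_lr(5)` returns the full 1e-3.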
Yes, I have learning-rate warmup. Within one epoch, the acc decreases continually. Command: python3 main.py --net resnet34s --batch_size 8 --gpu 1 --lr 0.001 --warmup_ratio 0.1 --optimizer adam --epochs 64 --multiprocess 8 --loss softmax
I tried changing the machine environment to Python 2.7, Keras 2.2.4, TensorFlow 1.8.0; the output is the same as with Python 3.
Hmm, not sure. It might be because of some library updates, but when I released the code it definitely worked. Check some solved issues, for example,
https://github.com/WeidiXie/VGG-Speaker-Recognition/issues/10#issue-420380352
Anyway, I would then debug from two perspectives:
I have read all the issues, but I have not found helpful info about my question.
When trying to use the avg aggregation_mode, I get an assertion error from categorical_crossentropy: the target label dim and the predicted label dim do not match, because the output dim of the reshape operation is 3, not 2. It may be a bug, or my versions may not match. After I modified the output dim to 2, the acc still decreases.
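The shape mismatch described above can be illustrated with plain NumPy (a sketch of the general idea only; the axis layout is an assumption, and this is not necessarily the fix the repo intends for avg mode):

```python
import numpy as np

# Simulated 3-D network output: (batch, time_steps, n_classes).
pred_3d = np.random.rand(8, 10, 50)

# categorical_crossentropy with one-hot targets expects a 2-D
# (batch, n_classes) prediction, so the time axis must be collapsed,
# e.g. by averaging frame-level predictions.
pred_2d = pred_3d.mean(axis=1)
assert pred_2d.shape == (8, 50)
```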
I picked a small subset with 50 speakers (1000 utterances) and used GhostVLAD; the acc is normal, 0.994 after 64 epochs. I guess it overfitted. I have 15899 speakers, but only 20 utterances per speaker. Maybe the number of utterances per speaker is too small to train the model?
OK, cool, so that means there is nothing wrong with the code; the rest is about your training schedule. Maybe do curriculum learning: start from a small number of speakers and gradually add more. This is beyond my responsibility, so I'll close this issue now.
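The curriculum idea above could be sketched like this (the stage sizes and growth factor are made-up values for illustration, not a recommendation from the repo):

```python
def curriculum_stages(speaker_ids, start=1000, growth=2.0):
    """Yield progressively larger speaker subsets: begin with a small
    set the model can fit, then roughly double it each stage until
    the full speaker list is used."""
    n = start
    while n < len(speaker_ids):
        yield speaker_ids[:int(n)]
        n *= growth
    yield speaker_ids  # final stage: train on all speakers
```

Each yielded subset would be used for one training stage, warm-starting the model from the previous stage's weights.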
Hi WeidiXie, thanks for your paper and code.
I have 15899 speakers and 317980 utterances (20 utterances per speaker). When I try to train a model on this data, the acc decreases instead of increasing. The batch_size is 8; other params are default. There is no problem with the data, which I have checked, so something must be going wrong. After 30 epochs, the loss and acc are almost the same as at the start. Can you give me some advice to solve the problem?