It does not! It's just that my batch generator in this particular project is nothing short of terrible, and it pretty much always has some calculations to do.
I'm working on a different batch generator that should be available in the next few days.
You can test my claims by running `watch -n 1 nvidia-smi` on your Linux box and you should see a usage peak every 2-3 seconds depending on batch size.
Please also keep in mind that the model in the original paper took 50 hours to train, so it's normal if it seems to be a slow learner. With my GTX 1080 Ti it took about 36 hours to reach convergence.
My GPU usage is at 0%, but memory usage is up to 90%, and it takes 5 minutes per step @Belval
I think there is something wrong with the whole training pipeline. Unfortunately, I have been quite short on time recently, so I don't know when I'll have the time to work on this.
Do consider what I wrote above though: the model does take time to train, and you will only see short 1-2 second peaks of 100% GPU activity. I think it's just data-starved.
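If someone wants to confirm the data-starvation hypothesis, here is a rough sketch of a timing helper (the callables are placeholders; wire them to whatever batch generator and training op this project actually uses):

```python
import time

def profile_steps(make_batch, run_step, n_steps=10):
    """Time the data pipeline against the compute step.

    make_batch: callable returning one training batch (e.g. a wrapper around the batch generator).
    run_step: callable that consumes the batch (e.g. a sess.run on the training op).
    If the data column dominates, the GPU is starved by the input pipeline rather than slow itself.
    """
    for step in range(n_steps):
        t0 = time.time()
        batch = make_batch()
        t1 = time.time()
        run_step(batch)
        t2 = time.time()
        print("step %d: data %.2fs, compute %.2fs" % (step, t1 - t0, t2 - t1))
```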
Yes, it has small peaks of 100% GPU activity for 1-2 seconds, but after that it also sits at 0% for 5 minutes. @Belval
Oh, I found that CTCBeamSearchDecoder costs 200s+ and it runs on the CPU.
Hmm, this makes sense, but I had not thought about it.
I know this implementation is often considered better than the TF one: https://github.com/baidu-research/warp-ctc/tree/master/tensorflow_binding
Also we could check if https://www.tensorflow.org/api_docs/python/tf/nn/ctc_greedy_decoder is faster since the docs seem to imply that.
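For what it's worth, swapping the decoders should be a one-line change; here is a minimal sketch (TF 1.x, placeholder shapes, not the project's actual variable names):

```python
import tensorflow as tf

# Placeholder inputs -- the real model's tensors will have different names/shapes.
num_classes = 37  # e.g. alphabet size + CTC blank
logits = tf.placeholder(tf.float32, [None, None, num_classes])  # time-major: [max_time, batch, classes]
seq_len = tf.placeholder(tf.int32, [None])                      # per-example sequence lengths

# Current decoder: beam search (CPU-only, beam_width=100 by default).
decoded_beam, _ = tf.nn.ctc_beam_search_decoder(logits, seq_len)

# Candidate replacement: greedy decoding, equivalent to a beam width of 1 but much cheaper.
decoded_greedy, _ = tf.nn.ctc_greedy_decoder(logits, seq_len)

# Both return a list of SparseTensors; take the best path and densify it for inspection.
predictions = tf.sparse_tensor_to_dense(decoded_greedy[0], default_value=-1)
```

If accuracy with greedy decoding is acceptable, that alone would remove most of the 200s+ per step reported above.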
It seems that it only uses the CPU to train the model...