It does not! It's just that my batch generator in this particular project is nothing short of terrible, and it pretty much always has some calculations to do.
I'm working on a different batch generator that should be available in the next few days.
You can test my claims by running `watch -n 1 nvidia-smi` on your Linux box and you should see a usage peak every 2-3 seconds depending on batch size.
Please also keep in mind that the model in the original paper took 50 hours to train, so it's normal if it seems to be a slow learner. With my GTX 1080 Ti it took about 36 hours to reach convergence.
My GPU usage is at 0%, but memory usage is up to 90%, and it takes 5 minutes per step @Belval
I think there is something wrong with the whole training pipeline. Unfortunately, I have been quite short on time recently, so I don't know when I'll have the time to work on this.
Do consider what I wrote above though: the model does take time to train, and you will only see short 1-2 second peaks of 100% GPU activity. I think it's just data-starved.
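If someone wants to confirm the data-starvation hypothesis, here is a rough sketch of a timing helper (the callables are placeholders; wire them to whatever batch generator and training op this project actually uses):

```python
import time

def profile_steps(make_batch, run_step, n_steps=10):
    """Time the data pipeline against the compute step.

    make_batch: callable returning one training batch (e.g. a wrapper around the batch generator).
    run_step: callable that consumes the batch (e.g. a sess.run on the training op).
    If the data column dominates, the GPU is starved by the input pipeline rather than slow itself.
    """
    for step in range(n_steps):
        t0 = time.time()
        batch = make_batch()
        t1 = time.time()
        run_step(batch)
        t2 = time.time()
        print("step %d: data %.2fs, compute %.2fs" % (step, t1 - t0, t2 - t1))
```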
Yes, it has small peaks of 100% GPU activity for 1-2 seconds, but after that it also sits at 0% for 5 minutes. @Belval
Oh, I found that CTCBeamSearchDecoder costs 200s+ and it runs on the CPU.
Hmm, this makes sense, but I had not thought about it.
I know this implementation is often considered better than the TF one: https://github.com/baidu-research/warp-ctc/tree/master/tensorflow_binding
Also we could check if https://www.tensorflow.org/api_docs/python/tf/nn/ctc_greedy_decoder is faster since the docs seem to imply that.
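For what it's worth, swapping the decoders should be a one-line change; here is a minimal sketch (TF 1.x, placeholder shapes, not the project's actual variable names):

```python
import tensorflow as tf

# Placeholder inputs -- the real model's tensors will have different names/shapes.
num_classes = 37  # e.g. alphabet size + CTC blank
logits = tf.placeholder(tf.float32, [None, None, num_classes])  # time-major: [max_time, batch, classes]
seq_len = tf.placeholder(tf.int32, [None])                      # per-example sequence lengths

# Current decoder: beam search (CPU-only, beam_width=100 by default).
decoded_beam, _ = tf.nn.ctc_beam_search_decoder(logits, seq_len)

# Candidate replacement: greedy decoding, equivalent to a beam width of 1 but much cheaper.
decoded_greedy, _ = tf.nn.ctc_greedy_decoder(logits, seq_len)

# Both return a list of SparseTensors; take the best path and densify it for inspection.
predictions = tf.sparse_tensor_to_dense(decoded_greedy[0], default_value=-1)
```

If accuracy with greedy decoding is acceptable, that alone would remove most of the 200s+ per step reported above.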
It seems that it only uses the CPU to train the model...