AMontgomerie / question_generator

An NLP system for generating reading comprehension questions
MIT License

slow on gpu / parallel mode #3

Open ghost opened 3 years ago

ghost commented 3 years ago

Hi,

Hope you are all well!

I forked your code and created a Flask server for generating questions from webpages I scrape. (And, of course, I convert the HTML into clean text first ^^)

It takes a long time (120s on average) to generate questions (sentences only), even though CUDA is available.

Is there a way to optimise the processing time? I have 3 GPUs on my server; is it possible to enable a parallel or distributed mode for question_generator?

Cheers, X

AMontgomerie commented 3 years ago

Hey!

That does sound like quite a long time! Currently question_generator doesn't support multiple GPUs, but I suppose it should be possible using torch.distributed.

To be honest, I don't know much about it, and these tutorials seem to be mostly about distributed training rather than inference, but they might help. I don't currently have access to an environment with multiple GPUs to do any testing, though.
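
As a rough, untested sketch, a simpler alternative to torch.distributed for inference would be loading one copy of the model per GPU and splitting the input sentences across the replicas. The model name and the `generate()` settings below are placeholders rather than question_generator's actual API:

```python
# Sketch: data-parallel inference with one model replica per GPU.
# MODEL_NAME is a placeholder; substitute whatever model you actually use.
from concurrent.futures import ThreadPoolExecutor

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "t5-base"
DEVICES = [f"cuda:{i}" for i in range(torch.cuda.device_count())]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
models = [
    AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME).to(device).eval()
    for device in DEVICES
]

def generate_on_device(model, device, texts):
    # Tokenize one shard of the inputs and generate on that replica's GPU.
    inputs = tokenizer(
        texts, return_tensors="pt", padding=True, truncation=True
    ).to(device)
    with torch.no_grad():
        outputs = model.generate(**inputs, max_length=64)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

def generate_parallel(texts):
    # Split the inputs into one shard per GPU and run the shards concurrently.
    shards = [texts[i::len(DEVICES)] for i in range(len(DEVICES))]
    with ThreadPoolExecutor(max_workers=len(DEVICES)) as pool:
        results = pool.map(generate_on_device, models, DEVICES, shards)
    return [question for shard in results for question in shard]
```

With 3 GPUs this should roughly divide the per-request latency by three for large batches, at the cost of keeping three copies of the model in memory.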

AMontgomerie commented 3 years ago

Another possibility for speeding up inference would be exporting the model to ONNX.
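
Something along these lines might work (untested). It uses Hugging Face Optimum with ONNX Runtime rather than anything built into question_generator; the model name is a placeholder, and depending on the optimum version the export flag may be `from_transformers=True` instead of `export=True`:

```python
# Sketch: export a seq2seq model to ONNX and run it with ONNX Runtime
# via Hugging Face Optimum. MODEL_NAME is a placeholder.
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

MODEL_NAME = "t5-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Exports the PyTorch weights to ONNX and wraps them in an ONNX Runtime session.
onnx_model = ORTModelForSeq2SeqLM.from_pretrained(MODEL_NAME, export=True)

inputs = tokenizer(
    "generate question: The Eiffel Tower is in Paris.", return_tensors="pt"
)
outputs = onnx_model.generate(**inputs, max_length=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```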