ghost opened this issue 3 years ago
Hey!
That does sound like quite a long time! Currently the question generator doesn't support multiple GPUs, but I suppose it should be possible using torch.distributed.
To be honest I don't really know much about it, and these tutorials seem to be mostly about distributed training rather than inference, but they might help. I don't currently have access to a multi-GPU environment to do any testing, though.
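Short of full torch.distributed, one simple approach is to keep one replica of the model per GPU and round-robin incoming batches across them. This is only a sketch of that idea: `run_batches` and the `nn.Linear` stand-in are hypothetical, not part of question_generator, and it falls back to CPU when CUDA is unavailable.

```python
import copy

import torch


def run_batches(model, batches):
    """Round-robin batches across all visible GPUs (CPU fallback).

    Hypothetical helper: keeps one replica of `model` per device and
    dispatches batch i to device i % n. CUDA kernel launches are
    asynchronous, so consecutive launches on different devices can
    overlap; for full parallelism you would drive each replica from
    its own thread or process instead.
    """
    n_gpus = torch.cuda.device_count()
    devices = ([torch.device(f"cuda:{i}") for i in range(n_gpus)]
               or [torch.device("cpu")])
    # One independent copy of the model per device, in eval mode.
    replicas = [copy.deepcopy(model).to(d).eval() for d in devices]
    outputs = []
    with torch.no_grad():
        for i, batch in enumerate(batches):
            idx = i % len(devices)
            out = replicas[idx](batch.to(devices[idx]))
            outputs.append(out.cpu())  # gather results back on CPU
    return outputs
```

Usage would look like `run_batches(model, [batch1, batch2, ...])`, where each batch is a tensor (or, for the real model, a dict of tokenized inputs — the dispatch logic is the same).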
Another possibility for speeding up inference would be exporting the model to ONNX.
Hi,
Hope you are all well !
I forked your code and created a Flask server for generating questions from webpages I scrape. (And, of course, I convert the HTML into clean text ^^)
It takes a long time (120 s on average) to generate questions (sentences only), even though CUDA is available.
Is there a way to optimise the processing time? I have 3 GPUs on my server; is it possible to enable a parallel or distributed mode for question_generator?
Cheers, X