Closed · joshdevins closed this 3 years ago

I'm running `evaluate_anserini_docT5query.py` and it's currently only able to utilize a single GPU. I'm using GCP instances with 4x V100 GPUs, and I'd like to finish more experiments in less time by using more GPUs. Is there a simple way to parallelize batches or configure the use of multiple GPUs that I'm not aware of?
Hi @joshdevins,
Nice question! Unfortunately, I don't know how to implement this. I searched online but couldn't find a direct solution for generating with multiple GPUs. I will keep searching, and if I find something relevant I will get back to you.
If you have suggestions on implementation, I would be happy to have a look at them!
Unfortunately, for now I can only suggest hosting the individual question-generation models separately on the GPUs.
Kind Regards, Nandan
Thanks @NThakur20. The obvious place for me to start would be data parallelization. The corpus is chunked right now (into chunks of 80 examples, I think?), and we could easily parallelize that loop with a multiprocessing `Pool` of size `n` (one per GPU). When we call `generate`, we can also pass a GPU/core number and CUDA can use the specified GPU. I can take a stab at that for the one doc2query script and any downstream changes, and maybe it's applicable to processing other datasets at scale as well.
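To make that concrete, here is a minimal sketch of the idea; `generate_for_chunk` and `chunk_corpus` are hypothetical placeholders standing in for the real loading/generation code, not BEIR functions:

```python
import multiprocessing as mp

import torch

NUM_GPUS = max(torch.cuda.device_count(), 1)  # e.g. 4 on a 4x V100 instance

def generate_for_chunk(args):
    """Hypothetical worker: generate queries for one corpus chunk on one GPU."""
    gpu_id, chunk = args
    device = torch.device(f"cuda:{gpu_id}")
    # Load (or cache) a seq2seq model on `device` and run generation here.
    return [f"query for {doc_id}" for doc_id in chunk]  # placeholder output

def chunk_corpus(doc_ids, chunk_size=80):
    """Hypothetical helper: split the corpus into fixed-size chunks."""
    return [doc_ids[i:i + chunk_size] for i in range(0, len(doc_ids), chunk_size)]

if __name__ == "__main__":
    corpus_ids = [f"doc{i}" for i in range(1000)]  # stand-in corpus
    # Assign chunks to GPUs round-robin, one worker process per GPU.
    jobs = [(i % NUM_GPUS, chunk) for i, chunk in enumerate(chunk_corpus(corpus_ids))]
    # CUDA requires the "spawn" start method in subprocesses.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=NUM_GPUS) as pool:
        results = pool.map(generate_for_chunk, jobs)
```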
So here we can pass a GPU number and make `n` models, one per GPU:
https://github.com/UKPLab/beir/blob/8d8061934e2c33374c075412da2093a5f403521a/beir/generation/models/auto_model.py#L6
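Something along these lines, say. This is only a sketch: it assumes an optional `device` argument is added to the constructor (the linked class does not take one), and the checkpoint name is the public BEIR query-generation model:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

class QGenModel:
    # Sketch: an optional `device` argument lets each worker pin its copy
    # of the model to a specific GPU ("cuda:0", "cuda:1", ...).
    def __init__(self, model_path: str, device: str = None, use_fast: bool = True):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=use_fast)
        self.model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
        self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
        self.model = self.model.to(self.device)

# One model instance per GPU:
models = [QGenModel("BeIR/query-gen-msmarco-t5-base-v1", device=f"cuda:{i}")
          for i in range(torch.cuda.device_count())]
```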
One last thought: I'm not familiar enough with the Transformers + PyTorch internals, but there should be a native way to do data-parallel inference since we already provide batches. For example: https://github.com/huggingface/transformers/issues/3936
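For reference, the native route would look roughly like the sketch below. The catch, and what the linked issue is about, is that `torch.nn.DataParallel` only parallelizes `forward()`, not `.generate()`, so batched generation does not trivially scale this way:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

name = "BeIR/query-gen-msmarco-t5-base-v1"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name).to("cuda")

# DataParallel replicates the model and scatters each forward() batch
# across all visible GPUs, gathering the outputs on GPU 0.
model = torch.nn.DataParallel(model)

inputs = tokenizer(["an example passage ..."] * 32, padding=True,
                   truncation=True, return_tensors="pt").to("cuda")

# .generate() is not wrapped by DataParallel, so it must be called on the
# underlying module -- which runs on a single GPU again. This is the snag
# discussed in huggingface/transformers#3936.
outputs = model.module.generate(**inputs, max_length=64)
queries = tokenizer.batch_decode(outputs, skip_special_tokens=True)
```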
Thanks for the pointers @joshdevins, I will have a look at them!
Kind Regards, Nandan
Hi @joshdevins,
Finally, after having no success with `DataParallel`, I'm happy to share that I have now added support using multiprocessing to spawn a pool of worker processes that can take advantage of multiple GPUs. Hope it helps!
Here is sample code showing how to use it: https://github.com/UKPLab/beir/blob/development/examples/generation/query_gen_multi_gpu.py
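Condensed, the pattern in that example looks roughly like the sketch below. This is not the exact script: the dataset path and generation parameters are illustrative, and the pool method names follow the SentenceTransformers-style multiprocess API, so please verify them against the linked file:

```python
from beir.datasets.data_loader import GenericDataLoader
from beir.generation import QueryGenerator
from beir.generation.models import QGenModel

corpus = GenericDataLoader("datasets/nfcorpus").load_corpus()  # illustrative path

generator = QueryGenerator(model=QGenModel("BeIR/query-gen-msmarco-t5-base-v1"))

# Start one worker process per visible GPU, generate, then shut the pool down.
pool = generator.model.start_multi_process_pool()
generator.generate_multi_process(
    corpus, pool,
    output_dir="datasets/nfcorpus-gen",  # illustrative
    ques_per_passage=3,
    batch_size=64,
)
generator.model.stop_multi_process_pool(pool)
```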
I hope you are now able to utilize multiple GPUs for faster generation. The updates are still only in the development branch; if it's urgent, I would suggest cloning the development branch and working from that. I will soon push the changes to master and update the PyPI version.
Kind Regards, Nandan