Hi! I'm having trouble running run_finetune_rag_dialdoc.sh on multiple GPUs.
I set the --gpus parameter to 4, but I keep getting the error below.
ValueError: ProcessGroupGloo::scatter: invalid tensor type at index 0 (expected TensorOptions(dtype=double, device=cpu, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)), got TensorOptions(dtype=float, device=cpu, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)))
So I modified line 159 of dialdoc/models/rag/distributed_pytorch_retriever.py so that it no longer specifies the target_type argument:
retrieved_doc_embeds = self._scattered(scatter_vectors, [n_queries, n_docs, combined_hidden_states.shape[1]])
After this modification, I get the error below instead, and I can't figure out why.
File "/home/yunah/multidoc2dial_ours/dialdoc/models/rag/distributed_pytorch_retriever.py", line 157, in retrieve
doc_ids = self._scattered(scatter_ids, [n_queries, n_docs], target_type=torch.int64)
File "/home/yunah/multidoc2dial_ours/dialdoc/models/rag/distributed_pytorch_retriever.py", line 82, in _scattered
dist.scatter(target_tensor, src=0, scatter_list=scatter_list, group=self.process_group)
File "/home/yunah/.conda/envs/multidoc2dial/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 2191, in scatter
work = group.scatter(output_tensors, input_tensors, opts)
ValueError: ProcessGroupGloo::scatter: Incorrect input list size 1. Input list size should be 2, same as size of the process group.
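For debugging, the two error messages appear to name two separate conditions on the arguments passed to `dist.scatter`: the tensors in `scatter_list` must have the same dtype as the target tensor, and `scatter_list` must have one entry per process in the group. Below is a minimal sketch of a pre-flight check for those two conditions. `check_scatter_args` is a hypothetical helper, not part of PyTorch or of this repository, and it only assumes that each tensor exposes a `.dtype` attribute; the stand-in objects are there so the sketch runs without torch.

```python
from types import SimpleNamespace


def check_scatter_args(target, scatter_list, world_size):
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper: it mirrors the two conditions named in the error
    messages above, and is not an API of torch.distributed.
    """
    problems = []
    # Condition from the second error: one input tensor per process.
    if len(scatter_list) != world_size:
        problems.append(
            f"scatter_list has {len(scatter_list)} entries, "
            f"expected {world_size} (the process group size)"
        )
    # Condition from the first error: every input dtype matches the target.
    for i, tensor in enumerate(scatter_list):
        if tensor.dtype != target.dtype:
            problems.append(
                f"scatter_list[{i}] has dtype {tensor.dtype}, "
                f"but the target tensor has dtype {target.dtype}"
            )
    return problems


# Stand-ins for tensors (assumption: only the .dtype attribute matters here).
target = SimpleNamespace(dtype="float64")
inputs = [SimpleNamespace(dtype="float32")]

for problem in check_scatter_args(target, inputs, world_size=2):
    print(problem)
```

With real tensors, the same function could be called on the source rank just before `dist.scatter`, passing `dist.get_world_size(group=self.process_group)` as `world_size`.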
Did I miss any other variables or settings that I should change before using multiple GPUs?
Is there a known solution for this error?
Thanks a lot!
Best, Yunah