Open Haran71 opened 1 week ago
Hey @Haran71
The documentation states:
Gather tensors or collections of tensors from multiple processes.
This method needs to be called on all processes and the tensors need to have the same shape across all
processes, otherwise your program will stall forever.
Args:
data: int, float, tensor of shape (batch, ...), or a (possibly nested) collection thereof.
group: the process group to gather results from. Defaults to all processes (world).
sync_grads: flag that allows users to synchronize gradients for the ``all_gather`` operation
Return:
A tensor of shape (world_size, batch, ...), or if the input was a collection
the output will also be a collection with tensors of this shape. For the special case where
world_size is 1, no additional dimension is added to the tensor(s).
It does not mention anywhere that strings are supported; the documentation clearly states that this is meant to work for tensors. No error is raised because we want to support dictionaries, so non-tensor entries such as strings pass through without being gathered. Perhaps the documentation could mention that explicitly.
If you have predictions you'd like to all-gather, I suggest keeping them as numbers/tensors, gathering them, and then converting them to strings at the end.
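One way to follow that suggestion when the predictions already exist as strings is to encode them as fixed-size byte tensors, gather those, and decode afterwards. The sketch below is only an illustration: `encode_strings`, `decode_strings`, `gather_strings`, and `max_len` are hypothetical names, not part of Lightning. It assumes every process holds the same non-zero number of strings (since `all_gather` needs matching shapes) and that truncating strings to `max_len` bytes is acceptable.

```python
def encode_strings(strings, max_len):
    """Turn each string into a row of UTF-8 byte values, truncated/zero-padded to max_len."""
    rows = []
    for s in strings:
        raw = s.encode("utf-8")[:max_len]
        rows.append(list(raw) + [0] * (max_len - len(raw)))
    return rows


def decode_strings(rows):
    """Invert encode_strings: drop the zero padding and decode the bytes back to text."""
    out = []
    for row in rows:
        raw = bytes(int(b) for b in row).rstrip(b"\x00")
        # errors="ignore" drops a multi-byte character that truncation may have cut in half
        out.append(raw.decode("utf-8", errors="ignore"))
    return out


def gather_strings(module, strings, max_len=64):
    """Hypothetical helper: gather strings across processes via a LightningModule.

    Assumes `module` is a LightningModule and that each process passes the same
    non-zero number of strings, so the encoded tensors have identical shapes.
    """
    import torch  # imported here so the pure-Python helpers above work without torch

    local = torch.tensor(
        encode_strings(strings, max_len), dtype=torch.uint8, device=module.device
    )
    gathered = module.all_gather(local)  # (world_size, n, max_len), or (n, max_len) if world_size is 1
    # Flatten the optional world_size dimension so both cases are handled the same way.
    return decode_strings(gathered.reshape(-1, max_len).tolist())
```

Inside a hook such as `on_validation_epoch_end`, this could be called as `m_preds = gather_strings(self, all_preds)`. If the per-process lists differ in length, padding them to a common length first would be needed; alternatively, `torch.distributed.all_gather_object` can gather arbitrary picklable objects, outside of Lightning's `all_gather`.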
Bug description
I have a list of strings on each device during multi-GPU evaluation, and I want to collect them from all devices into a single list that is available on every device.
When I try the above code (`all_preds` and `all_gt` are lists of strings), `m_preds` and `m_gt` are the same lists as `all_preds` and `all_gt` on each respective device. Am I doing something wrong?
What version are you seeing the problem on?
v2.2
How to reproduce the bug
No response
Error messages and logs
Environment
Current environment
```
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):
```
More info
No response
cc @borda