-
**Describe the bug**
The tokenizer map in `hf_decoder_model` raises `TypeError: cannot pickle 'torch._C._distributed_c10d.ProcessGroup' object` when `preprocessing_num_workers` is set to more than one worker.
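
For context, here is a minimal sketch of the general failure mode, using only the standard library. It assumes the cause is that the mapped function holds a reference to an object that cannot be serialized for worker processes; that is a guess based on the error message, not a confirmed diagnosis. `Tokenizer`, `PicklableTokenizer`, and `_handle` are hypothetical names, and a `threading.Lock` stands in for the `ProcessGroup`.

```python
import pickle
import threading


class Tokenizer:
    """Hypothetical stand-in for an object that carries an unpicklable handle."""

    def __init__(self):
        # The lock plays the role of the ProcessGroup: pickle refuses to serialize it.
        self._handle = threading.Lock()
        self.vocab = {"hello": 0, "world": 1}

    def encode(self, text):
        return [self.vocab[word] for word in text.split()]


class PicklableTokenizer(Tokenizer):
    """Same object, but the handle is dropped before pickling and rebuilt after."""

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("_handle", None)  # leave the unpicklable member behind
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._handle = threading.Lock()  # recreate it inside the worker


def try_pickle(obj):
    """Return the TypeError message if obj cannot be pickled, else None."""
    try:
        pickle.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


error = try_pickle(Tokenizer())
restored = pickle.loads(pickle.dumps(PicklableTokenizer()))
print(error)
print(restored.encode("hello world"))
```

If this assumption holds, the same idea would apply here: keep the process group (or whatever object owns it) out of the function passed to the multi-process map.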
**To Reprodu…
-
error log:
Generating train split: 3457 examples [00:00, 14292.20 examples/s]
Map (num_proc=32): 0%| | 0/3457 [00:00
-
As the title says, I wonder if this is normal.
If not, how should I optimize it?
Logs
```
using world size: 16, data-parallel size: 4, context-parallel size: 1 tensor-model-parallel size: 4…
-
As mentioned in https://github.com/dask/distributed/issues/7639#issuecomment-1489013077 , we are seeing what we think is a bug due to the removal of the [finalizer for a ThreadPoolExecutor](https://g…
-
Here is the error:
File "/home/server/Python_Project/django/yolo/YOLO-World/yolo_world/datasets/utils.py", line 28, in yolow_collate
masks = datasamples.gt_instances.masks.to_tensor(
AttributeE…
-
### Describe the bug
I am using PyTorch DDP (Distributed Data Parallel) to train my model. Since the data is too large to load into memory at once, I am using load_dataset to read the data as an it…
-
# Task Name
Audio Tagging on Multiple Datasets
## Task Objective
This task is a variation of the earlier "Audio Tagging on AudioSet" task. For details of the original task, please refer to https://gith…
-
### System Info
```Shell
!pip install transformers==4.44.0
!pip install accelerate==0.33.0
!pip install datasets==2.21.0
!pip install evaluate==0.4.2
!pip install scipy scikit-learn
run on a H…
-
Hi,
I am trying to run inference with `llama2+13b` and I have four RTX 3090 GPUs, each with 24 GB of memory. However, I noticed that when I use the sample inference code, it only uses one GPU, which causes out of …
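
A rough back-of-the-envelope estimate of the weight memory, as a sketch under stated assumptions: a nominal 13e9 parameters, fp16 weights (2 bytes each), 24 GB treated as 24 GiB, and an even split across the four cards. Activations, KV cache, and framework overhead are ignored, so real usage would be higher. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(n_params, bytes_per_param=2):
    """Memory needed for the weights alone, in GiB (assumes fp16 by default)."""
    return n_params * bytes_per_param / 2**30


N_PARAMS = 13e9        # assumption: nominal parameter count for a 13B model
GPU_MEMORY_GIB = 24    # assumption: 24 GB card treated as 24 GiB
NUM_GPUS = 4

total = weight_memory_gib(N_PARAMS)      # weights for the whole model
per_gpu = total / NUM_GPUS               # if sharded evenly across all cards
fits_on_one = total < GPU_MEMORY_GIB

print(f"weights: {total:.1f} GiB, per GPU when sharded: {per_gpu:.1f} GiB")
print(f"fits on a single card: {fits_on_one}")
```

Under these assumptions the weights alone come to about 24.2 GiB, slightly more than one card holds, while an even four-way split needs roughly 6 GiB per card.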
-
**Describe the bug**
When I use the most recent Megatron-LM fork, I get the following error:
```
make: Entering directory '/workspace/megatron-lm/megatron/core/datasets'
g++ -O3 -Wall -sha…