BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0

AttributeError: 'MyDataset' object has no attribute 'global_rank' #200

Closed: cahya-wirawan closed this issue 10 months ago

cahya-wirawan commented 10 months ago

Hi,

When I run the train.py script of RWKV-v5, I get the error AttributeError: 'MyDataset' object has no attribute 'global_rank'. The full traceback follows:

AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/cahya/miniconda3/envs/rwkv/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/cahya/miniconda3/envs/rwkv/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/cahya/miniconda3/envs/rwkv/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/cahya/Work/RWKV-LM/RWKV-v5/src/dataset.py", line 104, in __getitem__
    rank = self.global_rank
AttributeError: 'MyDataset' object has no attribute 'global_rank'

My torch version is 2.1.0.dev20230629.
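
For readers following the traceback: `__getitem__` reads `self.global_rank`, and the dataset object has no such attribute at that point. Below is a minimal sketch of that failure mode in plain Python. `TinyDataset` and `try_fetch` are hypothetical names, not the repo's code, and the idea that the attribute is meant to be assigned from outside the dataset (for example by a training hook) is an assumption made for illustration.

```python
class TinyDataset:
    """Hypothetical stand-in for a dataset whose rank is assigned from outside."""

    def __init__(self, data):
        self.data = data
        # global_rank is deliberately NOT set here; this sketch assumes some
        # external hook is expected to assign it before items are fetched.

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        rank = self.global_rank  # raises AttributeError if it was never assigned
        return rank, self.data[idx]


def try_fetch(dataset, idx):
    """Return the item, or the error message if the attribute is missing."""
    try:
        return dataset[idx]
    except AttributeError as exc:
        return str(exc)


ds = TinyDataset([10, 20, 30])
before = try_fetch(ds, 0)  # attribute missing, so we get the error message
ds.global_rank = 0         # what an external hook would do (assumed)
after = try_fetch(ds, 0)   # now the fetch succeeds
print(before)
print(after)
```

If the code that assigns the attribute does not run, or assigns it to a different object than the one the DataLoader workers use, the first fetch fails in the same way as in the traceback above.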

cahya-wirawan commented 10 months ago

The problem was caused by PyTorch Lightning version 2.1.1. After I downgraded to 1.9.5, the problem was solved.

DarokCx commented 10 months ago

I followed those instructions, but now I get a different error: argument of type 'NoneType' is not iterable

BlinkDL commented 10 months ago

@DarokCx IMPORTANT: Use deepspeed==0.7.0 pytorch-lightning==1.9.5 torch 1.13.1+cu117
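
A quick way to confirm that an environment matches the pins above is to compare installed versions against them. This is a rough sketch using only the standard library. The helper names `installed_version` and `find_mismatches` are made up for illustration, and it assumes the version string reported for torch includes the `+cu117` local suffix, which may not hold for every install.

```python
from importlib import metadata

# Pins quoted from the maintainer's comment in this thread.
PINS = {
    "deepspeed": "0.7.0",
    "pytorch-lightning": "1.9.5",
    "torch": "1.13.1+cu117",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map package name -> (expected, found) for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

Running the script prints one line per package whose installed version differs from the pin, and prints nothing when everything matches. A package that is not installed is reported with `found None`.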