DLLXW / baby-llama2-chinese

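A repository for pretraining from scratch and then SFT-ing a small-parameter Chinese LLaMa2; a single 24 GB GPU is enough to run it and obtain a chat-llama2 with basic Chinese question-answering ability.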
MIT License
2.47k stars, 305 forks

A very strange error: IndexError: index 35930 is out of bounds for axis 1 with size 2048 #42

Closed: zhaodice closed this issue 11 months ago

zhaodice commented 11 months ago
(_) user@calculator:~/Player/baby-llama2-chinese$ python3 __pretrain.py 
tokens per iteration will be: 2,048
breaks down as: 1 grad accum steps * 1 processes * 1 batch size * 2048 max seq len
memmap:True train data.shape:(702015, 2048)
downloading finished.....
Initializing a new model from scratch
num decayed parameter tensors: 85, with 2,746,744,832 parameters
num non-decayed parameter tensors: 25, with 102,400 parameters
using fused AdamW: True
Traceback (most recent call last):
  File "/home/user/Player/baby-llama2-chinese/__pretrain.py", line 317, in <module>
    train_epoch(epoch)
  File "/home/user/Player/baby-llama2-chinese/__pretrain.py", line 51, in train_epoch
    for step, (X, Y) in enumerate(train_loader):
  File "/home/user/anaconda3/envs/_/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/home/user/anaconda3/envs/_/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/home/user/anaconda3/envs/_/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/home/user/anaconda3/envs/_/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/user/anaconda3/envs/_/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/user/anaconda3/envs/_/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/user/anaconda3/envs/_/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/user/Player/baby-llama2-chinese/dataset.py", line 36, in __getitem__
    sample = self.data[index]
  File "/home/user/anaconda3/envs/_/lib/python3.10/site-packages/numpy/core/memmap.py", line 334, in __getitem__
    res = super().__getitem__(index)
IndexError: index 35930 is out of bounds for axis 1 with size 2048

I don't know where this 35930 comes from; 2048 is max_seq_len.

zhaodice commented 11 months ago

OK, it turns out I had modified the code incorrectly.
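
The thread does not say which change caused the error, so the following is only a sketch of how a message like this can arise in general, not a reconstruction of the actual bug. The error names axis 1, the max_seq_len axis of size 2048. This suggests that 35930, a perfectly valid sample index for the 702015 rows, ended up being applied to the second axis. The array below is a stand-in with the shape reported in the log. The uint16 dtype and the helper name try_index are assumptions made for illustration.

```python
import numpy as np

# Stand-in for the memmapped training data. broadcast_to returns a zero-copy,
# read-only view, so the (702015, 2048) shape from the log costs no real memory.
# The uint16 dtype is an assumption; the thread does not state it.
data = np.broadcast_to(np.zeros(1, dtype=np.uint16), (702015, 2048))


def try_index(arr, index):
    """Hypothetical helper: return (shape, None) on success, (None, message) on IndexError."""
    try:
        return arr[index].shape, None
    except IndexError as exc:
        return None, str(exc)


# Indexing along axis 0 (one sample per row) works: 35930 < 702015.
row_shape, row_err = try_index(data, 35930)

# Applying the same number to axis 1 fails: 35930 >= 2048.
col_shape, col_err = try_index(data, (slice(None), 35930))

print("axis 0 lookup:", row_shape, row_err)
print("axis 1 lookup:", col_shape, col_err)
```

If the second lookup reproduces the message from the traceback, the index expression in the modified `__getitem__` (or the layout of the array it receives) is the first thing worth checking.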