RUCAIBox / LLMBox

A comprehensive library for implementing LLMs, including a unified training pipeline and comprehensive model evaluation.
MIT License
566 stars, 74 forks

Error in Batch Sampler: IndexError: list index out of range #267

Closed: krzz2q closed this issue 1 month ago

krzz2q commented 2 months ago

Description:

I encountered an IndexError while running the inference script for the Meta-Llama-3-8B-Instruct model using the command below:

CUDA_VISIBLE_DEVICES=1 python inference.py \
  -m /home/user/huggingface/pretrained_model/meta-llama/Meta-Llama-3-8B-Instruct/ --temperature 0 \
  -d mmlu -shots 5 --max_example_tokens 4096 \
  --model_type chat --load_in_8bit

The error traceback is as follows:

2024-07-02 23:18:18 WARNING Error occurred during evaluation. You can continue evaluation by loading the checkpoint: --continue_from evaluation_results/Meta-Llama-3-8B-Instruct-mmlu-5shot-2024_07_02-23_16_19.jsons
Traceback (most recent call last):
  File "/home/user/LLMBox/inference.py", line 18, in <module>
    main()
  File "/home/user/LLMBox/inference.py", line 14, in main
    evaluator.evaluate()
  File "/home/user/LLMBox/utilization/utils/catch_error.py", line 59, in wrapper
    raise e
  File "/home/user/LLMBox/utilization/utils/catch_error.py", line 40, in wrapper
    return func(*args, **kwargs)
  File "/home/user/LLMBox/utilization/evaluator.py", line 110, in evaluate
    for batch in dataloader:
  File "/home/user/LLMBox/utilization/utils/dynamic_stride_tqdm.py", line 68, in __iter__
    for obj in iterable:
  File "/home/user/anaconda3/envs/LLMBox/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/home/user/anaconda3/envs/LLMBox/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
    index = self._next_index()  # may raise StopIteration
  File "/home/user/anaconda3/envs/LLMBox/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 621, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/home/user/LLMBox/utilization/model/model_utils/batch_sampler.py", line 183, in __iter__
    yield from AutoBatchSizeSampler(
  File "/home/user/LLMBox/utilization/model/model_utils/batch_sampler.py", line 65, in __init__
    if self.check_new_batch(self.data_order[-1], i + 1):
  File "/home/user/LLMBox/utilization/model/model_utils/batch_sampler.py", line 77, in check_new_batch
    max_len = max(len(self.data[q]) for q in queries)
  File "/home/user/LLMBox/utilization/model/model_utils/batch_sampler.py", line 77, in <genexpr>
    max_len = max(len(self.data[q]) for q in queries)
IndexError: list index out of range

Steps to Reproduce:

  1. Run the command as provided above.
  2. The error occurs during the evaluation phase, specifically within the evaluate() function.

Expected Behavior:

The evaluation should complete without errors.

Environment:

Additional Information:

The error seems to originate in batch_sampler.py. In check_new_batch, the expression len(self.data[q]) is evaluated with a query index q that falls outside the range of self.data. Any help or guidance on how to resolve this issue would be greatly appreciated. If you need additional logs or information, please let me know.
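To illustrate the failure pattern the traceback points to, here is a minimal, self-contained sketch. It is NOT LLMBox's AutoBatchSizeSampler. The class name LengthBatchSampler, its fields, and the token-budget rule (longest item × batch size ≤ max_tokens) are hypothetical assumptions chosen to mirror the shape of the failing line, max(len(self.data[q]) for q in queries). If data_order contains an index that is not valid for data, that expression raises exactly this IndexError. The sketch validates the indices up front so the mismatch surfaces as a descriptive error instead.

```python
from typing import Iterator, List, Sequence


class LengthBatchSampler:
    """Hypothetical token-budget batch sampler (illustrative sketch, not LLMBox code).

    Groups indices from `data_order` into batches so that
    (longest item in the batch) * (batch size) stays within `max_tokens`.
    """

    def __init__(self, data: Sequence[Sequence[int]], data_order: List[int], max_tokens: int):
        # Reject indices outside range(0, len(data)) up front, instead of letting
        # an IndexError surface later inside the max() generator expression.
        bad = [i for i in data_order if not 0 <= i < len(data)]
        if bad:
            raise ValueError(
                f"data_order has {len(bad)} index(es) outside range(0, {len(data)}): {bad[:5]}"
            )
        self.data = data
        self.data_order = data_order
        self.max_tokens = max_tokens

    def fits(self, queries: List[int]) -> bool:
        # Same shape as the failing line: the padded length is the longest query.
        max_len = max(len(self.data[q]) for q in queries)
        return max_len * len(queries) <= self.max_tokens

    def __iter__(self) -> Iterator[List[int]]:
        batch: List[int] = []
        for idx in self.data_order:
            # Close the current batch if adding idx would exceed the budget.
            # A single item longer than max_tokens still gets its own batch.
            if batch and not self.fits(batch + [idx]):
                yield batch
                batch = []
            batch.append(idx)
        if batch:
            yield batch


# Usage: four tokenized inputs of lengths 3, 5, 2, 4 and a budget of 10 tokens.
data = [[1] * 3, [1] * 5, [1] * 2, [1] * 4]
print(list(LengthBatchSampler(data, [0, 1, 2, 3], max_tokens=10)))  # [[0, 1], [2, 3]]
```

The validation only makes a bad index easier to diagnose. It does not fix whatever produces the out-of-range index in the first place, and the maintainers' actual fix may look quite different.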

Thank you!

huyiwen commented 2 months ago

Thank you for your feedback. I have reproduced the issue as well and will fix it shortly.

huyiwen commented 2 months ago

As a temporary workaround until we fix it, you can add the --prefix_caching True flag.

krzz2q commented 2 months ago

> You can add a --prefix_caching True flag to temperately solve the problem while waiting for us to fix it

OK. Thank you.

xansar commented 1 month ago

+1. I encountered the same problem, but I am using generation mode, so the --prefix_caching True workaround does not work for me.