bigcode-project / starcoder

Home of StarCoder: fine-tuning & inference!
Apache License 2.0

KeyError: 'response ' #108

Open camhfwang opened 1 year ago

camhfwang commented 1 year ago

Hi, I am trying to run the fine-tuning code on my computer, but I get KeyError: 'response'. The environment was installed according to the README.

Traceback (most recent call last):
  File "/home/starcoder/finetune/finetune.py", line 313, in <module>
    main(args)
  File "/home/starcoder/finetune/finetune.py", line 302, in main
    run_training(args, train_dataset, eval_dataset)
  File "/home/starcoder/finetune/finetune.py", line 293, in run_training
    trainer.train()
  File "/root/miniconda3/envs/env/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/env/lib/python3.10/site-packages/transformers/trainer.py", line 1780, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/root/miniconda3/envs/env/lib/python3.10/site-packages/accelerate/data_loader.py", line 553, in __iter__
    next_batch, next_batch_info = self._fetch_batches(main_iterator)
  File "/root/miniconda3/envs/env/lib/python3.10/site-packages/accelerate/data_loader.py", line 520, in _fetch_batches
    batches.append(next(iterator))
  File "/root/miniconda3/envs/env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/root/miniconda3/envs/env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 671, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/root/miniconda3/envs/env/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 34, in fetch
    data.append(next(self.dataset_iter))
  File "/home/starcoder/finetune/finetune.py", line 173, in __iter__
    buffer.append(prepare_sample_text(next(iterator), self.input_column_name, self.output_column_name))
  File "/home/starcoder/finetune/finetune.py", line 127, in prepare_sample_text
    text = f"Question: {example[input_column_name]}\n\nAnswer: {example[output_column_name]}"
KeyError: 'response '

ArmelRandy commented 1 year ago

Hi. It seems that the dataset you are using does not have a column named 'response'. Check the column names in your dataset against the ones you pass for fine-tuning. Also, there is a space at the end of 'response ' in the error message; you should probably remove it and use 'response'.
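The check suggested above can be sketched in a few lines of Python. This is only an illustration, not code from the repository: the helper name `diagnose_column` and the example column list (including the 'question' column) are hypothetical, and the sketch assumes you can obtain your dataset's column names as a list of strings.

```python
from typing import Sequence


def diagnose_column(requested: str, available: Sequence[str]) -> str:
    """Explain why a requested column name may raise KeyError.

    Hypothetical helper for illustration; not part of finetune.py.
    """
    if requested in available:
        return "ok"
    # A name that matches only after stripping points to stray whitespace,
    # which is easy to miss because it is invisible in most error output.
    if requested.strip() in available:
        return f"stray whitespace: {requested!r} should be {requested.strip()!r}"
    return f"missing: {requested!r} not in {list(available)}"


# Hypothetical column names, assumed for this example only.
columns = ["question", "response"]

print(diagnose_column("response", columns))
print(diagnose_column("response ", columns))
print(diagnose_column("answer", columns))
```

Printing names with `repr` (the `!r` in the f-strings) is the useful part here: it puts quotes around the value, so a trailing space becomes visible.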

camhfwang commented 1 year ago

Yes, I checked everything you mentioned. I printed a sample of the dataset used by the model (shown below): it has a 'response' key with no space after it. I am simply trying to reproduce this project and run the fine-tuning code; the environment and datasets were set up according to the README. I also tried replacing the relevant parameters with 'response' directly in the code, but that did not work either.

[Image: dataset]

ArmelRandy commented 1 year ago

Can you share the command you used in order to run the code?

reckdk commented 1 year ago

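Adding to Armel's reply: there is an invisible trailing space after the backslash in --output_column_name="response"\ , which causes the parsing issue. The shell treats a backslash followed by a space as an escaped literal space rather than as a line continuation, so the argument value becomes 'response ' instead of 'response'. Removing that extra space when running the fine-tuning code should fix the issue.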
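The effect of that trailing space can be reproduced without running the training script. The sketch below uses Python's `shlex` (POSIX-style splitting) to approximate how a shell tokenizes the argument, then passes the tokens to a stand-in `argparse` parser. It assumes the script reads `--output_column_name` as a plain string option; the function `parse_output_column` is hypothetical and exists only for this demonstration.

```python
import argparse
import shlex


def parse_output_column(command_fragment: str) -> str:
    """Tokenize a command fragment like a POSIX shell, then parse it.

    Stand-in parser for illustration; the real script's parser may differ.
    """
    tokens = shlex.split(command_fragment)
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_column_name", type=str)
    return parser.parse_args(tokens).output_column_name


# Backslash followed by a space: the space is escaped and kept in the value.
broken = '--output_column_name="response"\\ '
# Same argument with nothing after the closing quote.
fixed = '--output_column_name="response"'

print(repr(parse_output_column(broken)))  # 'response '
print(repr(parse_output_column(fixed)))   # 'response'
```

In a multi-line command, the backslash must be the very last character on the line for the shell to treat it as a continuation. Enabling visible whitespace in your editor helps catch this.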