ZhangTP1996 / TapTap


Bug: TypeError: '<' not supported between instances of 'list' and 'int' #8

Closed Doris404 closed 5 months ago

Doris404 commented 6 months ago

I am running example.py, but I get this error: TypeError: '<' not supported between instances of 'list' and 'int'. The details are as follows:

The score training by the original data is 0.8242972746040172
  0%|                                                                                                                                        | 0/1000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/home/ruc/xiaotong/LLMDataGen/model/TapTap/example.py", line 47, in <module>
    model.fit(train_data, target_col=target_col, task=task) # TypeError: '<' not supported between instances of 'list' and 'int'
  File "/home/ruc/xiaotong/LLMDataGen/model/TapTap/taptap/taptap.py", line 167, in fit
    great_trainer.train(resume_from_checkpoint=resume_from_checkpoint) # TypeError: '<' not supported between instances of 'list' and 'int'
  File "/home/ruc/.conda/envs/llmdatagen/lib/python3.9/site-packages/transformers/trainer.py", line 1624, in train
    return inner_training_loop( # TypeError: '<' not supported between instances of 'list' and 'int'
  File "/home/ruc/.conda/envs/llmdatagen/lib/python3.9/site-packages/transformers/trainer.py", line 1928, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator): # TypeError: '<' not supported between instances of 'list' and 'int'
  File "/home/ruc/.conda/envs/llmdatagen/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/home/ruc/.conda/envs/llmdatagen/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/ruc/.conda/envs/llmdatagen/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = self.dataset.__getitems__(possibly_batched_index)
  File "/home/ruc/.conda/envs/llmdatagen/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2814, in __getitems__
    batch = self.__getitem__(keys)
  File "/home/ruc/.conda/envs/llmdatagen/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2810, in __getitem__
    return self._getitem(key)
  File "/home/ruc/xiaotong/LLMDataGen/model/TapTap/taptap/taptap_dataset.py", line 51, in _getitem
    row = self._data.fast_slice(key, 1)
  File "/home/ruc/.conda/envs/llmdatagen/lib/python3.9/site-packages/datasets/table.py", line 138, in fast_slice
    if offset < 0: # TypeError: '<' not supported between instances of 'list' and 'int'
TypeError: '<' not supported between instances of 'list' and 'int'
  0%|                                                                                                                                        | 0/1000 [00:00<?, ?it/s]

I debugged it and found that offset is a list rather than an int: [14968, 12277, 14741, 9199, 15070, 8155, 14868, 1908, 9803, 12628, 384, 4868, 14502, 7128, 167, 8621, 9047, 1064, 3302, 2347, 6806, 1360, 6211, 14100, 11536, 5531, 2645, 8485, 6999, 6779, 14466, 9604].
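To illustrate the failure mode, here is a minimal sketch in plain Python. It is not the TapTap or datasets code: `fast_slice_stub` is a hypothetical stand-in that only mimics the `if offset < 0` comparison from the traceback, and `fetch_rows` is a hypothetical helper showing one way a caller could accept either a single index or a batch of indices.

```python
def fast_slice_stub(offset, length=1):
    """Hypothetical stand-in for a slice function that expects an int offset."""
    # Same kind of comparison as the `if offset < 0` line in the traceback.
    if offset < 0:
        raise IndexError("negative offset")
    return list(range(offset, offset + length))


def fetch_rows(key):
    """Hypothetical helper: accept one int index or a list of int indices."""
    if isinstance(key, int):
        return [fast_slice_stub(key)]
    # A batched key (list of indices) is handled one index at a time.
    return [fast_slice_stub(int(k)) for k in key]


# Passing a list where an int is expected reproduces the reported TypeError.
try:
    fast_slice_stub([14968, 12277])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Handling the batch index by index avoids the comparison on a list.
print(fetch_rows([3, 5]))
```

This only shows why a batched index breaks an int-only comparison; it does not claim to be how the libraries involved resolve it.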

My machine cannot reach Hugging Face, so I downloaded the model and data on another machine and then tried to run python example.py offset. For the dataset, I downloaded it from the website and saved it locally as taptap_dataset. For the model, I saved the folder from the website locally as taptap-distill.

ZhangTP1996 commented 6 months ago

As far as I can remember, this issue stems from outdated versions of specific Python packages (but I cannot recall which ones or how to fix it).

Doris404 commented 6 months ago

This project involves many packages. Could you create a requirements.txt file to help me check the versions of each package?

The command is pip freeze > requirements.txt. Many thanks ❤️
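For a quicker comparison than a full pip freeze, a small sketch like the following prints the installed versions of the packages that appear in the traceback (datasets, transformers, torch). The package list and the `installed_versions` helper are assumptions chosen for illustration; adjust the names to whatever the project actually depends on.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages taken from the traceback paths; this list is an assumption,
# not an official dependency list for the project.
PACKAGES = ["datasets", "transformers", "torch"]


def installed_versions(names):
    """Return {package: version string, or None if it is not installed}."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


report = installed_versions(PACKAGES)
for name, ver in report.items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Running this in both a working and a failing environment makes it easy to spot which package version differs.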

Doris404 commented 6 months ago

I updated the datasets package, and it works now.