ZhangTP1996 / TapTap


TypeError: '<' not supported between instances of 'list' and 'int' #5

Closed · mohanvrk closed this issue 1 year ago

mohanvrk commented 1 year ago

Hi @ZhangTP1996,

Congratulations on your fantastic work.

When I tried to run the implementation, I got the following error. Could you please help me with this?

Thanks in advance

```
File ~/anaconda3/envs/test/lib/python3.8/site-packages/datasets/arrow_dataset.py:2658, in Dataset.__getitem__(self, key)
   2656 def __getitem__(self, key):  # noqa: F811
   2657     """Can be used to index columns (by string names) or rows (by integer index or iterable of indices or bools)."""
-> 2658     return self._getitem(key)

File ~/work/TapTap/taptap/taptap_dataset.py:51, in TaptapDataset._getitem(self, key, decoded, **kwargs)
     48 shuffled_text = ""
     49 # for k in [key, np.random.randint(0, len(self._data))]:
---> 51 row = self._data.fast_slice(key, 1)
     52 if self.shuffled_idx is None:
     53     shuffle_idx = list(range(row.num_columns-1))

File ~/anaconda3/envs/test/lib/python3.8/site-packages/datasets/table.py:135, in IndexedTableMixin.fast_slice(self, offset, length)
    127 def fast_slice(self, offset=0, length=None) -> pa.Table:
    128     """
    129     Slice the Table using interpolation search.
    130     The behavior is the same as pyarrow.Table.slice but it's significantly faster.
    (...)
    133     The batches to keep are then concatenated to form the sliced Table.
    134     """
--> 135     if offset < 0:
    136         raise IndexError("Offset must be non-negative")
    137     elif offset >= self._offsets[-1] or (length is not None and length <= 0):

TypeError: '<' not supported between instances of 'list' and 'int'
```
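For context, the traceback shows that `fast_slice` receives a list of row indices where it expects an integer offset, so the check `offset < 0` raises the TypeError. A minimal sketch of the failing comparison (the variable values here are illustrative, not taken from the TapTap code):

```python
# Illustrative only: reproduces the comparison that fails inside
# IndexedTableMixin.fast_slice when a list key reaches it.
key = [0, 1, 2]   # a list key instead of a single integer index
offset = key      # forwarded as the `offset` argument of fast_slice

try:
    offset < 0    # the check at datasets/table.py:135
except TypeError as err:
    print(err)    # '<' not supported between instances of 'list' and 'int'
```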

ZhangTP1996 commented 1 year ago

Could you please provide a code example and dataset to reproduce the bug? It is hard to diagnose the cause from the error message alone.

mohanvrk commented 1 year ago

Hi @ZhangTP1996,

I used the example code provided; no changes were made to it.

mohanvrk commented 1 year ago

Here is the solution.

https://github.com/kathrinse/be_great/issues/12#issue-1611170715

This is due to an update of the Hugging Face `datasets` package. To avoid this error, pin the older version:

```
pip install datasets==2.5.2
```
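After downgrading, it may be worth confirming that the pinned version is the one actually being imported by your environment (a minimal check, assuming the version recommended above):

```python
import datasets

# Confirm the environment picked up the downgraded package.
print(datasets.__version__)  # expect "2.5.2"
```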