google-research / tapas

End-to-end neural table-text understanding models.
Apache License 2.0

ValueError: Too many rows #177

Open · Gayatri-95 opened this issue 1 year ago

Gayatri-95 commented 1 year ago

Hi,

I am trying to fine-tune the TAPAS WTQ model on a sample table of 990 rows and 18 columns (the 'Nobel Laureates, 1901-Present' dataset: https://www.kaggle.com/datasets/nobelfoundation/nobel-laureates). I am running the notebook on Kaggle with a maximum of 30 GB RAM. However, I am hitting errors while encoding with TapasTokenizer: with truncation=False the error is "ValueError: Too many rows", and with truncation=True it is "ValueError: Couldn't find all answers".
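For context, this is roughly how I prepare the table and the question/answer items before encoding (a minimal sketch; the file names, the layout of the questions file with `question`, `answer_coordinates`, and `answer_text` columns, and the `literal_eval` parsing are specific to my notebook, not part of the TAPAS API):

```python
import ast

import pandas as pd

# The Nobel Laureates CSV is used as the table. TapasTokenizer expects a
# pandas DataFrame in which every cell is a string.
table = pd.read_csv("nobel_laureates.csv").astype(str)  # placeholder path

# Each item holds one question plus its answer coordinates and answer text.
# The coordinates/text columns store Python literals as text in my file,
# so they are parsed back into lists here.
data = pd.read_csv("questions.tsv", sep="\t")  # placeholder path
data["answer_coordinates"] = data["answer_coordinates"].apply(ast.literal_eval)
data["answer_text"] = data["answer_text"].apply(ast.literal_eval)

item = data.iloc[0]
```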

The encoding code looks like this:

```python
encoding = tokenizer(
    table=table,
    queries=item.question,
    answer_coordinates=item.answer_coordinates,
    answer_text=item.answer_text,
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)
encoding.keys()
```
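In case it helps reproduce the issue, this is roughly the loop the call above sits in (a minimal sketch; the checkpoint name and the skip-on-error handling are just what I am doing in my notebook, not a suggested fix):

```python
from transformers import TapasTokenizer

tokenizer = TapasTokenizer.from_pretrained("google/tapas-base-finetuned-wtq")

encodings, skipped = [], 0
for _, item in data.iterrows():
    try:
        encoding = tokenizer(
            table=table,
            queries=item.question,
            answer_coordinates=item.answer_coordinates,
            answer_text=item.answer_text,
            truncation=True,  # with truncation=False this raises "Too many rows"
            padding="max_length",
            return_tensors="pt",
        )
        encodings.append(encoding)
    except ValueError as err:
        # With truncation=True this fires with "Couldn't find all answers"
        # for some of the questions against the 990-row table.
        skipped += 1
        print(f"Skipping question {item.question!r}: {err}")

print(f"Encoded {len(encodings)} examples, skipped {skipped}")
```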

Can anyone tell me the maximum table size, i.e., the maximum number of rows and columns, that the model can handle?