NielsRogge / tapas_utils

A package containing utils for the PyTorch version of the Tapas algorithm.

WTQ TableQuestionAnswering Task SQA Format: Values of 'float_answer', 'answer_text' and 'answer_coordinates' columns in weakly supervised setting #5

Open ManasiPat opened 3 years ago

ManasiPat commented 3 years ago

I have a table question answering task (WTQ) in which the answers are floats that do not match any value in the table. They require an aggregation operation over the table data, and the ground-truth aggregation is not provided, so I am using the weakly supervised setting. I am using the PyTorch implementation of TAPAS that is part of the HuggingFace Transformers library (https://huggingface.co/transformers/model_doc/tapas.html).
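To make concrete why 'answer_coordinates' has nothing to point at in this setting: a float answer that is produced by aggregation matches no single cell. The sketch below uses a hypothetical helper `matching_cell_coordinates` and a toy table, both assumptions for illustration only (not from the issue and not part of the Transformers API). Coordinates use the SQA (row, column) order, which is the order the `_to_coordinates` frame in the traceback swaps into (column, row).

```python
import math


def matching_cell_coordinates(table, float_answer, rel_tol=1e-6):
    """Return (row, column) pairs of cells whose numeric value equals float_answer.

    Hypothetical helper. `table` is assumed to be a list of data rows, each a
    list of cell strings (the Transformers tokenizer itself takes a pandas
    DataFrame of strings).
    """
    coords = []
    for row_idx, row in enumerate(table):
        for col_idx, cell in enumerate(row):
            try:
                value = float(cell.replace(",", ""))
            except ValueError:
                continue  # skip non-numeric cells
            if math.isclose(value, float_answer, rel_tol=rel_tol):
                coords.append((row_idx, col_idx))
    return coords


# Toy table (assumed data): actor name, number of movies.
table = [
    ["Brad Pitt", "87"],
    ["Leonardo Di Caprio", "53"],
    ["George Clooney", "69"],
]

print(matching_cell_coordinates(table, 53.0))   # a cell answer
print(matching_cell_coordinates(table, 209.0))  # 87 + 53 + 69: needs SUM, no cell
```

A cell-level answer such as 53.0 yields `[(1, 1)]`, while the aggregated 209.0 yields an empty list, which is the situation the question below is about.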

When preparing the data in SQA format, I set the 'float_answer' column to the float answer and the 'answer_text' and 'answer_coordinates' columns to None. However, the code raises the following error:

Traceback (most recent call last):
  File "tapas_pytorch/test.py", line 76, in <module>
    for idx, batch in enumerate(train_dataloader):
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "tapas_pytorch/test.py", line 33, in __getitem__
    encoding = self.tokenizer(table=table,
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/transformers/models/tapas/tokenization_tapas.py", line 617, in __call__
    return self.encode_plus(
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/transformers/models/tapas/tokenization_tapas.py", line 966, in encode_plus
    return self._encode_plus(
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/transformers/models/tapas/tokenization_tapas.py", line 1020, in _encode_plus
    return self.prepare_for_model(
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/transformers/models/tapas/tokenization_tapas.py", line 1177, in prepare_for_model
    labels = self.get_answer_ids(column_ids, row_ids, table_data, answer_text, answer_coordinates)
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/transformers/models/tapas/tokenization_tapas.py", line 1754, in get_answer_ids
    return self._get_answer_ids(column_ids, row_ids, answer_coordinates_question)
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/transformers/models/tapas/tokenization_tapas.py", line 1740, in _get_answer_ids
    answer_ids, missing_count = self._get_all_answer_ids(column_ids, row_ids, answer_coordinates)
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/transformers/models/tapas/tokenization_tapas.py", line 1666, in _get_all_answer_ids
    column_ids, row_ids, answers_list=(_to_coordinates(answer_coordinates))
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/transformers/models/tapas/tokenization_tapas.py", line 1663, in _to_coordinates
    return [(coords[1], coords[0]) for coords in answer_coordinates_question]
TypeError: 'numpy.float64' object is not iterable

My question: in the weakly supervised setting, what should the values of the 'answer_text' and 'answer_coordinates' columns be?
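One plausible way to keep a missing value from reaching the tokenizer as NaN is to normalize it before encoding. The sketch below is an assumption, not a confirmed answer: the helper `normalize_answer_field` is hypothetical, and mapping "no cell-level answer" to an empty list is a guess based on the comprehension shown in the traceback, not documented behaviour. A None can turn into NaN, for example, after a round trip through a CSV/TSV file, since pandas reads empty fields as NaN by default.

```python
import math


def normalize_answer_field(value):
    """Return [] when an answer field is missing (None or NaN).

    Hypothetical helper. Treating a missing cell-level answer as [] is an
    assumption, not behaviour confirmed in this issue.
    """
    if value is None:
        return []
    # numpy.float64 subclasses Python float, so a NaN that pandas
    # produced for a missing value is caught by this check too.
    if isinstance(value, float) and math.isnan(value):
        return []
    return value


# Illustrative weak-supervision row (assumed data): only the float answer is known.
row = {
    "question": "How many movies do the actors have in total?",
    "answer_coordinates": normalize_answer_field(float("nan")),
    "answer_text": normalize_answer_field(None),
    "float_answer": 209.0,
}
print(row["answer_coordinates"], row["answer_text"], row["float_answer"])
# [] [] 209.0
```

Normalizing at the point where rows are read (for instance, inside the dataset's `__getitem__`) keeps the fix in one place regardless of how the data was stored.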