ManasiPat opened this issue 3 years ago
@eisenjulian @ghost In our dataset we don't have the intermediate labels in the form of answer_coordinates. We also cannot compute them with the provided parsing utility, because the queries are aggregation queries and none of the cells match the answer. We are using the PyTorch Hugging Face TAPAS implementation. If we pass answer_coordinates as None, the code throws an error. What should be passed as answer_coordinates in such a scenario? When we set the cell-selection labels to all zeros (since we don't know them), the model does not train. In short, how do we train the model when only weak supervision is available? Please help; we have been trying to figure this out for the past month.
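For context, here is a simplified sketch of how the Hugging Face PyTorch TAPAS model appears to decide, per example, whether to learn from cell selection or from the final answer. It is based on our reading of `_calculate_aggregate_mask` in `modeling_tapas.py`, reduced to a single example with plain floats. The function and argument names are illustrative, not the library API. If this reading is right, all-zero cell labels are not a problem on their own, provided that `float_answer` is a number rather than NaN and that the model config enables answer supervision (for example `use_answer_as_supervision=True` in `TapasConfig`, as in the WTQ configuration).

```python
import math
from typing import Sequence


def aggregate_mask(float_answer: float,
                   aggregation_probs: Sequence[float],
                   cell_labels: Sequence[int],
                   cell_selection_preference: float) -> float:
    """Illustrative single-example sketch, not the library code.

    Returns 1.0 if the example would be trained through the predicted scalar
    answer (aggregation + answer loss), 0.0 if only through cell selection.
    """
    # Only examples with a numeric answer can use answer-based supervision.
    has_numeric_answer = not math.isnan(float_answer)
    # Index 0 is "no aggregation"; the rest is the mass on aggregation operators.
    aggregation_mass = sum(aggregation_probs[1:])
    # The model currently prefers plain cell selection for this example.
    predicts_cell_selection = aggregation_mass <= cell_selection_preference
    # Cell supervision exists only if at least one token is labelled as an answer.
    has_cell_supervision = sum(cell_labels) > 0
    if predicts_cell_selection and has_cell_supervision:
        return 0.0
    return 1.0 if has_numeric_answer else 0.0


# All-zero cell labels plus a numeric answer: trained through the answer.
print(aggregate_mask(42.0, [0.6, 0.2, 0.1, 0.1], [0, 0, 0, 0], 0.5))  # 1.0
# NaN answer and no cell labels: no useful supervision signal at all.
print(aggregate_mask(float("nan"), [0.6, 0.2, 0.1, 0.1], [0, 0, 0, 0], 0.5))  # 0.0
```

Under this reading, an example whose `float_answer` ends up as NaN contributes nothing useful, which is one possible explanation for a model that "does not train".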
I am using the PyTorch Hugging Face model (https://huggingface.co/transformers/model_doc/tapas.html) for a table question answering task. My data has only the final answer for supervision, with no answer coordinates or aggregation labels. I have created the data as an SQA-format TSV file in which I set the answer_text, answer_coordinates and aggregation_labels columns to None and the float_answer column to my answer. I am getting the following error from TapasTokenizer:
Traceback (most recent call last):
  File "tapas_pytorch/test.py", line 76, in <module>
    for idx, batch in enumerate(train_dataloader):
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "tapas_pytorch/test.py", line 33, in __getitem__
    encoding = self.tokenizer(table=table,
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/transformers/models/tapas/tokenization_tapas.py", line 617, in __call__
    return self.encode_plus(
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/transformers/models/tapas/tokenization_tapas.py", line 966, in encode_plus
    return self._encode_plus(
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/transformers/models/tapas/tokenization_tapas.py", line 1020, in _encode_plus
    return self.prepare_for_model(
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/transformers/models/tapas/tokenization_tapas.py", line 1177, in prepare_for_model
    labels = self.get_answer_ids(column_ids, row_ids, table_data, answer_text, answer_coordinates)
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/transformers/models/tapas/tokenization_tapas.py", line 1754, in get_answer_ids
    return self._get_answer_ids(column_ids, row_ids, answer_coordinates_question)
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/transformers/models/tapas/tokenization_tapas.py", line 1740, in _get_answer_ids
    answer_ids, missing_count = self._get_all_answer_ids(column_ids, row_ids, answer_coordinates)
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/transformers/models/tapas/tokenization_tapas.py", line 1666, in _get_all_answer_ids
    column_ids, row_ids, answers_list=(_to_coordinates(answer_coordinates))
  File "/mnt/nfs/deep-learning-and-ai/deep-learning-under-data-sparsity/Manasi/tapas_pytorch/lib/python3.8/site-packages/transformers/models/tapas/tokenization_tapas.py", line 1663, in _to_coordinates
    return [(coords[1], coords[0]) for coords in answer_coordinates_question]
TypeError: 'numpy.float64' object is not iterable
Can answer_coordinates be None in the SQA format, especially when I am doing weak supervision?
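A likely reading of the traceback (an assumption, not confirmed in this thread) is that pandas loads the empty or `None` cells of the TSV as NaN, a `numpy.float64`, and the list comprehension in `_to_coordinates` then tries to iterate over that float. The sketch reproduces that comprehension in plain Python and adds a cleaning step that turns a missing value into an empty list before the row reaches the tokenizer. `to_coordinates`, `is_missing` and `parse_answer_coordinates` are hypothetical helper names, not library functions. The string layout `"['(0, 1)', '(2, 1)']"` is likewise an assumption about how the coordinates are stored. Whether `TapasTokenizer` accepts an empty list for `answer_coordinates` (and `answer_text`) in every transformers version is also an assumption worth checking; with an empty list, the cell-selection labels would simply be all zeros.

```python
import ast
import math
from typing import Any, List, Tuple


def to_coordinates(answer_coordinates_question):
    # Same comprehension as the tokenizer frame in the traceback: (row, col) -> (col, row).
    return [(coords[1], coords[0]) for coords in answer_coordinates_question]


def is_missing(value: Any) -> bool:
    """True for None or a float NaN (numpy.float64 is a subclass of float)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def parse_answer_coordinates(value: Any) -> List[Tuple[int, int]]:
    """Hypothetical cleaning step for one TSV cell of answer_coordinates."""
    if is_missing(value):
        return []  # weak supervision: no cell-level labels available
    if isinstance(value, str):
        value = ast.literal_eval(value)  # e.g. "['(0, 1)', '(2, 1)']"
    # Each entry may itself be a string such as "(0, 1)" or already a pair.
    return [ast.literal_eval(c) if isinstance(c, str) else tuple(c) for c in value]


try:
    to_coordinates(float("nan"))
except TypeError as err:
    print(err)  # 'float' object is not iterable

print(to_coordinates(parse_answer_coordinates(float("nan"))))  # []
print(parse_answer_coordinates("['(0, 1)', '(2, 1)']"))  # [(0, 1), (2, 1)]
```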