Otax-kaz closed 3 years ago
Hi,
I tried to reproduce the error but couldn't. Can you check that your input files (the SQA data) are loaded correctly? Can you also check which file or files trigger this error, for example by putting them into the input directory one at a time?
Can you find the character that cannot be read? It has happened to me, when downloading some public datasets (other than SQA) on a different machine type, that the escape character was encoded incorrectly. I suspect that one character in your data is badly encoded. If you can identify it, you can replace it with the correct one.
Thanks, Syrine
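One way to carry out the check suggested above is a minimal sketch like the following. It is not part of TAPAS: the helper name `find_non_utf8_files` and the demo file names are made up for illustration. It reads every file in a directory as raw bytes, tries to decode it as UTF-8, and reports the offset of the first byte that fails, which is the same information the `UnicodeDecodeError` message carries.

```python
import os
import tempfile


def find_non_utf8_files(input_dir):
    """Return (filename, byte_offset, byte_value) for each file that is not valid UTF-8.

    Hypothetical helper for illustration; it is not a TAPAS function.
    """
    problems = []
    for name in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            data = handle.read()
        try:
            data.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first undecodable byte.
            problems.append((name, err.start, data[err.start]))
    return problems


if __name__ == "__main__":
    # Self-contained demo on a temporary directory with made-up files.
    with tempfile.TemporaryDirectory() as demo_dir:
        with open(os.path.join(demo_dir, "good.tsv"), "wb") as f:
            f.write("id\tquestion\n1\twho won?\n".encode("utf-8"))
        with open(os.path.join(demo_dir, "bad.tsv"), "wb") as f:
            # A lone 0xb0 byte is not a valid UTF-8 start byte.
            f.write(b"id\tquestion\n1\t25\xb0C\n")
        for name, offset, value in find_non_utf8_files(demo_dir):
            print(f"{name}: byte 0x{value:02x} at position {offset}")
```

Because `os.listdir` also returns dot-prefixed names, running this on the real input directory would list hidden files too, not only the visible `.tsv` files.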
On Wed, Nov 10, 2021 at 2:25 PM Kazunari Ota @.***> wrote:
I was studying table question answering and was interested in your research. Therefore, I tried to reproduce the experiment.
I'm using an Ubuntu container with Docker. The package installation and tox execution are complete. I downloaded the SQA dataset and the model tapas_sqa_inter_masklm_tiny_reset and tried fine-tuning, but an error occurred.
Looking at the results of other people's executions on the Internet, I don't think there is any particular difference. Also, the dataset's tsv file doesn't seem to contain any weird characters.
This is the command I executed.↓
python3 tapas/run_task_main.py \
--task="SQA" \
--input_dir="data/SQA_Release_1" \
--output_dir="output_dir" \
--bert_vocab_file="tapas_sqa_inter_masklm_tiny_reset/vocab.txt" \
--mode="create_data"
This is the error that occurred.↓
Instructions for updating:
non-resource variables are not supported in the long term
Creating interactions ...
I1110 12:53:56.770750 140229047478016 run_task_main.py:192] Creating interactions ...
Traceback (most recent call last):
File "tapas/run_task_main.py", line 908, in <module>
app.run(main)
File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "tapas/run_task_main.py", line 861, in main
task_utils.create_interactions(task, FLAGS.input_dir, output_dir,
File "/workspace/tapas/utils/task_utils.py", line 171, in create_interactions
sqa_utils.create_interactions(
File "/workspace/tapas/utils/sqa_utils.py", line 182, in create_interactions
interaction_dict = _read_interactions(input_dir)
File "/workspace/tapas/utils/sqa_utils.py", line 46, in _read_interactions
interactions = interaction_utils.read_from_tsv_file(file_handle)
File "/workspace/tapas/utils/interaction_utils.py", line 86, in read_from_tsv_file
for row in csv.DictReader(file_handle, delimiter='\t'):
File "/opt/conda/lib/python3.8/csv.py", line 110, in __next__
self.fieldnames
File "/opt/conda/lib/python3.8/csv.py", line 97, in fieldnames
self._fieldnames = next(self.reader)
File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/lib/io/file_io.py", line 211, in __next__
return self.next()
File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/lib/io/file_io.py", line 205, in next
retval = self.readline()
File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/lib/io/file_io.py", line 170, in readline
return self._prepare_value(self._read_buf.readline())
File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/lib/io/file_io.py", line 93, in _prepare_value
return compat.as_str_any(val)
File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/util/compat.py", line 139, in as_str_any
return as_str(value)
File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/util/compat.py", line 118, in as_str
return as_text(bytes_or_text, encoding)
File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/util/compat.py", line 109, in as_text
return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte
I'm sorry it's hard to understand. I would appreciate it if you could answer.
View it on GitHub: https://github.com/google-research/tapas/issues/146
Thanks for your answer.
While checking how the input files are processed, I found that the program was also reading a hidden file. The hidden file is named "._{original filename}" and its encoding seems to be "Windows-1252". After I removed this file from "input_dir", the program ran without error.
I had overlooked the fact that I downloaded the SQA files on a macOS machine. I'm sorry for the confusion; I should have checked this first.
I appreciate it very much!!
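For anyone hitting the same error, the cleanup described above can be sketched as follows. This is a minimal illustration, not part of TAPAS: the helper names are hypothetical, and it assumes the only unwanted files are the "._"-prefixed companion files that macOS writes alongside the real ones. It defaults to a dry run so that nothing is deleted until the list has been reviewed.

```python
import os
import tempfile


def find_appledouble_files(input_dir):
    """Return sorted names of regular files in input_dir that start with '._'."""
    return sorted(
        name
        for name in os.listdir(input_dir)
        if name.startswith("._") and os.path.isfile(os.path.join(input_dir, name))
    )


def remove_appledouble_files(input_dir, dry_run=True):
    """Return the '._*' files found; delete them only when dry_run is False."""
    names = find_appledouble_files(input_dir)
    if not dry_run:
        for name in names:
            os.remove(os.path.join(input_dir, name))
    return names


if __name__ == "__main__":
    # Demo on a temporary directory with made-up file names.
    with tempfile.TemporaryDirectory() as demo_dir:
        for name in ("train.tsv", "._train.tsv"):
            with open(os.path.join(demo_dir, name), "wb") as f:
                f.write(b"placeholder")
        print("would remove:", remove_appledouble_files(demo_dir, dry_run=True))
        remove_appledouble_files(demo_dir, dry_run=False)
        print("remaining:", os.listdir(demo_dir))
```

On the real data, the directory passed in would be the one given to `--input_dir` (here "data/SQA_Release_1").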