I was able to create a SQLite db earlier as a test, as described in the retriever Readme. After deleting that initial test db, I tried again several hours later and got a UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf9 in position 98: invalid start byte.
The text document has the same format as in the Readme: {"id": "doc1", "text": "text of doc1"}
Is there something I missed?
Full error message:
$ python build_db.py /home/HT/DrQA/data/test/ /home/HT/DrQA/data/test/test.db
10/02/2017 10:52:12 AM: [ Reading into database... ]
0%| | 0/2 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/HT/anaconda3/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "build_db.py", line 72, in get_contents
for line in f:
File "/home/HT/anaconda3/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf9 in position 98: invalid start byte
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "build_db.py", line 136, in <module>
args.data_path, args.save_path, args.preprocess, args.num_workers
File "build_db.py", line 109, in store_contents
for pairs in tqdm(workers.imap_unordered(get_contents, files)):
File "/home/HT/anaconda3/lib/python3.6/site-packages/tqdm/_tqdm.py", line 872, in __iter__
for obj in iterable:
File "/home/HT/anaconda3/lib/python3.6/multiprocessing/pool.py", line 699, in next
raise value
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf9 in position 98: invalid start byte
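For anyone hitting the same traceback, here is a minimal sketch for finding which file in the data directory contains bytes that are not valid UTF-8. It does not use anything from build_db.py. The helper name `find_non_utf8_files` is hypothetical, and it assumes the script fails because some file in the directory is not UTF-8 encoded, whether that is a document saved in another encoding or a stray non-text file.

```python
import os
import sys


def find_non_utf8_files(data_dir):
    """Return (path, byte_offset, byte_value) for each file that is not valid UTF-8.

    Hypothetical helper for diagnosis only; not part of DrQA.
    """
    problems = []
    for root, _dirs, names in os.walk(data_dir):
        for name in sorted(names):
            path = os.path.join(root, name)
            # Read raw bytes so that decoding happens under our control.
            with open(path, "rb") as f:
                raw = f.read()
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the offset of the first undecodable byte.
                problems.append((path, err.start, raw[err.start]))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_utf8.py /path/to/data/dir
    for path, offset, value in find_non_utf8_files(sys.argv[1]):
        print("%s: byte 0x%02x at offset %d" % (path, value, offset))
```

If this reports a file, re-saving it as UTF-8 or moving it out of the data directory would be the first thing to try. Note that the position in the traceback is relative to the chunk being decoded, so it may differ from the file offset printed here for large files.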