facebookresearch / DrQA

Reading Wikipedia to Answer Open-Domain Questions

UnicodeDecodeError #45

Closed: hattek closed this issue 7 years ago

hattek commented 7 years ago

As a test, I was able to create a SQLite database earlier, as described in the retriever Readme. After deleting that test database, I tried again several hours later and got a UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf9 in position 98: invalid start byte.

The text document has the same format as in the Readme: {"id": "doc1", "text": "text of doc1"}
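If the document file itself was saved in a legacy encoding, reading it as UTF-8 fails in exactly this way: byte 0xf9 is "ù" in Latin-1 (ISO-8859-1) and cp1252, but it is never a valid UTF-8 start byte. A minimal sketch for re-encoding such a file to UTF-8, assuming the source really is Latin-1 (`reencode_to_utf8` is a hypothetical helper, not part of DrQA):

```python
import json

def reencode_to_utf8(src_path, dst_path, src_encoding="latin-1"):
    """Re-encode a JSON-lines document file to UTF-8.

    ASSUMPTION: the source file is Latin-1. Latin-1 maps every byte to a
    character, so decoding never fails, but the text is only correct if the
    file really was written in Latin-1. Returns the number of documents written.
    """
    count = 0
    with open(src_path, encoding=src_encoding) as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            doc = json.loads(line)  # raises if the line is not valid JSON
            # ensure_ascii=False keeps characters such as "ù" readable;
            # they are written as UTF-8 because of encoding="utf-8" above.
            dst.write(json.dumps(doc, ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, `reencode_to_utf8("doc1_latin1.txt", "doc1.txt")` (both file names hypothetical) writes a UTF-8 copy that can be placed in the data directory instead of the original.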

Is there something I missed?

Full error message:

$ python build_db.py /home/HT/DrQA/data/test/ /home/HT/DrQA/data/test/test.db
10/02/2017 10:52:12 AM: [ Reading into database... ]
0%| | 0/2 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/HT/anaconda3/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "build_db.py", line 72, in get_contents
    for line in f:
  File "/home/HT/anaconda3/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf9 in position 98: invalid start byte
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "build_db.py", line 136, in <module>
    args.data_path, args.save_path, args.preprocess, args.num_workers
  File "build_db.py", line 109, in store_contents
    for pairs in tqdm(workers.imap_unordered(get_contents, files)):
  File "/home/HT/anaconda3/lib/python3.6/site-packages/tqdm/_tqdm.py", line 872, in __iter__
    for obj in iterable:
  File "/home/HT/anaconda3/lib/python3.6/multiprocessing/pool.py", line 699, in next
    raise value
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf9 in position 98: invalid start byte
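The progress bar reports 0/2, so two files were found under /home/HT/DrQA/data/test/, and the save path test.db sits in that same directory. One possibility worth ruling out (an assumption, not confirmed in this thread) is that a non-text file there, such as a SQLite database, is being read as a document. A minimal diagnostic sketch that walks a data directory and reports every file that is not UTF-8 or contains a non-JSON line (`find_bad_files` is a hypothetical helper, not part of DrQA):

```python
import json
import os

def find_bad_files(data_path):
    """Walk data_path recursively and report files that are not UTF-8 JSON lines.

    Returns a list of (path, problem) tuples; an empty list means every file
    decoded as UTF-8 and every non-blank line parsed as JSON.
    """
    problems = []
    for root, _dirs, files in os.walk(data_path):
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                raw = f.read()
            try:
                text = raw.decode("utf-8")
            except UnicodeDecodeError as e:
                # e.start is the offset of the first bad byte in the whole file
                problems.append(
                    (path, "byte 0x%02x at offset %d is not valid UTF-8"
                     % (raw[e.start], e.start)))
                continue
            for lineno, line in enumerate(text.splitlines(), 1):
                if not line.strip():
                    continue  # blank lines are ignored
                try:
                    json.loads(line)
                except json.JSONDecodeError as e:
                    problems.append(
                        (path, "line %d is not valid JSON: %s" % (lineno, e.msg)))
                    break  # report only the first bad line per file
    return problems
```

Usage: `for path, problem in find_bad_files("/home/HT/DrQA/data/test/"): print(path, problem)`. The offset reported here is relative to the start of the file, so it may differ from "position 98" in the traceback, which is relative to the decoder's current buffer. If the database file shows up in the report, writing it outside the data directory (for example /home/HT/DrQA/data/test.db) would keep it from being read as input.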

Basavaraja-MS commented 5 years ago

@hattek How did you overcome this issue? I'm still facing the same problem.