NTMC-Community / MatchZoo

Facilitating the design, comparison and sharing of deep text matching models.
Apache License 2.0
3.84k stars 897 forks

UnicodeDecodeError while installing dataset. #67

gr8Adakron closed this issue 6 years ago

gr8Adakron commented 6 years ago

Traceback (most recent call last):
  File "filter_query.py", line 33, in <module>
    for idx,line in enumerate(open(in_corpfile[i], 'r')):
  File "/home/afzal/anaconda3/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2052: ordinal not in range(128)
Traceback (most recent call last):
  File "transfer_to_mz_format.py", line 15, in <module>
    for line in open(infile, 'r'):
  File "/home/afzal/anaconda3/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 3419: ordinal not in range(128)
Traceback (most recent call last):
  File "prepare_mz_data.py", line 47, in <module>
    corpus, rel_train, rel_valid, rel_test = prepare.run_with_train_valid_test_corpus(infiles[0], infiles[1], infiles[2])
  File "../../matchzoo/inputs/preparation.py", line 107, in run_with_train_valid_test_corpus
    f = codecs.open(file_path, 'r', encoding='utf8')
  File "/home/afzal/anaconda3/lib/python3.6/codecs.py", line 895, in open
    file = builtins.open(filename, mode, buffering)
FileNotFoundError: [Errno 2] No such file or directory: './WikiQA-mz-dev.txt'
load word dict ...

This happens when I run the command bash run_data.sh in the directory MatchZoo/data/WikiQA.

Any help, please? I need to download the data so that I can run the algorithm on the sample dataset.

rgtjf commented 6 years ago

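Maybe you could try replacing open(file_name, 'r') with codecs.open(file_name, 'r', encoding='utf8'). The traceback shows the files being decoded as ASCII, and they contain non-ASCII bytes such as 0xe2.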
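
To illustrate the suggestion above, here is a minimal sketch of the failure and the fix. It does not use the WikiQA files: the sample text and the temporary file are assumptions made up for the demonstration. The sample contains a right single quotation mark (U+2019), whose UTF-8 encoding begins with byte 0xe2, the same byte reported in the traceback. Passing encoding='ascii' explicitly stands in for a system whose default encoding is ASCII.

```python
import codecs
import os
import tempfile

# Hypothetical sample line; U+2019 encodes in UTF-8 as bytes 0xe2 0x80 0x99.
sample = "What\u2019s the capital of France?\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")

    # Write the raw UTF-8 bytes, as the downloaded corpus would contain them.
    with open(path, "wb") as f:
        f.write(sample.encode("utf-8"))

    # Decoding as ASCII fails on the first non-ASCII byte.
    ascii_failed = False
    ascii_error = ""
    try:
        with open(path, "r", encoding="ascii") as f:
            f.read()
    except UnicodeDecodeError as exc:
        ascii_failed = True
        ascii_error = str(exc)

    # Fix suggested in the thread: codecs.open with an explicit encoding.
    with codecs.open(path, "r", encoding="utf8") as f:
        lines_codecs = list(f)

    # In Python 3 the built-in open accepts the same encoding argument.
    with open(path, "r", encoding="utf-8") as f:
        lines_builtin = list(f)

print("ASCII decoding failed:", ascii_failed)
print("Lines read as UTF-8:", lines_codecs)
```

Either form should work for the open(...) calls shown in filter_query.py and transfer_to_mz_format.py. The key point is that the encoding is stated explicitly instead of being taken from the system default.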

P.S. Instead of running run_data.sh all at once, you can execute its commands one at a time, which makes it easier to find where the problem is.
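
One way to follow this advice without retyping each command is sketched below: a small helper that runs a list of commands in order and stops at the first one that fails. The helper name run_steps and the three placeholder commands are hypothetical. The actual contents of run_data.sh are not shown in this thread, so its real commands would have to be substituted in.

```python
import subprocess
import sys


def run_steps(commands):
    """Run each command in order and stop at the first failure.

    Returns the index of the failing command, or None if all succeeded.
    """
    for index, command in enumerate(commands):
        print("running step", index, ":", " ".join(command))
        result = subprocess.run(command)
        if result.returncode != 0:
            print("step", index, "failed with exit code", result.returncode)
            return index
    return None


# Placeholder commands standing in for the lines of run_data.sh (assumed).
steps = [
    [sys.executable, "-c", "print('step 0 ok')"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    [sys.executable, "-c", "print('never reached')"],
]

failed_at = run_steps(steps)
```

Stopping at the first failure matters here because a later step can fail only because an earlier one did. The FileNotFoundError for ./WikiQA-mz-dev.txt at the end of the traceback may be such a case, although the thread does not confirm it.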