Encoding problem in step 2

betterwater commented 2 years ago

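I recently read the paper and tried running the code, but step 2 keeps failing with a UTF-8 encoding error. I have tried most of the fixes I could find online and none of them worked. Is there a way to solve this? (sad)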

    def _read_tsv(cls, input_file, quotechar=None):
        """Reads a tab separated value file."""
        with open(input_file, "r", encoding="utf-8") as f:
            reader = csv.reader(f, delimiter="\t", quotechar=quotechar)
            lines = []
            for line in reader:
                if sys.version_info[0] == 2:
                    line = list(str(cell) for cell in line)
                lines.append(line)
            return lines

blhoy commented 2 years ago

Could you share the error message? Maybe the encoding of the data file has changed?

betterwater commented 2 years ago

Could you share the error message? Maybe the encoding of the data file has changed?

This is the error I get at run time:

    Traceback (most recent call last):
      File "F:/acos/ACOS/Extract-Classify-ACOS/run_step2.py", line 351, in <module>
        main()
      File "F:/acos/ACOS/Extract-Classify-ACOS/run_step2.py", line 174, in main
        eval_examples = processor.get_dev_examples(args.data_dir, args.domain_type)
      File "F:\acos\ACOS\Extract-Classify-ACOS\run_classifier_dataset_utils.py", line 208, in get_dev_examples
        self._read_tsv(os.path.join(data_dir, "tokenized_data/"+string+"_test_pair_1st.tsv")), "test")
      File "F:\acos\ACOS\Extract-Classify-ACOS\run_classifier_dataset_utils.py", line 127, in _read_tsv
        for line in reader:
      File "F:\anaconda\envs\ACOS\lib\codecs.py", line 322, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa8 in position 2954: invalid start byte
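One way to narrow this down is to read the file in binary mode and look at the bytes around the first one that fails to decode. The sketch below is not part of the ACOS repository; `find_first_bad_byte` is a hypothetical helper name, and the offset it reports is a file offset, which may differ from the position in the traceback because the codec there works on buffered chunks.

```python
def find_first_bad_byte(path, encoding="utf-8", context=20):
    """Return (offset, surrounding_bytes) for the first undecodable byte.

    Returns None when the whole file decodes cleanly with `encoding`.
    Hypothetical helper for diagnosis only, not part of the ACOS code.
    """
    # Read raw bytes so that no decoding happens implicitly.
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start and err.end delimit the offending byte range.
        start = max(err.start - context, 0)
        return err.start, data[start:err.end + context]
    return None
```

Calling it on the file named in the traceback (the `_test_pair_1st.tsv` file under `tokenized_data/`) and printing the returned bytes should show which text surrounds the byte `0xa8`, which usually makes it easier to guess how the file was saved.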

blhoy commented 2 years ago

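I tested this part and could not reproduce the problem, so the file most likely contains a character that cannot be decoded as UTF-8. Maybe try re-saving the input data file with a different encoding?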
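Re-saving can also be done with a short script instead of a text editor. The following is a minimal sketch, assuming the file was written in one of a few candidate encodings; the function name `resave_as_utf8` and the default candidate list are assumptions for illustration, not something confirmed in this thread. Single-byte encodings such as `cp1252` accept almost any byte sequence, so a successful decode does not prove the guess was right, and the output should be checked by eye.

```python
def resave_as_utf8(src, dst, candidates=("utf-8", "gbk", "cp1252")):
    """Decode `src` with the first candidate encoding that works and
    write the result to `dst` as UTF-8. Returns the encoding used.

    Hypothetical helper; the candidate list is a guess and should be
    adjusted to how the data file was actually produced.
    """
    with open(src, "rb") as f:
        raw = f.read()
    for enc in candidates:
        try:
            text = raw.decode(enc)  # strict decoding: fail on any bad byte
        except UnicodeDecodeError:
            continue
        # Write bytes directly so line endings are left untouched.
        with open(dst, "wb") as f:
            f.write(text.encode("utf-8"))
        return enc
    raise ValueError("none of the candidate encodings could decode " + src)
```

Writing to a separate `dst` path keeps the original file intact, so a wrong guess can be retried with a different candidate order.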

betterwater commented 2 years ago

OK, I'll give it a try. Thanks for your help.

NUSTM / ACOS

Encoding problem in step 2 #9