aplmikex / deduplication_mnbvc

Text deduplication
MIT License
67 stars, 11 forks

write_output_to_jsonl.py output json with unicode #5

Open xinghuang2050 opened 1 year ago

xinghuang2050 commented 1 year ago

The output JSONL file currently contains \uXXXX escape sequences instead of readable Unicode text. To fix this, I recommend the following change to write_output_to_jsonl.py:

Line 27:
    temp_file.write(json.dumps(one_json) + '\n')
should be changed to:
    temp_file.write(json.dumps(one_json, ensure_ascii=False) + '\n')
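For anyone who wants to see the difference in isolation, here is a minimal sketch of what ensure_ascii changes. The write_jsonl helper and the sample record are hypothetical and are not taken from write_output_to_jsonl.py; an in-memory buffer stands in for temp_file.

```python
import io
import json


def write_jsonl(records, fp, ensure_ascii=False):
    """Write one JSON object per line to fp (hypothetical helper)."""
    for rec in records:
        fp.write(json.dumps(rec, ensure_ascii=ensure_ascii) + '\n')


# Sample record, made up for illustration.
record = {"text": "文本去重"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(record, ensure_ascii=False)

# Both forms decode back to the same object; only readability differs.
buf = io.StringIO()
write_jsonl([record], buf)
lines = buf.getvalue().splitlines()
```

Both outputs are valid JSON and load back to the same data, so the change affects only how the file reads in an editor. When writing to a real file with ensure_ascii=False, it is probably worth opening it with an explicit encoding="utf-8", since the platform default encoding may not be able to represent every character.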

aplmikex commented 1 year ago

OK, please open a PR.