facebookresearch / vizseq

An Analysis Toolkit for Natural Language Generation (Translation, Captioning, Summarization, etc.)
https://arxiv.org/abs/1909.05424
MIT License
440 stars, 61 forks

Support for non-ascii chars #20

Closed fuzihaofzh closed 4 years ago

fuzihaofzh commented 4 years ago

Support for non-ascii chars

Motivation

The current version cannot open files that contain non-ASCII characters.

Have you read the Contributing Guidelines on pull requests?

Yes

Test Plan

Opening some files that contain non-ASCII characters should help to test this.

Related Issues and PRs

(Is this PR part of a group of changes? Link the other relevant PRs and Issues here. Use https://help.github.com/en/articles/closing-issues-using-keywords for help on GitHub syntax)

kahne commented 4 years ago

Hi @fuzihaofzh, thank you for submitting the pull request!

VizSeq should already be able to parse non-ASCII characters from UTF-8 encoded text files (as demonstrated in the multilingual MT demo). Did you find any settings where it failed? If so, do you mind sharing more details? Thanks!
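For anyone trying to narrow down a failing setting, here is a minimal sketch of one possible cause. It is not taken from VizSeq's code: `read_lines` is a hypothetical helper, and the assumption is that the file itself is valid UTF-8 but is being decoded with a different codec (for example, when Python's `open()` is called without `encoding` and the platform default is not UTF-8).

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Hypothetical helper: read a text file with an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


# Write a small sample file containing non-ASCII text, encoded as UTF-8.
sample = ["Grüße aus Köln", "你好，世界", "naïve café"]
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("\n".join(sample) + "\n")

# Decoding as UTF-8 recovers the original lines.
utf8_lines = read_lines(sample_path)

# Decoding the same bytes as ASCII fails, which is roughly what happens
# when the default encoding in use cannot represent the file's contents.
try:
    read_lines(sample_path, encoding="ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

If the failing file reads cleanly with `encoding="utf-8"` in a check like this, the file is fine and the environment's default encoding is the more likely culprit; if it raises here too, the file is probably not UTF-8 encoded.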

kahne commented 4 years ago

I will close this PR for now. Feel free to reopen it if you get any updates :-)