Mondego / SourcererCC

Sourcerer's Code Clone project
GNU General Public License v3.0
206 stars, 69 forks

Support non-ascii languages in file-level tokenizer and fix tests #21

Closed: danhper closed this 6 years ago

danhper commented 6 years ago

I was getting encoding errors in the file-level tokenizer when tokenizing files that contain non-ASCII characters. The tests were also broken, so I fixed them and added a regression test.
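For illustration, here is a minimal sketch of the kind of failure and fix described above. It is not the actual change in this PR: `tokenize_file` and `demo` are hypothetical names, and the regex-based token split is an assumption rather than the tokenizer's real logic. The idea is to read the file with an explicit encoding instead of relying on the platform default, and to check it against a file that contains non-ASCII text.

```python
import io
import os
import re
import tempfile
from collections import Counter


def tokenize_file(path, encoding="utf-8"):
    """Return token counts for one file (hypothetical helper, not SourcererCC's API)."""
    # Decode explicitly; undecodable bytes become U+FFFD instead of raising.
    with io.open(path, "r", encoding=encoding, errors="replace") as handle:
        text = handle.read()
    # Assumption: a token is a run of Unicode word characters.
    return Counter(re.findall(r"\w+", text, flags=re.UNICODE))


def demo():
    """Regression-style check: tokenize a file containing non-ASCII characters."""
    source = u'String naïve = "こんにちは";\n'
    fd, path = tempfile.mkstemp(suffix=".java")
    try:
        # io.open takes ownership of the descriptor and closes it on exit.
        with io.open(fd, "w", encoding="utf-8") as handle:
            handle.write(source)
        return tokenize_file(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Using `errors="replace"` is one possible design choice: it keeps the tokenizer running on files that are not valid UTF-8, at the cost of turning undecodable bytes into replacement characters.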