cjdoris / ARFFFiles.jl

Load and save ARFF files
MIT License

Allow UTF-8 encoded input #5

Closed by cjdoris 3 years ago

cjdoris commented 3 years ago

openml.org hosts example ARFF files that appear to be UTF-8 encoded, even though the ARFF description says ARFF is an ASCII file format. (See #4)

Rewrite the parser to accept UTF-8 data. That is, consume an IO and parse it one Char at a time.
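
To illustrate the idea, here is a minimal sketch of character-at-a-time parsing over a UTF-8 stream. It is written in Python rather than Julia and is not the ARFFFiles.jl parser. The names `read_chars` and `parse_quoted_field` are hypothetical, and the quoting rules (single quotes, backslash escapes the next character) are a simplifying assumption, not the full ARFF grammar.

```python
import io


def read_chars(binary_stream):
    """Yield decoded characters one at a time from a UTF-8 byte stream."""
    text = io.TextIOWrapper(binary_stream, encoding="utf-8")
    while True:
        ch = text.read(1)  # one character, which may span several bytes
        if not ch:
            return  # end of stream
        yield ch


def parse_quoted_field(chars):
    """Parse one single-quoted field from an iterator of characters.

    Assumed, simplified rules (hypothetical, for illustration only):
    the field starts and ends with a single quote, and a backslash
    makes the following character literal.
    """
    if next(chars, None) != "'":
        raise ValueError("expected opening quote")
    out = []
    for ch in chars:
        if ch == "\\":
            escaped = next(chars, None)
            if escaped is None:
                raise ValueError("dangling escape at end of input")
            out.append(escaped)
        elif ch == "'":
            return "".join(out)  # closing quote ends the field
        else:
            out.append(ch)  # any character, ASCII or not, is kept as-is
    raise ValueError("unterminated quoted field")


# A quoted field containing a non-ASCII character and an escaped quote.
data = "'café au lait, na\\'ive'".encode("utf-8")
field = parse_quoted_field(read_chars(io.BytesIO(data)))
print(field)
```

Because the parser only ever sees decoded characters, a multi-byte character such as `é` is handled the same way as an ASCII one, and the parsing logic does not need to know about byte boundaries.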

jbrea commented 3 years ago

I guess non-ASCII characters are allowed (or just appear) within string fields. For example, OpenML dataset 379 contains emails in the text field.

cjdoris commented 3 years ago

The new parser solves this.