AaronGullickson / system_generation

Generate solar system data for Battletech universe using Campaign Operations rules

Encoding issue with surname data #1

Closed: AaronGullickson closed this issue 5 years ago

AaronGullickson commented 5 years ago

There is some kind of encoding issue with some of the additional surname data added in commit 4957287ed3286a6bb187643fdb96f9db205f083b that causes problems when the final data is loaded into MHQ. I have disabled surname sampling for now so that I can finish other projects, but I will need to look at this and fix it eventually.

AaronGullickson commented 5 years ago

Output of the file command on the additional surname files:

bayes:additional_surnames aarong$ file *
names_burmese.txt:   ASCII text, with CRLF line terminators
names_israeli.txt:   ISO-8859 text, with CRLF line terminators
names_jewish.txt:    ASCII text, with CRLF line terminators
names_malay.txt:     ASCII text, with CRLF line terminators
names_mongolian.txt: UTF-8 Unicode (with BOM) text, with CRLF line terminators
names_persian.txt:   ASCII text, with CRLF line terminators
names_thai.txt:      ASCII text, with CRLF line terminators

The problem appears to be names_israeli.txt, which is ISO-8859 encoded rather than ASCII or UTF-8 like the other files.
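The same check can be done programmatically by trying to decode each file as UTF-8. Below is a minimal Python sketch of that idea; the function names and the names_*.txt glob pattern are illustrative assumptions, not part of the repository. Note that ASCII files and UTF-8 files with a BOM decode cleanly, so only a file like names_israeli.txt would be flagged. This is a heuristic: it catches invalid UTF-8 byte sequences but cannot prove which legacy encoding a file actually uses.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8 (ASCII and BOM included)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8(directory: str, pattern: str = "names_*.txt") -> list:
    """Hypothetical helper: list files matching `pattern` that are not valid UTF-8."""
    return sorted(
        p.name
        for p in Path(directory).glob(pattern)
        if not is_valid_utf8(p.read_bytes())
    )
```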

AaronGullickson commented 5 years ago

In commit 5acada3c89933ecb5c7e1a017b04c0320191c644, I converted the offending file to UTF-8 and reran get_surnames.R to generate a new surname dataset for sampling. I will leave this issue open until I have a chance to test it properly after the HPG stuff is done.
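The issue does not say which tool was used for the conversion, so here is one possible way to do it in Python. This sketch assumes the file is ISO-8859-1 (Latin-1); the file command only reports "ISO-8859", so if the file actually uses another part (for example ISO-8859-8), pass that codec name instead. Latin-1 decoding never raises an error, so a wrong guess silently produces wrong characters rather than failing. The function names are hypothetical.

```python
from pathlib import Path


def reencode(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Re-encode raw bytes from an assumed legacy encoding to UTF-8.

    Working on bytes (not text-mode I/O) keeps the CRLF line endings unchanged.
    """
    return data.decode(source_encoding).encode("utf-8")


def convert_file_to_utf8(src: str, dst: str, source_encoding: str = "iso-8859-1") -> None:
    """Hypothetical helper: write a UTF-8 copy of `src` to `dst`."""
    Path(dst).write_bytes(reencode(Path(src).read_bytes(), source_encoding))
```

After converting, the detection check from the previous comment should report no remaining non-UTF-8 files.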

AaronGullickson commented 5 years ago

I have now checked, and changing the encoding seems to have resolved this issue.