J-PAL / PII-Scan

R code to scan for obvious PII.
MIT License

Support UTF-8 #14

Open umeditor opened 7 years ago

umeditor commented 7 years ago

It appears that we don't handle UTF-8 characters correctly. For example, "Ã" becomes "<U+3E66623C>" when read from SAS files, and "ÃÂ" when read from Stata files. I'm not sure whether the problem lies in our example files or in how we output strings to the screen and/or to the output CSV file.
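
One possible cause, offered only as a guess: output like "ÃÂ" is the typical result of UTF-8 bytes being decoded as a single-byte encoding such as Latin-1, sometimes twice in a row. The sketch below reproduces that mechanism in Python rather than in the project's R code, purely to illustrate the symptom. The helper names `misdecode` and `repair` and the sample character "é" are hypothetical and do not come from the repository or its example files.

```python
def misdecode(text: str, wrong: str = "latin-1") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong encoding.

    Latin-1 is used because it maps every byte to a character,
    so this step never raises an error.
    """
    return text.encode("utf-8").decode(wrong)


def repair(text: str, wrong: str = "latin-1") -> str:
    """Undo one round of misdecode(): recover the bytes, decode as UTF-8."""
    return text.encode(wrong).decode("utf-8")


if __name__ == "__main__":
    original = "\u00e9"        # "é", a hypothetical sample value
    once = misdecode(original)  # one wrong decode
    twice = misdecode(once)     # two wrong decodes

    # repr() makes invisible control characters visible.
    print(repr(once))
    print(repr(twice))

    # Each wrong decode can be reversed as long as no bytes were lost.
    print(repair(once) == original)
    print(repair(repair(twice)) == original)
```

If the garbled strings from the Stata and SAS examples can be restored this way, the data was probably decoded with the wrong encoding on read. If they cannot, the example files themselves or the screen/CSV output step would be the next things to check.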