File encoding issues UTF-8/others

Hi team,

First, thanks for your great work; the tool is very useful.

This week I noticed problems when supplying a CSV input file whose route names contain less common special characters or umlauts. The CSV file was UTF-8 encoded, but my Java on Windows apparently used a different default file encoding, so the route names in the resulting GeoJSON file were broken and could not easily be mapped back to the inputs. For example, Thủ Dầu Một, Vietnam or Östrand, Sweden would become something like �\u2013strand in the GeoJSON file, which hindered programmatic mapping of inputs to outputs.
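The garbling can be reproduced in isolation. The following is a minimal sketch, assuming the Windows JVM defaulted to windows-1252 (the actual default on the affected machine is not stated here, but the \u2013 in the garbled name is consistent with it). The class name EncodingMismatchDemo is hypothetical and not part of searoute.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Encode the text as UTF-8 (as in the CSV file), then decode the bytes
    // with the given charset, as a reader relying on the platform default would.
    static String misdecode(String original, Charset decodeWith) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeWith);
    }

    public static void main(String[] args) {
        // "\u00D6" is the letter O with diaeresis; the escape keeps this
        // source file independent of the compiler's source encoding.
        String name = "\u00D6strand, Sweden";

        // Assumption: the Windows default charset was windows-1252.
        Charset assumedDefault = Charset.forName("windows-1252");

        String broken = misdecode(name, assumedDefault);
        // The two UTF-8 bytes 0xC3 0x96 become two separate characters;
        // 0x96 maps to U+2013 in windows-1252.
        System.out.println(broken.equals(name));            // false
        System.out.println(broken.charAt(1) == '\u2013');   // true

        String intact = misdecode(name, StandardCharsets.UTF_8);
        System.out.println(intact.equals(name));            // true
    }
}
```

Decoding with the same charset that was used for encoding round-trips the name unchanged; any mismatch splits each multi-byte character into several unrelated ones.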

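I was able to circumvent this by setting the file.encoding system property of the JVM to the appropriate encoding/charset. Note that JVM options such as -D must come before -jar; anything after the JAR name is passed to the program as an argument. For example, on the command line:

java -Dfile.encoding=UTF-8 -jar searoute.jar -i "test_input.csv" -res 5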

I wanted to share this in case others stumble across this issue. It may be worth noting this in the docs or even adding it as a standard option of the JAR file.
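If this were handled inside the tool rather than on the command line, one option would be to read the input with an explicit charset instead of the platform default. This is only a sketch of the general technique, not searoute's actual code (how searoute reads its CSV input is not shown here); the class Utf8CsvReader and its readLines method are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class Utf8CsvReader {

    // Read all lines of a file as UTF-8, regardless of the JVM's
    // file.encoding setting or the operating system default.
    static List<String> readLines(Path csv) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 test file, then read it back.
        Path tmp = Files.createTempFile("routes", ".csv");
        try {
            String content = "route name\n\u00D6strand, Sweden\n";
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));

            List<String> lines = readLines(tmp);
            System.out.println(lines.get(1).equals("\u00D6strand, Sweden")); // true
        } finally {
            Files.delete(tmp);
        }
    }
}
```

With the charset fixed in code, the result no longer depends on how the JVM was started; a command-line option for the input encoding could then default to UTF-8.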

Thanks and best, Vitus

eurostat / searoute #67