eurostat / searoute

Compute shortest maritime routes between ports
European Union Public License 1.2

File encoding issues UTF-8/others #67

Open: vituslehner opened this issue 1 year ago

vituslehner commented 1 year ago

Hi team,

First, thanks for your great work; the tool is very useful.

This week I noticed a problem when supplying a CSV input file whose route names contain less common special characters or umlauts. The CSV file was UTF-8 encoded, but my Java on Windows apparently used a different default file encoding, so the route names in the resulting GeoJSON file were garbled and could not easily be mapped back to the inputs. For example, Thủ Dầu Một, Vietnam or Östrand, Sweden would become something like �\u2013strand in the GeoJSON file. This made programmatic mapping of inputs to outputs difficult.
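The garbled name can be reproduced in a few lines of Java. This is a minimal sketch, assuming the JVM's default charset was windows-1252 (common on Western-European Windows setups, but the issue does not say which charset was in effect); the class and method names are hypothetical and not part of searoute.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes a name as UTF-8 (as stored in the CSV file), then decodes the
    // bytes with another charset, as a JVM with a non-UTF-8 default would.
    static String misdecode(String name, Charset wrongCharset) {
        byte[] utf8Bytes = name.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        // Assumption: the Windows default was windows-1252 (not stated in the issue).
        Charset cp1252 = Charset.forName("windows-1252");
        // The escape below is the letter O-umlaut, i.e. the name Östrand;
        // using an escape keeps the example independent of javac's source encoding.
        String broken = misdecode("\u00D6strand", cp1252);
        // O-umlaut is the two bytes 0xC3 0x96 in UTF-8; windows-1252 reads them as
        // A-tilde (U+00C3) and an en dash (U+2013), hence the U+2013 in the GeoJSON.
        System.out.println(broken.equals("\u00C3\u2013strand")); // true
    }
}
```

The en dash in the decoded result matches the `\u2013` seen in the broken GeoJSON name, which is consistent with (though not proof of) a windows-1252 default.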

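I was able to work around this by setting the JVM system property file.encoding to the appropriate charset. Note that JVM options such as -D must come before -jar; anything after the JAR name is passed to the application as a program argument rather than to the JVM. For example, on the command line:

java -Dfile.encoding=UTF-8 -jar searoute.jar -i "test_input.csv" -res 5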
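To check which encoding the JVM actually picked up, a tiny program can print it. This is a hypothetical helper, not part of searoute. For context: since JDK 18, UTF-8 is the default charset for the standard Java APIs (JEP 400), so the problem described here mainly affects older JDKs.

```java
import java.nio.charset.Charset;

public class EncodingCheck {
    // Reports the file.encoding system property and the default charset
    // that the running JVM actually ended up using.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", default charset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

On JDK 11 or later it can be launched directly from source, with the -D option placed before the file name so that it reaches the JVM: `java -Dfile.encoding=UTF-8 EncodingCheck.java`.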

I wanted to share this in case others run into the same issue. It may be worth mentioning this in the docs, or even making it a standard option of the JAR file.
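One way such a built-in option could work is to open the input CSV and the output file with an explicit UTF-8 charset instead of relying on the JVM default. The following is a minimal sketch under that assumption, not searoute's actual code; `Utf8Io`, `readCsvLines` and `writeUtf8` are hypothetical names.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class Utf8Io {
    // Reads the input CSV as UTF-8 regardless of the JVM default charset.
    // Malformed UTF-8 input raises an IOException (MalformedInputException)
    // instead of silently producing garbled names.
    static List<String> readCsvLines(Path csv) throws IOException {
        return Files.readAllLines(csv, StandardCharsets.UTF_8);
    }

    // Writes output text (e.g. the GeoJSON) as UTF-8, again independent of the default.
    static void writeUtf8(Path out, String content) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            writer.write(content);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("routes", ".csv");
        // The escape is the letter O-umlaut, so this writes the name Östrand.
        writeUtf8(tmp, "\u00D6strand, Sweden\n");
        List<String> lines = readCsvLines(tmp);
        System.out.println(lines.get(0).equals("\u00D6strand, Sweden")); // true
        Files.delete(tmp);
    }
}
```

Fixing the charset at the I/O boundary means the names round-trip unchanged whatever file.encoding is set to, and a CSV saved in a different encoding fails loudly instead of producing mojibake.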

Thanks and best, Vitus