mrog opened this issue 4 years ago (status: Open)
We probably need to encode the strings with `Charset.defaultCharset()`, and not UTF-8, for this to work.
I tried using `Charset.defaultCharset()` and it still doesn't work.
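One plausible reason switching to `Charset.defaultCharset()` changed nothing: on JDK 14 the default charset comes from the platform locale, and on macOS or Ubuntu with a UTF-8 locale it is usually UTF-8 already, so both choices produce identical bytes. The class below (its name is only for illustration) is a minimal way to check that assumption on a given machine.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultCharsetCheck {
    // True when encoding with the JVM default charset yields exactly the UTF-8 bytes.
    static boolean defaultMatchesUtf8(String s) {
        byte[] viaDefault = s.getBytes(Charset.defaultCharset());
        byte[] viaUtf8 = s.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(viaDefault, viaUtf8);
    }

    public static void main(String[] args) {
        String sample = "Litu\u00E2nia";
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Same bytes as UTF-8: " + defaultMatchesUtf8(sample));
    }
}
```

If this prints `UTF-8` and `true`, passing `Charset.defaultCharset()` is equivalent to passing UTF-8 on that machine.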
I'm using OpenJDK 14.0.2 on macOS 10.15.6.
Interestingly, I get this message when I try UTF-16:
WARN invalid UTF-8
at transliterate (transliterate.c:791) errno: No such file or directory
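A likely explanation for the `invalid UTF-8` warning with UTF-16, assuming the binding hands the raw encoded bytes to libpostal's C code: Java's `UTF-16` encoder emits a big-endian byte-order mark (`FE FF`) followed by two bytes per character, many of them `00`. The bytes `0xFE` and `0xFF` never occur in well-formed UTF-8, and a `00` byte would also cut a NUL-terminated C string short. This sketch only prints those bytes; the class name is illustrative.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Bytes {
    // Hex dump of a byte array, e.g. "FE FF 00 4C".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Java's "UTF-16" encoder writes a big-endian byte-order mark (FE FF) first.
        byte[] utf16 = "Lit".getBytes(StandardCharsets.UTF_16);
        System.out.println(hex(utf16)); // FE FF 00 4C 00 69 00 74
    }
}
```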
Then I tried ISO-8859-1 and it worked! That solution also worked on Ubuntu. I don't have a Windows machine to test it on.
I spoke too soon. I found a different string that only works if I choose UTF-8.
Lituânia
If I select ISO-8859-1, this string causes the library to print the `invalid UTF-8` message and then freeze.
To avoid any ambiguity, here are the two strings with the Unicode characters escaped.
String stringThatRequiresChoosingUtf8 = "Litu\u00E2nia";
String stringThatRequiresChoosingIso88591 = "\u041F\u0420\u041E\u0421\u041F\u0415\u041A\u0422 \u041A\u0423\u041B\u042C\u0422\u0423\u0420\u042B";
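The sketch below encodes both strings with each charset and strictly checks whether the resulting bytes are valid UTF-8. It relies only on standard `java.nio.charset` behavior: `String.getBytes(Charset)` silently replaces characters the charset cannot represent with `?`. Under that behavior, ISO-8859-1 turns the Cyrillic string into `???????? ????????`, which is plain ASCII and therefore valid UTF-8, while it turns the `â` in `Lituânia` into a lone `0xE2` byte, which is malformed UTF-8 and matches the `invalid UTF-8` warning. If that is what happens here, ISO-8859-1 only appears to work for the Cyrillic address because libpostal never receives the Cyrillic characters. The class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingComparison {
    // Strictly decodes the bytes as UTF-8; returns false on any malformed sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT) // explicit; REPORT is also the default
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    static void describe(String label, String s, Charset cs) {
        // getBytes(Charset) replaces unmappable characters with the charset's replacement ('?').
        byte[] bytes = s.getBytes(cs);
        System.out.printf("%s as %s: %d bytes, valid UTF-8: %b, round-trip: \"%s\"%n",
                label, cs, bytes.length, isValidUtf8(bytes), new String(bytes, cs));
    }

    public static void main(String[] args) {
        String stringThatRequiresChoosingUtf8 = "Litu\u00E2nia";
        String stringThatRequiresChoosingIso88591 =
                "\u041F\u0420\u041E\u0421\u041F\u0415\u041A\u0422 \u041A\u0423\u041B\u042C\u0422\u0423\u0420\u042B";
        for (Charset cs : new Charset[] {StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1}) {
            describe("Latin", stringThatRequiresChoosingUtf8, cs);
            describe("Cyrillic", stringThatRequiresChoosingIso88591, cs);
        }
    }
}
```

With UTF-8 both strings produce valid bytes and round-trip unchanged, so the freeze on the Cyrillic address under UTF-8 may have a cause other than the byte encoding itself.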
I suppose the encoding that we need to use depends on the model used. That information must be available somewhere.
@Maurice-Betzel Would you know?
Libpostal hangs when it's asked to expand certain addresses from Java. It might be related to the use of Unicode characters in JNI. The same addresses work when the libpostal command line executable is used.
Repro steps
Run this Java code using libpostal-platform version 1.1-alpha-1.5.4.
Expected result
The code should complete.
Actual result
The call to libpostal_expand_address never returns.
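Since the expected result is simply that the code completes, a watchdog can turn the hang into a test failure instead of a blocked process. This is a minimal sketch, assuming the binding call can be wrapped in a `Function<String, String[]>`; `completesWithin` and the stand-in `expand` lambda are hypothetical and not part of libpostal-platform. A thread stuck inside native code cannot be interrupted from Java, so the worker is a daemon thread that only keeps the JVM from being held open.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

public class HangCheck {
    // Runs expand(address) on a daemon worker thread; true if it returns within the timeout.
    static boolean completesWithin(Function<String, String[]> expand, String address, long millis) {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "expand-worker");
            t.setDaemon(true); // a stuck native call must not keep the JVM alive
            return t;
        });
        try {
            Future<String[]> result = pool.submit(() -> expand.apply(address));
            result.get(millis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("expansion threw", e.getCause());
        } finally {
            pool.shutdownNow(); // interrupts Java code only; native code keeps running
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in: replace with the real libpostal binding call.
        Function<String, String[]> expand = s -> new String[] {s.toLowerCase()};
        System.out.println(completesWithin(expand, "Litu\u00E2nia", 5_000));
    }
}
```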
Command line result
This same address can be expanded using the command line executable for libpostal.