What steps will reproduce the problem?
When converting the source MDB database, the JVM takes some environment
variables/settings into account. Since my Access file contains international
characters (e.g. umlauts), I need CP850, so I simply call the library with
java -Dfile.encoding=CP850 -jar dist/mdb-sqlite.jar [source] [target]
but nothing changes.
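For reference, a quick standalone check (not part of the patch; the class name is mine) shows that the JVM itself does honour the setting, the conversion just never uses it:

    import java.nio.charset.Charset;

    public class DefaultCharsetCheck {
        public static void main(String[] args) {
            // Started with -Dfile.encoding=CP850 this prints the charset the
            // JVM resolved from file.encoding; the library simply ignores it.
            System.out.println(Charset.defaultCharset());
        }
    }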
What is the expected output?
I expect the library either to detect the (possible) charset and related
issues on its own (see the attached patch), or at least to let the user set
the charset via a "-D" command-line option (encoding, ...), e.g. as sketched
below.
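For illustration only (the property name "mdb.charset" and the helper below are my own invention, not something the library currently provides), resolving such an option could look like this:

    import java.nio.charset.Charset;

    // Hypothetical helper: resolve the charset from a system property so a
    // caller could run e.g.
    //   java -Dmdb.charset=CP850 -jar dist/mdb-sqlite.jar [source] [target]
    public final class CharsetOption {
        private CharsetOption() {}

        public static Charset resolve() {
            String name = System.getProperty("mdb.charset",
                    System.getProperty("file.encoding"));
            return (name != null) ? Charset.forName(name) : Charset.defaultCharset();
        }
    }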
What do you see instead?
Forcing a charset has no effect, and no autodetection of encoding
problems/character sets is implemented.
What version of the product are you using? On what operating system?
The newest one (1.0?)
*nix
Please provide any additional information below.
The patch below tries to implement both approaches: (1) it takes the
environment variables into account by using the setBytes() method (this could
of course be elaborated with more sophisticated methods, e.g. UnicodeUtils,
http://tripoverit.blogspot.com/2007/04/javas-utf-8-and-unicode-writing-is.html,
or java.nio.charset.Charset with forName("UTF-8") and encode(), etc.), and
(2) it tries to detect "strange" encodings using juniversalchardet (see the
sketch below). Of course, one approach alone might be enough, but some
detection of problems of this kind would be great.
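As a rough illustration of the detection side (class and method names below are mine, not the actual patch code), juniversalchardet can be used along these lines:

    import org.mozilla.universalchardet.UniversalDetector;

    // Feed raw column bytes to juniversalchardet and fall back to a known
    // charset when nothing is recognised.
    public final class EncodingGuess {
        private EncodingGuess() {}

        public static String detect(byte[] data, String fallback) {
            UniversalDetector detector = new UniversalDetector(null);
            detector.handleData(data, 0, data.length);
            detector.dataEnd();
            String detected = detector.getDetectedCharset();
            detector.reset();
            return (detected != null) ? detected : fallback;
        }
    }

The detected name could then be handed to Charset.forName() before decoding the raw bytes.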
The patch worked for me, but of course it still needs testing with other
charsets and input DBs!
Best,
Phil
Original issue reported on code.google.com by philipp....@gmail.com on 10 Dec 2011 at 6:34