akatie / mdb-sqlite

Automatically exported from code.google.com/p/mdb-sqlite
BSD 3-Clause "New" or "Revised" License

Detect Charset Encoding (issues) #19

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
When converting the source MDB database, the JVM takes some environment
variables and settings into account. If I know I need CP850 because my Access
file contains international characters (e.g. umlauts), I can simply call the
library with
java -jar -Dfile.encoding=CP850 dist/mdb-sqlite.jar [source] [target]

but nothing changes.

What is the expected output? 
I expect the library either to detect the (possible) charset and related
issues on its own (see patch), or at least to let the user set the charset
with the "-D" command-line options (encoding, ...).
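
As a rough illustration of the second option, here is a minimal sketch of how a
charset requested with "-D" could be resolved, using only the standard library.
The property name mdb.charset and the class CharsetOption are hypothetical and
are not part of mdb-sqlite; the sketch only shows the lookup and the fallback to
the JVM default, not how the result would be passed on to the converter.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class CharsetOption {
    // Hypothetical property name, used here for illustration only.
    public static final String PROPERTY = "mdb.charset";

    // Returns the requested charset if the JVM supports it,
    // otherwise falls back to the JVM default charset.
    public static Charset resolve(String requested) {
        if (requested == null || requested.isEmpty()) {
            return Charset.defaultCharset();
        }
        try {
            if (Charset.isSupported(requested)) {
                return Charset.forName(requested);
            }
        } catch (IllegalCharsetNameException e) {
            // Syntactically invalid name: fall through to the default.
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        Charset cs = resolve(System.getProperty(PROPERTY));
        System.out.println("Using charset: " + cs.name());
    }
}
```

Under these assumptions it could be tried with something like
java -Dmdb.charset=CP850 -cp . CharsetOption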

What do you see instead?
Forcing a charset has no effect, and no autodetection of encoding problems or
character sets is implemented.

What version of the product are you using? On what operating system?
Newest (1.0?)
*nix

Please provide any additional information below.
The patch below tries to implement both approaches:
1) it takes environment variables into account by using the setBytes() method
(this could of course be elaborated with more sophisticated methods, e.g.
UnicodeUtils
http://tripoverit.blogspot.com/2007/04/javas-utf-8-and-unicode-writing-is.html,
Charset.forName("UTF-8") and encode() from java.nio.charset.Charset, etc.), and
2) it tries to detect "strange" encodings by using juniversalchardet.
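
To make the detection idea a bit more concrete, here is a minimal sketch that
uses only the standard library. It is not the attached patch and it does not use
juniversalchardet: it simply tries a strict UTF-8 decode and, if that fails,
decodes the same bytes with a caller-supplied fallback charset. The class and
method names (FallbackDecoder, decode) are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class FallbackDecoder {
    // Decodes raw bytes as strict UTF-8; on malformed input, decodes
    // them again with the given fallback charset instead.
    public static String decode(byte[] raw, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(raw, fallback);
        }
    }

    public static void main(String[] args) {
        // 0x81 is the u-umlaut in CP850 and is not valid UTF-8 on its own.
        byte[] cp850Umlaut = { (byte) 0x81 };
        // IBM850 is the canonical Java name for CP850; it may be missing
        // in a reduced runtime, so ISO-8859-1 is used as a stand-in there.
        Charset fallback = Charset.isSupported("IBM850")
                ? Charset.forName("IBM850")
                : StandardCharsets.ISO_8859_1;
        System.out.println(decode(cp850Umlaut, fallback));
    }
}
```

This is only a heuristic: a byte sequence in a legacy code page can happen to be
valid UTF-8, in which case the fallback is never used. A statistical detector
such as juniversalchardet covers more cases.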

Of course one approach could be enough, but some detection of problems of
this kind would be great.
The patch worked for me, but it of course needs testing with other charsets and
input DBs!

Best,
Phil

Original issue reported on code.google.com by philipp....@gmail.com on 10 Dec 2011 at 6:34
