Closed GoogleCodeExporter closed 9 years ago
By the way,
I was using JWPL 0.6.
I have downgraded to JWPL 0.5b, and the problem has gone. Not sure what changes
in version 0.6 has caused this.
Original comment by ziqizhan...@googlemail.com
on 4 Jun 2011 at 4:48
Unfortunately, we are currently unable to reproduce the problem. I am not aware
of any changes that could cause this problem. Which version of MySQL Connector
are you using?
We updated hibernate from 3.2.5.ga to 3.2.7.ga - maybe this might cause the
problem?
Original comment by oliver.ferschke
on 6 Jun 2011 at 8:38
I am not using specific mysql connector, but the Java 6 generic jdbc.
It might be hibernate, but I havent been able to trace that further.
Original comment by ziqizhan...@googlemail.com
on 6 Jun 2011 at 11:05
[deleted comment]
Original comment by oliver.ferschke
on 6 Jun 2011 at 4:27
I met with the same problem.The reason is from mysql5.5.
http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html . You can
choose older mysql version to solve it.
Original comment by kafka0...@gmail.com
on 9 Jun 2011 at 8:55
My understanding is that this may be the cause if you are "... upgrading to
MySQL 5.5 from an older MySQL release". But I installed ver5.5 as a fresh copy,
created and populated the database from scratch.
I suspect the cause might be in the change of hibernate in the most release of
jwpl-0.6, or a combination of this change and mysql5.5. Since the issue gets
solved when I downgrade from jwpl-0.6 to jwpl-0.5b.
Original comment by ziqizhan...@googlemail.com
on 9 Jun 2011 at 9:01
same here, moving to JWPL 0.5b worked for me too, but well that's not really a
solution, right?
My setup (which might help you guys to figure out the problem's root):
Windows 7 32-bit + MySQL Community Server (mysql-5.5.13-win32.msi) + Eclipse
3.6.2
I set the collation default charset to utf8 during the installation process.
Original comment by stephan....@googlemail.com
on 18 Jun 2011 at 4:51
This is a shot in the dark but maybe you could try setting the default
collation of the database to utf8mb4_col ?
Original comment by richard.eckart
on 18 Jun 2011 at 6:06
Original comment by richard.eckart
on 18 Jun 2011 at 6:06
Just wanted to say that this problem is still present in the newest JWPL 0.7.2
using MySQL 5.5.8.
Also I found a possible problem in the getProperties-method in the
WikiHibernateUtil-class where it says:
p.setProperty("hibernate.connection.hibernate.connection.useUnicode","true");
I have the feeling that there should only be one "hibernate.connection" and not
two.
On a related note: Can somebody please tell me how to get the charset of the
currently used hibernate-session?
Original comment by gm.squir...@gmail.com
on 19 Aug 2011 at 7:10
[deleted comment]
Ok, I found out how to get the charset-settings.
With that I run my system with JWPL 0.5b, 0.6.0 and 0.7.2 and printed charset
and collation information to the console.
0.5b:
character_set_client: utf8
character_set_connection: utf8
character_set_database: utf8
character_set_filesystem: binary
character_set_results:
character_set_server: latin1
character_set_system: utf8
collation_connection: utf8_general_ci
collation_database: utf8_general_ci
collation_server: latin1_swedish_ci
0.6.6 and 0.7.2:
character_set_client: utf8mb4
character_set_connection: utf8mb4
character_set_database: utf8
character_set_filesystem: binary
character_set_results:
character_set_server: latin1
character_set_system: utf8
collation_connection: utf8mb4_general_ci
collation_database: utf8_general_ci
collation_server: latin1_swedish_ci
So we can see: my database is set to utf8 and JWPL 0.5b also uses utf8 as the
connection/client charset, but newer JWPL-versions use utf8mb4.
I hope that helps.
Original comment by gm.squir...@gmail.com
on 19 Aug 2011 at 7:47
Thank you for pointing out the problem with
p.setProperty("hibernate.connection.hibernate.connection.useUnicode","true");
This might really be the root of the problem, because JWPL 0.5 did not have
this error.
I have fixed this bug now. Are you using Maven and can you check if the current
SNAPSHOT works for you? I can also send you a fatjar with dependencies of the
current 0.8.0-SNAPSHOT with the bugfix. (just tell me where to send the file -
you can also drop me a mail for that)
Original comment by oliver.ferschke
on 19 Aug 2011 at 8:32
I'm not using Maven, sorry. I would gladly test the snapshot so please send the
fatjar to gm.squirrel@gmail.com . Thanks.
Original comment by gm.squir...@gmail.com
on 20 Aug 2011 at 11:57
Ok, I received the snapshot and tested it right away. No improvement. Still the
same error. Also the same charset and collation information in the console.
I have another idea. Looking at the stacktrace in the first post, I can see
that the exception occurs "in" the
de.tudarmstadt.ukp.wikipedia.api.Page.fetchByTitle method. That method uses a
sql-query ending with "COLLATE utf8_bin". Is that necessary? Maybe if no
special collation is set, the bug won't show.
Otherwise, maybe the hibernate-session/connection to the database has to be
somehow set explicitly to utf8.
Original comment by gm.squir...@gmail.com
on 20 Aug 2011 at 5:51
Ok, I think I found something. I connected to my local mysql-server directly in
a terminal and tried to set the variable character_set_connection. My results:
mysql> show variables like 'character_set_connection';
| character_set_connection | cp850 |
mysql> set character_set_connection="utf-8";
ERROR 1115 (42000): Unknown character set: 'utf-8'
mysql> show variables like 'character_set_connection';
| character_set_connection | cp850 |
mysql> set character_set_connection="utf8";
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like 'character_set_connection';
| character_set_connection | utf8 |
As I see it, MySQL doesn't understand "UTF-8". Only "UTF8", without the "-"!
But in the getProperties-method in the WikiHibernateUtil-class it is set with
the hyphen:
p.setProperty("hibernate.connection.characterEncoding", "UTF-8");
It could still be that hibernate or the JDBC-driver understands it correctly as
I am not familiar with those. But I think it would be on the safe side to use
"UTF8", not "UTF-8".
Original comment by gm.squir...@gmail.com
on 20 Aug 2011 at 6:17
And another thing. Maybe it has to do with the used JDBC-driver.
As I see it Connector/J is used, but in different versions. JWPL 0.5b uses
5.1.7, but all newer versions use 5.1.13. Now take a look at the changelog of
5.1.13, found at http://dev.mysql.com/doc/refman/5.5/en/cj-news-5-1-13.html :
"Connector/J now auto-detects servers configured with
character_set_server=utf8mb4 or treats the Java encoding utf-8 passed using
characterEncoding=... as utf8mb4 in the SET NAMES= calls it makes when
establishing the connection."
"[...] treats the Java encoding utf-8 passed using characterEncoding=... as
utf8mb4 [...]"
Which means that Connector/J translates "utf-8" to "utf8mb4", if I'm not
mistaken. That's a pretty strong lead.
Original comment by gm.squir...@gmail.com
on 20 Aug 2011 at 6:37
Great work. I made some of the suggested changes which we should test.
Original comment by oliver.ferschke
on 20 Aug 2011 at 7:02
Original comment by oliver.ferschke
on 20 Aug 2011 at 7:02
The issue is now kind of fixed by removing the specific COLLATE setting in the
Page.fetchByTitle query. Thanks to gm.squirrel@gmail.com !
According to [1], the Connector/J automatically chooses the correct collation
for the db - this might, in case of MySQL 5.5, be "utf8mb4" or, in case of 5.1,
"utf8_bin".
By not specifying the character set in the query and letting the connector
choose the correct setting, this should be fixed for now.
I added a LIMIT 1 to ensure unique results (this was ensured before by the
COLLATE setting).
[1]
http://dev.mysql.com/doc/refman/5.5/en/connector-j-reference-configuration-prope
rties.html
Original comment by oliver.ferschke
on 20 Aug 2011 at 9:27
So, What should we do to fix the problem?
Original comment by fanmiao1...@gmail.com
on 12 Jan 2012 at 3:12
Have you tried the latest version? It was released in the last month I'd guess
it has been fixed. Before 0.8 it was fixed but you only get it by checking out
from the repository.
Original comment by ziqizhan...@googlemail.com
on 12 Jan 2012 at 8:19
OK, I reopened the issue.
We currently cannot reproduce the problem, because nobody uses a MySQL Server
5.5. here.
We thought the latest fix solved this problem - as it fixed the issue for
several people.
In the current 0.9.0-SNAPSHOT, we updated JWPL to use Hibernate 4. Maybe, this
will also solve the problem. You could try to check out the sources from the
repository and run the snapshot.
Original comment by oliver.ferschke
on 12 Jan 2012 at 9:42
Issue 70 has been merged into this issue.
Original comment by oliver.ferschke
on 12 Jan 2012 at 9:43
[deleted comment]
A week ago I download WikiDumps and create database on server version: 5.5.18
MySQL using this:
mysqladmin -u[USER] -p create [DB_NAME] DEFAULT CHARACTER SET utf8 DEFAULT
COLLATE utf8_general_ci;
I use JWPL-0.8.0 to connect with DB and when I want to create Page:
Page page = new Page(this.wikipedia, pageName);
everything is ok, because the page is created by this sql code:
Integer pageId = (Integer) session.createSQLQuery("select pml.pageID from
PageMapLine as pml where pml.name = ? LIMIT 1").setString(0,
searchString).uniqueResult();
(source code here:
http://jwpl.googlecode.com/svn/trunk/de.tudarmstadt.ukp.wikipedia.api/src/main/j
ava/de/tudarmstadt/ukp/wikipedia/api/Page.java)
But when I try to create Category:
Category category = new Category(this.wikipedia, "Mathematics");
I've got an error:
19:18:05,368 WARN JDBCExceptionReporter:77 - SQL Error: 1253, SQLState: 42000
19:18:05,368 ERROR JDBCExceptionReporter:78 - COLLATION 'utf8_bin' is not valid
for CHARACTER SET 'utf8mb4'
and the Category is created with sql code:
returnValue = session.createSQLQuery("select cat.pageId from Category as cat
where cat.name = ? COLLATE utf8_bin").setString(0, name).uniqueResult();
(source code:
http://jwpl.googlecode.com/svn/trunk/de.tudarmstadt.ukp.wikipedia.api/src/main/j
ava/de/tudarmstadt/ukp/wikipedia/api/Category.java)
So I try to create a table Category use:
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
and again import data to table but I still have the same error...
Any help, any idea?
Original comment by michal.m...@gmail.com
on 23 Jan 2012 at 6:40
We currently don't use MySQL 5.5, so we cannot reproduce the problem.
Feel free to investigate and report a patch. We'll be happy to include it in
the next release.
You furthermore might try to check out JWPL-0.9.0-SNAPSHOT. Maybe the recent
switch to hibernate 4 might have resolved the problem. (0.8.0 still uses a
pretty old version of hibernate)
Original comment by oliver.ferschke
on 24 Jan 2012 at 1:27
Issue 78 has been merged into this issue.
Original comment by oliver.ferschke
on 27 Jan 2012 at 2:56
JWPL 0.9.0 will be released soon. It uses hibernate 4.0.
Maybe this solves your problems.
Original comment by oliver.ferschke
on 30 Jan 2012 at 7:57
I just tested it with MySQL 5.5. under Win7 x64 and Ubuntu 12.04 64bit and it
worked with JWPL 0.10.0-SNAPSHOT (while showing the same error with versions
before 0.9.0.
So, this issue should be solved now.
Original comment by oliver.ferschke
on 20 Jul 2012 at 9:04
Original comment by oliver.ferschke
on 15 Aug 2012 at 9:21
Original issue reported on code.google.com by
ziqizhan...@googlemail.com
on 4 Jun 2011 at 4:28