eccsup / jwpl

Automatically exported from code.google.com/p/jwpl
0 stars 0 forks source link

CharacterSet - SEVERE: COLLATION 'utf8_bin' is not valid for CHARACTER SET 'utf8mb4' #23

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi, I am using Mysql 5.5 community server 64bit version for Win 7 64bit OS.

I have successfully imported a fresh wikipedia 2011 db, then on my first query, 
as:

>>>
DatabaseConfiguration dbConfig = new DatabaseConfiguration();
dbConfig.setHost("localhost");
........//other config
Wikipedia wiki = new Wikipedia(dbConfig);
System.out.println(wiki.getPage("Cat").getTitle());
>>>

I get an exception:
>>>
SEVERE: COLLATION 'utf8_bin' is not valid for CHARACTER SET 'utf8mb4'
Exception in thread "main" org.hibernate.exception.SQLGrammarException: could 
not execute query
    at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:67)
    at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
    at org.hibernate.loader.Loader.doList(Loader.java:2223)
    at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
    at org.hibernate.loader.Loader.list(Loader.java:2099)
    at org.hibernate.loader.custom.CustomLoader.list(CustomLoader.java:289)
    at org.hibernate.impl.SessionImpl.listCustomQuery(SessionImpl.java:1695)
    at org.hibernate.impl.AbstractSessionImpl.list(AbstractSessionImpl.java:142)
    at org.hibernate.impl.SQLQueryImpl.list(SQLQueryImpl.java:152)
    at org.hibernate.impl.AbstractQueryImpl.uniqueResult(AbstractQueryImpl.java:811)
    at de.tudarmstadt.ukp.wikipedia.api.Page.fetchByTitle(Page.java:153)
    at de.tudarmstadt.ukp.wikipedia.api.Page.<init>(Page.java:109)
    at de.tudarmstadt.ukp.wikipedia.api.Wikipedia.getPage(Wikipedia.java:112)
    at uk.ac.shef.oak.jwpltest.Test.main(Test.java:23)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: COLLATION 
'utf8_bin' is not valid for CHARACTER SET 'utf8mb4'
>>>

I have double checked my database's character set, which shows:

>>>
character-set-database = utf8
collation-database = utf8_general_ci
>>>

It seems that the DB character set config is right, but the error seems to 
complain about that. Any suggestions please?

Thanks

Original issue reported on code.google.com by ziqizhan...@googlemail.com on 4 Jun 2011 at 4:28

GoogleCodeExporter commented 9 years ago
By the way, 

I was using JWPL 0.6.

I have downgraded to JWPL 0.5b, and the problem has gone. Not sure what changes 
in version 0.6 has caused this.

Original comment by ziqizhan...@googlemail.com on 4 Jun 2011 at 4:48

GoogleCodeExporter commented 9 years ago
Unfortunately, we are currently unable to reproduce the problem. I am not aware 
of any changes that could cause this problem. Which version of MySQL Connector 
are you using? 
We updated hibernate from 3.2.5.ga to 3.2.7.ga - maybe this might cause the 
problem?

Original comment by oliver.ferschke on 6 Jun 2011 at 8:38

GoogleCodeExporter commented 9 years ago
I am not using specific mysql connector, but the Java 6 generic jdbc. 
It might be hibernate, but I havent been able to trace that further.

Original comment by ziqizhan...@googlemail.com on 6 Jun 2011 at 11:05

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago

Original comment by oliver.ferschke on 6 Jun 2011 at 4:27

GoogleCodeExporter commented 9 years ago
I met with the same problem.The reason is from mysql5.5. 
http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html . You can 
choose older mysql version to solve it.

Original comment by kafka0...@gmail.com on 9 Jun 2011 at 8:55

GoogleCodeExporter commented 9 years ago
My understanding is that this may be the cause if you are "... upgrading to 
MySQL 5.5 from an older MySQL release". But I installed ver5.5 as a fresh copy, 
created and populated the database from scratch.

I suspect the cause might be in the change of hibernate in the most release of 
jwpl-0.6, or a combination of this change and mysql5.5. Since the issue gets 
solved when I downgrade from jwpl-0.6 to jwpl-0.5b.

Original comment by ziqizhan...@googlemail.com on 9 Jun 2011 at 9:01

GoogleCodeExporter commented 9 years ago
same here, moving to JWPL 0.5b worked for me too, but well that's not really a 
solution, right?

My setup (which might help you guys to figure out the problem's root):
Windows 7 32-bit + MySQL Community Server (mysql-5.5.13-win32.msi) + Eclipse 
3.6.2
I set the collation default charset to utf8 during the installation process.

Original comment by stephan....@googlemail.com on 18 Jun 2011 at 4:51

GoogleCodeExporter commented 9 years ago
This is a shot in the dark but maybe you could try setting the default 
collation of the database to utf8mb4_col ?

Original comment by richard.eckart on 18 Jun 2011 at 6:06

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 18 Jun 2011 at 6:06

GoogleCodeExporter commented 9 years ago
Just wanted to say that this problem is still present in the newest JWPL 0.7.2 
using MySQL 5.5.8.
Also I found a possible problem in the getProperties-method in the 
WikiHibernateUtil-class where it says:

p.setProperty("hibernate.connection.hibernate.connection.useUnicode","true");

I have the feeling that there should only be one "hibernate.connection" and not 
two.

On a related note: Can somebody please tell me how to get the charset of the 
currently used hibernate-session?

Original comment by gm.squir...@gmail.com on 19 Aug 2011 at 7:10

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Ok, I found out how to get the charset-settings.

With that I run my system with JWPL 0.5b, 0.6.0 and 0.7.2 and printed charset 
and collation information to the console.

0.5b:
character_set_client: utf8
character_set_connection: utf8
character_set_database: utf8
character_set_filesystem: binary
character_set_results: 
character_set_server: latin1
character_set_system: utf8
collation_connection: utf8_general_ci
collation_database: utf8_general_ci
collation_server: latin1_swedish_ci

0.6.6 and 0.7.2:
character_set_client: utf8mb4
character_set_connection: utf8mb4
character_set_database: utf8
character_set_filesystem: binary
character_set_results: 
character_set_server: latin1
character_set_system: utf8
collation_connection: utf8mb4_general_ci
collation_database: utf8_general_ci
collation_server: latin1_swedish_ci

So we can see: my database is set to utf8 and JWPL 0.5b also uses utf8 as the 
connection/client charset, but newer JWPL-versions use utf8mb4.

I hope that helps.

Original comment by gm.squir...@gmail.com on 19 Aug 2011 at 7:47

GoogleCodeExporter commented 9 years ago
Thank you for pointing out the problem with
p.setProperty("hibernate.connection.hibernate.connection.useUnicode","true");
This might really be the root of the problem, because JWPL 0.5 did not have 
this error.

I have fixed this bug now. Are you using Maven and can you check if the current 
SNAPSHOT works for you? I can also send you a fatjar with dependencies of the 
current 0.8.0-SNAPSHOT with the bugfix. (just tell me where to send the file - 
you can also drop me a mail for that)

Original comment by oliver.ferschke on 19 Aug 2011 at 8:32

GoogleCodeExporter commented 9 years ago
I'm not using Maven, sorry. I would gladly test the snapshot so please send the 
fatjar to gm.squirrel@gmail.com . Thanks.

Original comment by gm.squir...@gmail.com on 20 Aug 2011 at 11:57

GoogleCodeExporter commented 9 years ago
Ok, I received the snapshot and tested it right away. No improvement. Still the 
same error. Also the same charset and collation information in the console.

I have another idea. Looking at the stacktrace in the first post, I can see 
that the exception occurs "in" the 
de.tudarmstadt.ukp.wikipedia.api.Page.fetchByTitle method. That method uses a 
sql-query ending with "COLLATE utf8_bin". Is that necessary? Maybe if no 
special collation is set, the bug won't show.
Otherwise, maybe the hibernate-session/connection to the database has to be 
somehow set explicitly to utf8. 

Original comment by gm.squir...@gmail.com on 20 Aug 2011 at 5:51

GoogleCodeExporter commented 9 years ago
Ok, I think I found something. I connected to my local mysql-server directly in 
a terminal and tried to set the variable character_set_connection. My results:

mysql> show variables like 'character_set_connection';
| character_set_connection | cp850 |

mysql> set character_set_connection="utf-8";
ERROR 1115 (42000): Unknown character set: 'utf-8'

mysql> show variables like 'character_set_connection';
| character_set_connection | cp850 |

mysql> set character_set_connection="utf8";
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like 'character_set_connection';
| character_set_connection | utf8  |

As I see it, MySQL doesn't understand "UTF-8". Only "UTF8", without the "-"!
But in the getProperties-method in the WikiHibernateUtil-class it is set with 
the hyphen:
p.setProperty("hibernate.connection.characterEncoding", "UTF-8");

It could still be that hibernate or the JDBC-driver understands it correctly as 
I am not familiar with those. But I think it would be on the safe side to use 
"UTF8", not "UTF-8".

Original comment by gm.squir...@gmail.com on 20 Aug 2011 at 6:17

GoogleCodeExporter commented 9 years ago
And another thing. Maybe it has to do with the used JDBC-driver.
As I see it Connector/J is used, but in different versions. JWPL 0.5b uses 
5.1.7, but all newer versions use 5.1.13. Now take a look at the changelog of 
5.1.13, found at http://dev.mysql.com/doc/refman/5.5/en/cj-news-5-1-13.html :

"Connector/J now auto-detects servers configured with 
character_set_server=utf8mb4 or treats the Java encoding utf-8 passed using 
characterEncoding=... as utf8mb4 in the SET NAMES= calls it makes when 
establishing the connection."

"[...] treats the Java encoding utf-8 passed using characterEncoding=... as 
utf8mb4 [...]"

Which means that Connector/J translates "utf-8" to "utf8mb4", if I'm not 
mistaken. That's a pretty strong lead.

Original comment by gm.squir...@gmail.com on 20 Aug 2011 at 6:37

GoogleCodeExporter commented 9 years ago
Great work. I made some of the suggested changes which we should test.

Original comment by oliver.ferschke on 20 Aug 2011 at 7:02

GoogleCodeExporter commented 9 years ago

Original comment by oliver.ferschke on 20 Aug 2011 at 7:02

GoogleCodeExporter commented 9 years ago
The issue is now kind of fixed by removing the specific COLLATE setting in the 
Page.fetchByTitle query. Thanks to gm.squirrel@gmail.com !

According to [1], the Connector/J automatically chooses the correct collation 
for the db - this might, in case of MySQL 5.5, be "utf8mb4" or, in case of 5.1, 
"utf8_bin".
By not specifying the character set in the query and letting the connector 
choose the correct setting, this should be fixed for now. 
I added a LIMIT 1 to ensure unique results (this was ensured before by the 
COLLATE setting).

[1] 
http://dev.mysql.com/doc/refman/5.5/en/connector-j-reference-configuration-prope
rties.html 

Original comment by oliver.ferschke on 20 Aug 2011 at 9:27

GoogleCodeExporter commented 9 years ago
So, What should we do to fix the problem?

Original comment by fanmiao1...@gmail.com on 12 Jan 2012 at 3:12

GoogleCodeExporter commented 9 years ago
Have you tried the latest version? It was released in the last month I'd guess 
it has been fixed. Before 0.8 it was fixed but you only get it by checking out 
from the repository.

Original comment by ziqizhan...@googlemail.com on 12 Jan 2012 at 8:19

GoogleCodeExporter commented 9 years ago
OK, I reopened the issue.
We currently cannot reproduce the problem, because nobody uses a MySQL Server 
5.5. here.
We thought the latest fix solved this problem - as it fixed the issue for 
several people.
In the current 0.9.0-SNAPSHOT, we updated JWPL to use Hibernate 4. Maybe, this 
will also solve the problem. You could try to check out the sources from the 
repository and run the snapshot. 

Original comment by oliver.ferschke on 12 Jan 2012 at 9:42

GoogleCodeExporter commented 9 years ago
Issue 70 has been merged into this issue.

Original comment by oliver.ferschke on 12 Jan 2012 at 9:43

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
A week ago I download WikiDumps and create database on server version: 5.5.18 
MySQL using this:
mysqladmin -u[USER] -p create [DB_NAME] DEFAULT CHARACTER SET utf8 DEFAULT 
COLLATE utf8_general_ci;
I use JWPL-0.8.0 to connect with DB and when I want to create Page:
Page page = new Page(this.wikipedia, pageName);
everything is ok, because the page is created by this sql code:
Integer pageId = (Integer) session.createSQLQuery("select pml.pageID from 
PageMapLine as pml where pml.name = ? LIMIT 1").setString(0, 
searchString).uniqueResult();
(source code here: 
http://jwpl.googlecode.com/svn/trunk/de.tudarmstadt.ukp.wikipedia.api/src/main/j
ava/de/tudarmstadt/ukp/wikipedia/api/Page.java)

But when I try to create Category:
Category category = new Category(this.wikipedia, "Mathematics");
I've got an error:
19:18:05,368  WARN JDBCExceptionReporter:77 - SQL Error: 1253, SQLState: 42000
19:18:05,368 ERROR JDBCExceptionReporter:78 - COLLATION 'utf8_bin' is not valid 
for CHARACTER SET 'utf8mb4'
and the Category is created with sql code:
returnValue = session.createSQLQuery("select cat.pageId from Category as cat 
where cat.name = ? COLLATE utf8_bin").setString(0, name).uniqueResult();
(source code: 
http://jwpl.googlecode.com/svn/trunk/de.tudarmstadt.ukp.wikipedia.api/src/main/j
ava/de/tudarmstadt/ukp/wikipedia/api/Category.java)

So I try to create a table Category use:
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
and again import data to table but I still have the same error...

Any help, any idea?

Original comment by michal.m...@gmail.com on 23 Jan 2012 at 6:40

GoogleCodeExporter commented 9 years ago
We currently don't use MySQL 5.5, so we cannot reproduce the problem.
Feel free to investigate and report a patch. We'll be happy to include it in 
the next release.
You furthermore might try to check out JWPL-0.9.0-SNAPSHOT. Maybe the recent 
switch to hibernate 4 might have resolved the problem. (0.8.0 still uses a 
pretty old version of hibernate) 

Original comment by oliver.ferschke on 24 Jan 2012 at 1:27

GoogleCodeExporter commented 9 years ago
Issue 78 has been merged into this issue.

Original comment by oliver.ferschke on 27 Jan 2012 at 2:56

GoogleCodeExporter commented 9 years ago
JWPL 0.9.0 will be released soon. It uses hibernate 4.0.
Maybe this solves your problems.

Original comment by oliver.ferschke on 30 Jan 2012 at 7:57

GoogleCodeExporter commented 9 years ago
I just tested it with MySQL 5.5. under Win7 x64 and Ubuntu 12.04 64bit and it 
worked with JWPL 0.10.0-SNAPSHOT (while showing the same error with versions 
before 0.9.0.
So, this issue should be solved now.

Original comment by oliver.ferschke on 20 Jul 2012 at 9:04

GoogleCodeExporter commented 9 years ago

Original comment by oliver.ferschke on 15 Aug 2012 at 9:21