Closed Pita closed 11 years ago
In the cases where these charachters gets replaced with two question marks, it breaks the pad :/
In mysql, each table has its own character set, and it usually defaults to latin1. This is probably causing the issue. (The charset can also be set for each connection. They should match.) The charset should probably be UTF-8. I think something like this should fix the db:
ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
Collation may need to be UTF8 not latin1_swedish
Try it on a dev box, this will require an ALTER *Glares at Peter..
-----Original Message----- From: Peter 'Pita' Martischka [mailto:reply@reply.github.com] Sent: 01 March 2012 17:00 To: John McLear Subject: [etherpad-lite] Non english characters are replaced with question marks sometimes (mysql) (#526)
We found out today, that some non english charcacter doesn't get saved properly to the database. Some of them get replaced by a questionmark, some by two, sometimes it works fine. This only happens if you use a mysql backend
I was able to figure out an example that always fails. Set up an etherpad lite instance with mysql, copy paste this string привет into a pad, restart etherpad lite and reload the pad. You will see that your string got replaced by questionmarks
This results in ueberDB in this sql command
REPLACE INTO `store` VALUES ('pad:test3', '{\"atext\":{\"text\":\"Welcome to Etherpad Lite!\\n\\nThis pad text is syncприветhronized as you type, приветso that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents!\\n\\nEtherpad Lite on Github: http://j.mp/ep-lite\\n\\n\",\"attribs\":\"|2+r+l*0+6+m*0+6|4+4e\"},\"pool\":{\"numToAttrib\":{\"0\":[\"author\",\"a.L7J3iR4yRjzJJJVc\"]},\"nextNum\":1},\"head\":2,\"chatHead\":-1,\"publicStatus\":false,\"passwordHash\":null}'),('pad:test3:revs:2', '{\"changeset\":\"Z:6i>6|2=r=l*0+6$привет\",\"meta\":{\"author\":\"a.L7J3iR4yRjzJJJVc\",\"timestamp\":1330620479659}}');
If I paste this sql command into phpmyadmin, the special chars gets replaced with question marks too. It looks like the node mysql driver doesn't escape this chars properly, or the database scheme can not work with such chars
Any help to solve that problem is appreciated
Reply to this email directly or view it on GitHub: https://github.com/Pita/etherpad-lite/issues/526 This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. Any views or opinions expressed are solely those of the author and do not necessarily represent those of the organisation from which this email originated. If you are not the intended recipient of this email and its attachments, you must take no action based upon them, nor must you copy or show them to anyone. Please contact the sender if you believe you have received this email in error. This email was sent by School Email - Safe Webmail and Hosted Email for Schools
Maybe we can solve that without an alter. Its fascinating how much you can do wrong at a 2 column table. I should get an award for that ;)
Hah, I think I would of got more things wrong ;)
-----Original Message----- From: Peter 'Pita' Martischka [mailto:reply@reply.github.com] Sent: 01 March 2012 17:52 To: John McLear Subject: Re: [etherpad-lite] Non english characters are replaced with question marks sometimes (mysql) (#526)
Maybe we can solve that without an alter. Its fascinating how much you can do wrong at a 2 column table. I should get an award for that ;)
Reply to this email directly or view it on GitHub: https://github.com/Pita/etherpad-lite/issues/526#issuecomment-4263155 This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. Any views or opinions expressed are solely those of the author and do not necessarily represent those of the organisation from which this email originated. If you are not the intended recipient of this email and its attachments, you must take no action based upon them, nor must you copy or show them to anyone. Please contact the sender if you believe you have received this email in error. This email was sent by School Email - Safe Webmail and Hosted Email for Schools
The most interesting thing is that it sets the value correct in pad:$name:revs:$number, but not in pad:$name ... that why I'm not sure if an alter helps at all
@Pita , can you set UTF-8 as default collation in ueberDB and while you're at it, pull the redis handler (really useful)? here probably :)
This would solve this issue.
@pita can we get an update on this please?
I'm using etherpad lite (thank you guys!) in China with users who write in Korean, Chinese, and even Japanese. I'm just looking for a command to execute in mysql and any other related procedures that will solve this for my users. Could someone be any assistance?
ALTER TABLE store CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
replace store with whatever your table name is but afaik this problem is a legacy problem and shouldn't exist.
Are you running latest?
Changelog says I'm on v1.1.5. I also see in the changelog that it supports postgres now.
I get the following whenever a user types in a Korean, Chinese, or Japanese character:
[31m[2012-11-09 10:54:17.702] [ERROR] console - [39mTypeError: Cannot read property 'rev' of undefined at /home/etherpad/etherpad-lite/src/node/handler/PadMessageHandler.js:622:48 at Object.whilst (/home/etherpad/etherpad-lite/src/node_modules/async/lib/async.js:553:13) at /home/etherpad/etherpad-lite/src/node/handler/PadMessageHandler.js:621:11 at /home/etherpad/etherpad-lite/src/node_modules/async/lib/async.js:86:13 at Array.forEach (native) at /home/etherpad/etherpad-lite/src/node_modules/async/lib/async.js:26:24 at Object.forEach (/home/etherpad/etherpad-lite/src/node_modules/async/lib/async.js:85:9) at Object.updatePadClients (/home/etherpad/etherpad-lite/src/node/handler/PadMessageHandler.js:616:9) at /home/etherpad/etherpad-lite/src/node/handler/PadMessageHandler.js:598:15
Any thoughts?
@brainysmurf pls open a new issue for this ;)
Related to https://github.com/ether/etherpad-lite/issues/842 ?
Get latest (using dirty), create new pad, paste in
百度一下,你就知道
hit timeslider, click play hit back
All fine.
Change database to MySQL
create new pad, paste in
百度一下,你就知道
hit timeslider, click play hit back
All fine.
Hello. Oracle database has the same issue. From my understanding, however oracle only allows setting of such encoding only at the database level. This above solution is not feasible for us. We came up with the below solution which seems to work but we are not sure of the consequences.
in our oracle_db.js, before storing to db, we escape and encode all data as bellow.
exports.database.prototype.strencode = function(data){
return unescape( encodeURIComponent( data ) );
};
when getting it, we decode it
exports.database.prototype.strdecode = function(data){
return decodeURIComponent( escape ( data ) ) ;
};
What do you guys think?
Not taking into account the possible additional magic that encodeURIComponent performs, this might be a solution.
We found out today, that some non english charcacter doesn't get saved properly to the database. Some of them get replaced by a questionmark, some by two, sometimes it works fine. This only happens if you use a mysql backend
I was able to figure out an example that always fails. Set up an etherpad lite instance with mysql, copy paste this string привет into a pad, restart etherpad lite and reload the pad. You will see that your string got replaced by questionmarks
This results in ueberDB in this sql command
If I paste this sql command into phpmyadmin, the special chars gets replaced with question marks too. It looks like the node mysql driver doesn't escape this chars properly, or the database scheme can not work with such chars
Any help to solve that problem is appreciated