sergiotapia closed this issue 12 years ago.
I asked a related question on Stack Overflow (SO) so that other people may find a solution as well. The steps above reproduce the problem easily.
I tried several times but can't reproduce the problem. Here I can store 'Esta es una frase mía y me gusta!' in the database correctly. It must be that your database has not been configured properly.
Can you do me a favor: in your database (assuming it's MySQL), which collation did you use? That may be the issue.
I've tried utf8_unicode_ci and still no dice. Any other suggestions?
Make sure the tables and columns use that collation too, @sergiotapia. Sometimes the columns don't.
Have you tried to:
This might narrow things down a bit.
I remember there was a MySQL bug concerning its UTF-8 mapping; we once had a similar issue with a scraper: http://mathiasbynens.be/notes/mysql-utf8mb4. However, that issue should only occur with 4-byte Unicode characters (one website had a bitmap character in the text).
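To check whether that limitation could apply here, a quick test is whether the text contains any character at U+10000 or above. Only those need 4 bytes in UTF-8, and MySQL's 3-byte utf8 character set cannot store them while utf8mb4 can. A minimal PHP sketch; the function name has_four_byte_chars is made up for illustration:

```php
<?php
// Hypothetical helper: true if $s contains a code point in U+10000..U+10FFFF.
// Those are the only characters that take 4 bytes in UTF-8, i.e. the ones
// MySQL's 3-byte "utf8" charset cannot store (utf8mb4 can).
// The /u modifier makes PCRE treat $s as UTF-8; if $s is not valid UTF-8,
// preg_match() returns false, so this function returns false in that case too.
function has_four_byte_chars(string $s): bool
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $s) === 1;
}

var_dump(has_four_byte_chars("Esta es una frase m\u{00ED}a y me gusta!")); // bool(false)
var_dump(has_four_byte_chars("smile \u{1F600}"));                          // bool(true)
```

The accented vowels in this thread (á, í) take only 2 bytes in UTF-8, so the check comes back false for them, which fits the point that the utf8mb4 issue only bites on 4-byte characters.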
By the way, my database uses utf8_unicode_ci as well.
@gabordemooij @sergiotapia, could you both post the output of SHOW CREATE TABLE $table? It might be good to compare schemas to see whether the fault lies in the schema or in the database.
Also, @sergiotapia: your DBMS version, PHP version and RedBean version, please.
Thanks so much for the assistance. Would it be better if I just sent you guys a .zip containing the rb.php file and my own code?
It's at most 40 lines of very simple code. I think that would let you easily discern what the problem is, and also let you run it on your machine so you can see the problem first hand. Maybe I've hit some edge case. :)
I really appreciate your time and help.
Here's the download link: http://dl.dropbox.com/u/6126488/scraperNoticias.zip
I think your browser is hiding the issue. I ran this script from the command line and the characters were already incorrectly encoded when I echoed the contents of the news object. Browsers, however, sometimes fix these things by guessing the charset. The incorrectly encoded characters may cause errors when saving to MySQL.
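One way to do that check from the command line without relying on how anything renders the text is to dump the raw bytes and test whether they form valid UTF-8. A rough PHP sketch; describe_bytes is a hypothetical helper, and it relies on the common preg_match('//u', ...) idiom, which returns false when the subject is not valid UTF-8:

```php
<?php
// Hypothetical diagnostic helper: reports whether $s is valid UTF-8 and shows
// its raw bytes in hex, so nothing gets "fixed up" by charset guessing.
function describe_bytes(string $s): string
{
    // An empty pattern with /u matches any valid UTF-8 string (returns 1)
    // and fails with false on malformed UTF-8.
    $valid = preg_match('//u', $s) === 1 ? 'valid UTF-8' : 'NOT valid UTF-8';
    return $valid . ': ' . bin2hex($s);
}

echo describe_bytes("m\u{00ED}a"), PHP_EOL; // valid UTF-8: 6dc3ad61
echo describe_bytes("m\xEDa"), PHP_EOL;     // NOT valid UTF-8: 6ded61
```

In the second line 'í' is the single ISO-8859-1 byte ED, while in UTF-8 it is the two bytes C3 AD. Seeing the former in a string you expect to be UTF-8 points at the source encoding rather than the database.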
Aha, so the values are in fact incorrectly encoded and Firefox is just hiding that from me. I'm not that familiar with PHP; what path do you suggest I take?
I am not sure. I think the library that fetches the Unicode strings does not pass them on correctly. By the way, sorry for this rather late response; I didn't receive a notification from GitHub about your reply like I usually do.
Hey there, just wanted to let you guys know I figured out the issue. I had absolutely no idea about this so I guess Today I Learned™.
It was just a matter of encoding, as you suggested. The solution wasn't obvious to me, though.
Change this:
// Parse the news item's title.
foreach ($element->find('a') as $title) {
    $newItem->set_title($title->innertext);
}
To this:
// Parse the news item's title.
foreach ($element->find('a') as $title) {
    // The source pages are ISO-8859-1; convert to UTF-8 before storing.
    $newItem->set_title(iconv("ISO-8859-1", "UTF-8", $title->innertext));
}
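The iconv call in that fix treats every title as ISO-8859-1. If the scraper might also see pages that are already UTF-8, converting those again would double-encode them (the UTF-8 bytes C3 AD for 'í' would become C3 83 C2 AD). A defensive variant, sketched under the assumption that anything not already valid UTF-8 is ISO-8859-1; to_utf8 is a hypothetical name:

```php
<?php
// Hypothetical helper: return $s as UTF-8.
// Assumption: input that is not already valid UTF-8 is ISO-8859-1 (the source
// encoding found in this thread). Valid UTF-8 is returned untouched, so text
// is never converted twice.
function to_utf8(string $s): string
{
    if (preg_match('//u', $s) === 1) {
        return $s; // already valid UTF-8 (plain ASCII also lands here)
    }
    // Every byte is a valid ISO-8859-1 character, so this conversion should not fail.
    return iconv('ISO-8859-1', 'UTF-8', $s);
}

// In the scraper loop this would replace the unconditional iconv() call:
// $newItem->set_title(to_utf8($title->innertext));
echo bin2hex(to_utf8("m\xEDa")), PHP_EOL;     // 6dc3ad61
echo bin2hex(to_utf8("m\u{00ED}a")), PHP_EOL; // 6dc3ad61
```

The validity check is a heuristic: a Latin-1 string whose bytes happen to form valid UTF-8 sequences would be left alone. When a page declares its charset (HTTP Content-Type header or a meta tag), using that declaration is more reliable.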
Thanks again for your patience and help.
So the source encoding wasn't UTF-8?
Yes, the source encoding was ISO-8859-1.
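That is also a likely reason the original report saw strings cut off at the first accented vowel instead of merely garbled. This part is an assumption, not something confirmed in the thread: when MySQL is not in strict SQL mode and receives bytes that are invalid in the column's character set, it keeps the value only up to the first invalid sequence and raises an "Incorrect string value" warning. The sketch below imitates that cut in plain PHP using the well-known W3C regular expression for well-formed UTF-8; valid_utf8_prefix is a hypothetical name:

```php
<?php
// Hypothetical helper: the longest prefix of $s that is well-formed UTF-8,
// roughly what a non-strict MySQL utf8mb4 column would keep (assumption; a
// 3-byte "utf8" column would additionally stop at 4-byte characters).
// The pattern is the W3C well-formed UTF-8 regex, matched byte-wise (no /u).
function valid_utf8_prefix(string $s): string
{
    $pattern = '/\A(?:'
        . '[\x00-\x7F]'                        // ASCII
        . '|[\xC2-\xDF][\x80-\xBF]'            // 2-byte, no overlongs
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'        // 3-byte, no overlongs
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}' // 3-byte
        . '|\xED[\x80-\x9F][\x80-\xBF]'        // 3-byte, no surrogates
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'     // 4-byte, planes 1-3
        . '|[\xF1-\xF3][\x80-\xBF]{3}'         // 4-byte, planes 4-15
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2}'     // 4-byte, plane 16
        . ')*+/';                              // possessive: no backtracking
    preg_match($pattern, $s, $m);
    return $m[0];
}

// 'í' as the ISO-8859-1 byte ED is not valid UTF-8, so everything from it on is lost.
echo valid_utf8_prefix("Esta es una frase m\xEDa y me gusta!"), PHP_EOL; // Esta es una frase m
```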
I'm trying to save some simple PHP objects using RedBeanPHP. It works fine, except that in string fields, once it reaches an accented vowel such as 'á' or 'í', it drops the rest of the string.
Example:
Here's my PHP script: