gabordemooij / redbean

ORM layer that creates models, config and database on the fly
https://www.redbeanphp.com

RedBeanPHP not saving strings correctly, skips from an accented vowel onwards #184

Closed sergiotapia closed 12 years ago

sergiotapia commented 12 years ago

I'm trying to save some simple PHP objects using RedBeanPHP. It works fine, except that on string fields, when it reaches an accented vowel (e.g. 'á' or 'í'), it drops that character and everything after it in the string.

Example:

// Actual string in PHP script.
Esta es una frase mía y me gusta!

// Saved to database.
Esta es una frase m

Here's my PHP script:

// Setup RedBean to work with a database.
R::setup('mysql:host=localhost;dbname=noticias','root','');

foreach ($parsedNews as $tmpNews) {
    // Copy each parsed news item into a new 'noticia' bean and store it.
    $noticia = R::dispense('noticia');
    $noticia->imagen = $tmpNews->get_image();
    $noticia->fecha = $tmpNews->get_fechanoticia();
    $noticia->titulo = $tmpNews->get_title();
    $noticia->url = $tmpNews->get_sourceurl();
    $noticia->descripcion = $tmpNews->get_description();
    $id = R::store($noticia);
}

sergiotapia commented 12 years ago

I asked a related question on SO so other people may find a solution as well. The steps above can reproduce the problem easily.

sergiotapia commented 12 years ago

Related: https://groups.google.com/forum/?fromgroups#!topic/redbeanorm/bCmJ-WqHRWk

gabordemooij commented 12 years ago

Tried several times but can't reproduce the problem. Over here I can store 'Esta es una frase mía y me gusta!' properly in the database. Must be the database that has not been configured properly.

sergiotapia commented 12 years ago

Can you do me a favor, in your database (assuming it's MySQL) what collation did you use? That may be the issue.

I've tried:

utf8_unicode_ci and still no dice - any other suggestions?

katanacrimson commented 12 years ago

ensure that columns and tables are that collation too, @sergiotapia. sometimes the columns aren't that way.

gabordemooij commented 12 years ago

Have you tried the following?

  1. Inserting the test text using a database command-line interface or phpMyAdmin.
  2. Inserting it using plain PHP PDO functions for MySQL.
  3. Checking whether you modified the RedBeanPHP connection code or passed in your own PDO connection (a PDO connection needs to be configured to use UTF-8; this is done with a SET NAMES query).

This might narrow things down a bit.
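
As a rough sketch of steps 2 and 3, the following stores the test string through plain PDO and reads it back. The helper names `utf8Dsn` and `roundTrips`, the `charset_probe` table, and the choice of `utf8mb4` are assumptions made for illustration; none of this is RedBeanPHP code, and the `charset` DSN option needs PHP 5.3.6 or newer.

```php
<?php
// Hypothetical helper: builds a MySQL DSN that pins the connection charset,
// which has the same effect as issuing SET NAMES after connecting.
function utf8Dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Hypothetical helper: writes $text to a temporary table and reports whether
// the value read back is byte-for-byte identical.
function roundTrips(PDO $pdo, string $text): bool
{
    $pdo->exec('CREATE TEMPORARY TABLE charset_probe (txt TEXT) CHARACTER SET utf8mb4');
    $insert = $pdo->prepare('INSERT INTO charset_probe (txt) VALUES (?)');
    $insert->execute([$text]);
    $stored = $pdo->query('SELECT txt FROM charset_probe')->fetchColumn();
    $pdo->exec('DROP TEMPORARY TABLE charset_probe');

    return $stored === $text;
}

// Usage (needs a reachable MySQL server, so it is left commented out):
// $pdo = new PDO(utf8Dsn('localhost', 'noticias'), 'root', '');
// $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// var_dump(roundTrips($pdo, 'Esta es una frase mía y me gusta!'));
```

If the round trip succeeds here but fails through RedBeanPHP, the connection setup is the likely suspect; if it fails here too, the problem is in the database or in the string itself.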

I remember a MySQL bug concerning UTF-8 mapping; we had a similar issue with a scraper once: http://mathiasbynens.be/notes/mysql-utf8mb4. However, this issue should only occur with 4-byte Unicode characters (one website had a bitmap char in the text).
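
To rule the 4-byte case in or out, one option is to scan the text for code points above U+FFFF, which are the ones that take four bytes in UTF-8. This is a minimal sketch; `hasFourByteChars` is a hypothetical name, and it assumes the input is already valid UTF-8.

```php
<?php
// Hypothetical helper: true if the UTF-8 string contains any code point
// outside the Basic Multilingual Plane (U+10000 and above).
function hasFourByteChars(string $utf8): bool
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $utf8) === 1;
}

var_dump(hasFourByteChars("m\xC3\xADa"));       // "mía": bool(false)
var_dump(hasFourByteChars("\xF0\x9F\x98\x80")); // U+1F600: bool(true)
```

An accented vowel such as 'í' takes only two bytes in UTF-8, so the test sentence from this issue would not trigger that particular limitation.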

gabordemooij commented 12 years ago

Btw my database uses utf8_unicode_ci as well.

katanacrimson commented 12 years ago

@gabordemooij @sergiotapia - SHOW CREATE TABLE $table from both of you? might be good to compare schemas to see if the fault lies within schema or DB.

also, @sergiotapia, dbms version, php version, redbean version please

sergiotapia commented 12 years ago

Thanks so much for the assistance. Would it be better if I just passed you guys a .zip containing the rb.php file, and my own code?

It's at most 40 lines of very simple to follow code. I think that would let you easily discern what the problem is, and also let you run it on your machine so you can see the problem first hand. Maybe I reached some edge case. :)

I really appreciate your time and help.

Here's the download link: http://dl.dropbox.com/u/6126488/scraperNoticias.zip

gabordemooij commented 12 years ago

I think your browser is hiding the issue; I ran this script from the command line and the characters were already incorrectly encoded when I echoed the contents of the news object. However, browsers sometimes fix these things by guessing charsets. The incorrectly encoded chars may cause errors when saving to MySQL.
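
One way to check this without relying on a browser is to test whether the string is valid UTF-8 and to dump its raw bytes. This is only a sketch: `isValidUtf8` is a hypothetical helper, and the sample byte strings are assumptions standing in for what the scraper returns.

```php
<?php
// Hypothetical helper: with the /u modifier, PCRE refuses subjects that are
// not valid UTF-8, in which case preg_match() returns false instead of 1.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$latin1 = "m\xEDa";     // "mía" encoded as ISO-8859-1
$utf8   = "m\xC3\xADa"; // "mía" encoded as UTF-8

var_dump(isValidUtf8($latin1)); // bool(false)
var_dump(isValidUtf8($utf8));   // bool(true)
echo bin2hex($latin1), "\n";    // 6ded61
```

A lone `0xED` byte is not a valid UTF-8 sequence, which would be consistent with the stored value stopping right after the "m".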

sergiotapia commented 12 years ago

Aha, so the values are in fact incorrectly encoded and Firefox is just hiding that from me. I'm not that familiar with PHP, what path do you suggest I take?

gabordemooij commented 12 years ago

I am not sure. I think the library that fetches the Unicode strings does not pass them correctly. By the way, sorry for this rather late response. I didn't receive a notification about this response from GitHub like I usually do.

sergiotapia commented 12 years ago

Hey there, just wanted to let you guys know I figured out the issue. I had absolutely no idea about this so I guess Today I Learned™.

It was just a matter of encoding, as you suggested. The solution wasn't obvious to me though.

Change this:

// Parse the news item's title.
foreach ($element->find('a') as $title) {
    $newItem->set_title($title->innertext);
}

To this:

// Parse the news item's title.
foreach ($element->find('a') as $title) {
    $newItem->set_title(iconv("ISO-8859-1", "UTF-8", $title->innertext));    
}

Thanks again for your patience and help.
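
For anyone applying the same fix to several fields, the conversion could be wrapped in a small helper along these lines. This is a sketch rather than part of the original script: `toUtf8` is a hypothetical name, and the ISO-8859-1 default is an assumption that only holds when the scraped site really uses that encoding.

```php
<?php
// Hypothetical helper: converts $text to UTF-8 unless it already is valid UTF-8.
// Caveat: some ISO-8859-1 byte sequences also happen to be valid UTF-8, so this
// check is a heuristic, not proof of the source encoding.
function toUtf8(string $text, string $sourceEncoding = 'ISO-8859-1'): string
{
    if (preg_match('//u', $text) === 1) {
        return $text; // already valid UTF-8, leave untouched
    }

    $converted = iconv($sourceEncoding, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Could not convert text from $sourceEncoding");
    }

    return $converted;
}

// Usage in the parsing loop would look like:
// $newItem->set_title(toUtf8($title->innertext));
```

Skipping strings that are already valid UTF-8 avoids double-encoding if the source later switches encodings.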

gabordemooij commented 12 years ago

So the source encoding wasn't UTF-8?

sergiotapia commented 12 years ago

Yes the source encoding was ISO-8859-1.