duck7000 / imdbGraphQLPHP

Charset and foreign characters #63

Closed. GeorgeFive closed this issue 3 months ago

GeorgeFive commented 3 months ago

This may not be related to the class at all... maybe there's some easy fix that I'm not seeing.... but I'm banging my head on my desk trying to figure this out, hah.

When getting data from imdb which contains non-ascii characters, the response seems to be encoded.

Example - https://www.imdb.com/name/nm0125367/ should be - Jörg Buttgereit returns - Jörg Buttgereit

I can get around this by doing

utf8_decode(html_entity_decode($name))

utf8_decode is deprecated, but this does work. Not a long-term fix, but it works for now.
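For reference, `utf8_decode()` is deprecated as of PHP 8.2, and `mb_convert_encoding()` with an ISO-8859-1 target is the usual replacement. The sketch below assumes the name arrived double-encoded (UTF-8 bytes mis-read as Latin-1 and then encoded to UTF-8 a second time). That would be consistent with `utf8_decode` fixing it, but nothing in this thread confirms it. `utf8ToLatin1` is a hypothetical helper name, not part of the class.

```php
<?php
// Hypothetical helper: does what utf8_decode() did (UTF-8 -> ISO-8859-1)
// without calling the deprecated function. Requires the mbstring extension.
function utf8ToLatin1(string $s): string
{
    return mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');
}

// ASSUMPTION: "Jörg" double-encoded. The UTF-8 bytes of "ö" (C3 B6) were
// read as the two Latin-1 characters "Ã¶" and encoded again (C3 83 C2 B6).
$doubleEncoded = "J\xC3\x83\xC2\xB6rg";

// Peeling one layer leaves the bytes 4A C3 B6 72 67, which is "Jörg" in UTF-8.
$fixed = utf8ToLatin1($doubleEncoded);
echo bin2hex($fixed), PHP_EOL; // 4ac3b67267
```

Applied to a string that is already clean UTF-8, the same call would turn characters outside Latin-1 (such as č) into `?`, which matches the mangling described for the second example.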

However, this is not a fix-all for all cases! Check out:

https://www.imdb.com/title/tt14088510/ 9th credit, Suncica Milanovic. They have a role alias of Sunčica Milanović, which also is not handled properly. Running utf8_decode on this will mangle it, so that's not an option. I can, however, run

mb_convert_encoding($alias, "HTML-ENTITIES", 'UTF-8')

on it, and that fixes it... but if I run that on the previous example (Jörg), mb will mangle THAT. So neither of these seems to be a catch-all for all characters.
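As a side note, the `"HTML-ENTITIES"` pseudo-encoding is itself deprecated as of PHP 8.2. A minimal sketch of the same idea using `mb_encode_numericentity()` follows. `toNumericEntities` is a hypothetical helper name, and the second input assumes the Jörg string was double-encoded, which this thread does not confirm.

```php
<?php
// Hypothetical helper: turn every non-ASCII character of a UTF-8 string
// into a decimal numeric character reference. Requires mbstring.
function toNumericEntities(string $utf8): string
{
    // Map format: [first code point, last code point, offset, mask].
    // This encodes U+0080 through U+10FFFF and leaves ASCII untouched.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

// Clean UTF-8 input: "Sunčica Milanović" (č = C4 8D, ć = C4 87).
echo toNumericEntities("Sun\xC4\x8Dica Milanovi\xC4\x87"), PHP_EOL;
// Sun&#269;ica Milanovi&#263;

// ASSUMPTION: double-encoded "Jörg". The entities now describe the two
// mojibake characters "Ã¶" instead of "ö", so a browser still shows garbage.
echo toNumericEntities("J\xC3\x83\xC2\xB6rg"), PHP_EOL;
// J&#195;&#182;rg
```

This emits numeric references only, whereas `"HTML-ENTITIES"` uses named entities where they exist. Browsers render both the same way. If the double-encoding assumption holds, it would also explain why no single conversion works for both names: the two strings are not in the same state to begin with.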

Is it possible to get the class to return everything already decoded? So, the response would be Jörg Buttgereit instead of Jörg Buttgereit?

duck7000 commented 3 months ago

The response from IMDb GraphQL is UTF-8 encoded, so in theory it should work.

This is a screenshot cut from my movie program; look at Sunčica's name:

[Image: Screenshot_2024-06-30_21-38-14]

So on my PC (Linux, with Firefox as the browser) all those characters are properly decoded. I don't do anything with the data at all; I use it exactly as I get it from GraphQL. My program does declare UTF-8 in its HTML, like this: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

So why is it a problem on your PC/setup but not on mine? Do server settings have anything to do with this?

We talked about this before, and even on imdbphp some people complained about this while others didn't.

In other words, apparently I got lucky that it works for me? I have no explanation for the behavior you are seeing.
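One way to narrow down where two setups diverge is to compare the raw bytes of the same name at each stage (straight from the class, after any database round trip, right before output). The sketch below is only a diagnostic aid. `describeBytes` is a hypothetical helper, and settings such as PHP's `default_charset` or the database connection charset are possible suspects that nothing in this thread confirms.

```php
<?php
// Hypothetical diagnostic helper: report what a string really contains,
// independent of how a terminal or browser chooses to display it.
function describeBytes(string $s): array
{
    return [
        'hex'        => bin2hex($s),                    // raw bytes
        'valid_utf8' => mb_check_encoding($s, 'UTF-8'), // well-formed UTF-8?
        'bytes'      => strlen($s),                     // byte count
        'chars'      => mb_strlen($s, 'UTF-8'),         // character count
    ];
}

// Clean UTF-8 "Jörg": 5 bytes, 4 characters, hex 4ac3b67267.
print_r(describeBytes("J\xC3\xB6rg"));

// Double-encoded "Jörg" (an assumed failure mode): 7 bytes, 5 characters.
print_r(describeBytes("J\xC3\x83\xC2\xB6rg"));

// Settings that commonly differ between servers.
echo 'default_charset: ', ini_get('default_charset'), PHP_EOL;
echo 'mb_internal_encoding: ', mb_internal_encoding(), PHP_EOL;
```

If the hex dump is identical on both machines straight out of the class, the difference is introduced later, for example in storage or output, rather than by the class itself.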

GeorgeFive commented 3 months ago

Yeah, I'm completely at a loss here. This is the best I've come up with; I'm pretty much burnt out on this issue at this point, hah.

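    if (mb_check_encoding($row['name'], "utf-8") === true) {
        // Valid UTF-8: convert non-ASCII characters to HTML entities.
        // Note: the "HTML-ENTITIES" encoding is deprecated as of PHP 8.2.
        echo mb_convert_encoding($row['name'], "HTML-ENTITIES", 'UTF-8');
    } else {
        // Not valid UTF-8: output the value unchanged.
        echo $row['name'];
    }

GeorgeFive commented 3 months ago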

This can be closed. My solution is a bit hackish, but it does work and stores all characters properly.
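For anyone landing here later, a different approach is to normalize every value to clean UTF-8 once and store that, instead of converting to entities. The following is a heuristic sketch, not a guaranteed fix. `normalizeToUtf8` is a hypothetical helper, it assumes the only failure modes are raw ISO-8859-1 bytes or one layer of double encoding through ISO-8859-1, and it does not touch HTML entities.

```php
<?php
// Hypothetical helper: return a best-effort clean UTF-8 version of $s.
function normalizeToUtf8(string $s): string
{
    // Case 1: not valid UTF-8 at all. ASSUMPTION: treat it as ISO-8859-1.
    if (!mb_check_encoding($s, 'UTF-8')) {
        return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
    }

    // Case 2: contains characters above U+00FF (e.g. č, ć). Such a string
    // cannot be Latin-1 mojibake, so it is returned unchanged.
    if (preg_match('/[^\x00-\x{FF}]/u', $s) === 1) {
        return $s;
    }

    // Case 3: every character fits in Latin-1. Peel one layer and keep the
    // result only if it changed and is itself well-formed UTF-8.
    $candidate = mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');
    if ($candidate !== $s && mb_check_encoding($candidate, 'UTF-8')) {
        return $candidate;
    }

    return $s;
}

echo bin2hex(normalizeToUtf8("J\xC3\x83\xC2\xB6rg")), PHP_EOL; // 4ac3b67267
echo bin2hex(normalizeToUtf8("J\xC3\xB6rg")), PHP_EOL;         // 4ac3b67267
```

The heuristic can misfire on text that legitimately contains a sequence like "Ã¶", and it will not repair mojibake that went through Windows-1252 rather than ISO-8859-1, so it is worth checking against real data before relying on it.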

duck7000 commented 3 months ago

I'm sorry I can't help you. I do know this encoding stuff in general can be a real pain in the ass.