andrewleech / plugin.video.netflixbmc

NetfliXBMC - Unofficial Netflix Add-on (Win/OSX/Linux)
http://forum.kodi.tv/showthread.php?tid=211574
GNU General Public License v2.0

Further unicode issues #35

Open andrewleech opened 9 years ago

andrewleech commented 9 years ago

"-" in the movie description is displayed as "u2013", and for some reason the "0" is rendered in a thinner typeface.

insertnamehere1 commented 9 years ago

The Norwegian text supplied by Netflix is: \n\n\n\n\n\n\n\n\nEn filmskaper vender tilbake til det mystiske stedet der faren døde i villmarken for å avdekke galskapen \u2013 eller ondskapen \u2013 som tok livet hans.<div class=\"info\">

While the English text from the Netflix query is: A filmmaker returns to the scene of his father's mysterious death in the wilderness to uncover the madness -- or the evil -- that claimed his life.

Netflix is serving up the \u2013 in its https://www.netflix.com/JSON/BOB?movieid= response. I can mod listVideo() to replace "\u2013" in the description text with "--" for this special case.

andrewleech commented 9 years ago

One thing that might be worth trying: I've seen some unicode on Hulu that I had to encode with 'latin1' rather than 'utf-8' to get it to display correctly. Not sure why, since it comes from XML that explicitly declares 'utf-8'.

insertnamehere1 commented 9 years ago

Here is a bit more information on this problem. The unicode Netflix is returning contains \u005c\u0075\u0032\u0030\u0031\u0033, which is unicode for the ASCII characters '\' 'u' '2' '0' '1' '3'. So when the Netflix response is decoded with decode("utf-8"), we get the literal string "\u2013". It does this for the Norwegian-language version of the movie description. Possibly screwed up in translation? In this case the only solution I can see is to replace the string "\u2013" with "--". Correct me if I'm wrong, but would decoding with latin1 sort this out? Also, I have to admit I'm wavering on fixing this. It's a dirty fix for a minor issue, and I think it's a Netflix problem anyway.
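
To illustrate the behaviour described above, here is a minimal Python 3 sketch. It assumes the response body is JSON and is parsed with the standard json module; how the add-on actually parses it is not shown in this thread, and the `body` value is a made-up fragment. Because the escaped sequence is pure ASCII, decoding the bytes as UTF-8 or as latin1 gives the same text, and the six literal characters survive parsing:

```python
import json

# Hypothetical fragment of the response body: each character of the
# six-character text \u2013 is itself written as a JSON \uXXXX escape.
body = b'"galskapen \\u005c\\u0075\\u0032\\u0030\\u0031\\u0033 eller"'

# The bytes are all ASCII, so both codecs produce identical text.
utf8_text = body.decode("utf-8")
latin1_text = body.decode("latin1")

# JSON parsing resolves each escape once, leaving backslash, 'u', '2', '0', '1', '3'.
text = json.loads(utf8_text)
print(text)
```

Under these assumptions, switching the codec would not help here, since no non-ASCII bytes are involved.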

mantheman commented 9 years ago

I've only seen this with u2013 and u2026 (a triple period), and your patch worked fine, so I at least think it's worthy of a PR =) But didn't you just replace the Unicode-in-Unicode with the real Unicode instead of "--"? u2013 is apparently an "En dash" and not strictly a double dash. http://en.wikipedia.org/wiki/Dash#En_dash

insertnamehere1 commented 9 years ago

Yeah, I replaced it with the correct unicode. Netflix replaces both u2013 and u2026 with "--" (if you compare the English and Norwegian movie descriptions). I was tempted to just do what Netflix does, but then what the hell, let's use the correct unicode.
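
A replacement along these lines could look roughly like the sketch below. It is not the actual patch, which is not shown in this thread. The function name `unescape_literal_unicode` is hypothetical, and the code is written for Python 3 (on Python 2, `unichr` would take the place of `chr`). It converts any literal backslash-u escape with four hex digits, not only u2013 and u2026:

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_LITERAL_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_literal_unicode(text):
    # Hypothetical helper: swap each literal escape for the character it names.
    return _LITERAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


fixed = unescape_literal_unicode("galskapen \\u2013 eller ondskapen\\u2026")
print(fixed)
```

A narrower alternative would be two plain string replacements for just the two sequences seen so far, which avoids touching any description that legitimately contains a backslash-u sequence.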

mantheman commented 9 years ago

Well, Netflix shows u2026 correctly as a horizontal ellipsis on both my iPad and PC. Anyway, I think you did the right thing.