dcnielsen90 / python-bravia-tv

MIT License

Encoding issues #14

Closed yantoz closed 3 years ago

yantoz commented 3 years ago

Program titles returned by get_playing_info calls are often not in UTF-8 encoding (it depends on the station), which causes garbled text for non-ASCII characters. I had no luck setting

response.encoding = response.apparent_encoding

in bravia_req_json before

return_value = json.loads(response.text)

but I did get the correct result by changing that line to:

return_value = json.loads(response.content)
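Why the two lines behave differently can be reproduced offline. Setting response.encoding = response.apparent_encoding is effectively a no-op here, because requests already falls back to apparent_encoding when the Content-Type header carries no charset. response.text is then built by decoding response.content with that guess and replacing any bytes the guessed codec cannot map. The sketch below imitates that step with a plain bytes.decode (an approximation, not requests itself), using the station-name bytes from the "Garbled" log further down. No network call or requests install is needed.

```python
import json

# Body bytes copied from the "Garbled" log in this thread (trimmed to one field).
# The TV sent valid UTF-8, but Content-Type had no charset parameter.
raw = b'{"title":"\xe3\x83\x86\xe3\x83\xac\xe3\x83\x93\xe6\x9c\x9d\xe6\x97\xa5"}'

# Approximation of response.text when the detector guesses Windows-1254:
# decode with the guessed codec, replacing bytes that codec cannot map.
guessed_text = raw.decode("cp1254", errors="replace")
garbled = json.loads(guessed_text)["title"]

# json.loads() also accepts bytes (Python 3.6+) and detects UTF-8/16/32 itself,
# which is why json.loads(response.content) sidesteps the bad guess.
correct = json.loads(raw)["title"]

print(correct == "テレビ朝日")  # True
print("\ufffd" in garbled)      # True: byte 0x9d has no Windows-1254 mapping
```

Passing bytes hands the decoding decision to the json module, which follows the JSON spec's UTF-8/16/32 detection instead of a statistical guess.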

dcnielsen90 commented 3 years ago

That makes sense. I'm away until the weekend; I'll check it out then.

nao-pon commented 3 years ago

@dcnielsen90 I have the same problem. Is there a schedule for this fix?

dcnielsen90 commented 3 years ago

@nao-pon no schedule; I've just been busy and this slipped my mind. The issue is a super simple decoding issue that results from the requests lib defaulting to ISO-8859-1 instead of UTF-8 when it can't determine the encoding (I'm guessing this happens more often on other TVs... it's super rare on mine). The Sony API specifies UTF-8, so it's likely safe to just force it to always expect that.
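A minimal sketch of the "always expect UTF-8" idea, written against the standard library so it runs without requests or a TV. decode_json_body is a hypothetical helper, not code from python-bravia-tv: it uses an explicit charset parameter if the Content-Type header happens to carry one, and otherwise assumes UTF-8 instead of guessing. With requests, the closest equivalent is assigning response.encoding = "utf-8" before reading response.text. RFC 8259 requires JSON exchanged between systems to be UTF-8, so UTF-8 is the natural default; honouring an explicit charset is only a defensive extra.

```python
import json
from email.message import Message
from typing import Any, Optional


def decode_json_body(body: bytes, content_type: Optional[str]) -> Any:
    """Parse a JSON response body, honouring an explicit charset if present.

    Hypothetical helper: when Content-Type has no charset (as in the BRAVIA
    responses logged in this thread), assume UTF-8 rather than guessing.
    """
    charset = None
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lowercased name, or None
    return json.loads(body.decode(charset or "utf-8"))


# Header exactly as logged in this thread: no charset parameter.
print(decode_json_body(b'{"dispNum":"053"}', "application/json"))
# {'dispNum': '053'}
```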

I'll push something tonight after work (I'm on EST)

nao-pon commented 3 years ago

@dcnielsen90 Thank you for your response! I'm not very familiar with this, so I can't say for sure, but the change @yantoz showed completely fixed the garbled characters. I'm using a BRAVIA sold in Japan, set to Japanese.

yantoz commented 3 years ago

Just for additional info: it looks like the content is always UTF-8, but Python mistakenly detects some of it as Windows-1254, hence the garbled text. There is no charset in the Content-Type response header, so I don't know if it is safe to assume it is always UTF-8, but if Sony specifies so, it probably is.

Some logs:

OK

requests.headers: None
response.headers: {'Content-Type': 'application/json', 'Content-Length': '375', 'Connection': 'keep-alive'}
response.encoding: None
response.apparent_encoding: utf-8
response.text: {"result":[{"uri":"tv:isdbt?trip=32736.32736.1024&srvName=NHK総合1・東京","source":"tv:isdbt","title":"NHK総合1・東京","dispNum":"011","tripletStr":"32736.32736.1024","programTitle":"NHKニュース おはよう日本 新型コロナ 東京で急増 背景は?","startDateTime":"2020-11-11T06:00:00+0900","durationSec":3600}],"id":1}
response.content: b'{"result":[{"uri":"tv:isdbt?trip=32736.32736.1024&srvName=\xef\xbc\xae\xef\xbc\xa8\xef\xbc\xab\xe7\xb7\x8f\xe5\x90\x88\xef\xbc\x91\xe3\x83\xbb\xe6\x9d\xb1\xe4\xba\xac","source":"tv:isdbt","title":"\xef\xbc\xae\xef\xbc\xa8\xef\xbc\xab\xe7\xb7\x8f\xe5\x90\x88\xef\xbc\x91\xe3\x83\xbb\xe6\x9d\xb1\xe4\xba\xac","dispNum":"011","tripletStr":"32736.32736.1024","programTitle":"\xef\xbc\xae\xef\xbc\xa8\xef\xbc\xab\xe3\x83\x8b\xe3\x83\xa5\xe3\x83\xbc\xe3\x82\xb9\xe3\x80\x80\xe3\x81\x8a\xe3\x81\xaf\xe3\x82\x88\xe3\x81\x86\xe6\x97\xa5\xe6\x9c\xac\xe3\x80\x80\xe6\x96\xb0\xe5\x9e\x8b\xe3\x82\xb3\xe3\x83\xad\xe3\x83\x8a\xe3\x80\x80\xe6\x9d\xb1\xe4\xba\xac\xe3\x81\xa7\xe6\x80\xa5\xe5\xa2\x97\xe3\x80\x80\xe8\x83\x8c\xe6\x99\xaf\xe3\x81\xaf\xef\xbc\x9f","startDateTime":"2020-11-11T06:00:00+0900","durationSec":3600}],"id":1}'

Garbled

requests.headers: None
response.headers: {'Content-Type': 'application/json', 'Content-Length': '290', 'Connection': 'keep-alive'}
response.encoding: None
response.apparent_encoding: Windows-1254
response.text: {"result":[{"uri":"tv:isdbt?trip=32741.32741.1066&srvName=テレビ�日","source":"tv:isdbt","title":"テレビ�日","dispNum":"053","tripletStr":"32741.32741.1066","programTitle":"グッド�モーニング🈓","startDateTime":"2020-11-11T04:55:00+0900","durationSec":11100}],"id":1}
response.content: b'{"result":[{"uri":"tv:isdbt?trip=32741.32741.1066&srvName=\xe3\x83\x86\xe3\x83\xac\xe3\x83\x93\xe6\x9c\x9d\xe6\x97\xa5","source":"tv:isdbt","title":"\xe3\x83\x86\xe3\x83\xac\xe3\x83\x93\xe6\x9c\x9d\xe6\x97\xa5","dispNum":"053","tripletStr":"32741.32741.1066","programTitle":"\xe3\x82\xb0\xe3\x83\x83\xe3\x83\x89\xef\xbc\x81\xe3\x83\xa2\xe3\x83\xbc\xe3\x83\x8b\xe3\x83\xb3\xe3\x82\xb0\xf0\x9f\x88\x93","startDateTime":"2020-11-11T04:55:00+0900","durationSec":11100}],"id":1}'

No change when specifying Accept-Charset, apparently because the returned content is already UTF-8:

requests.headers: {'Accept-Charset': 'utf-8'}
response.headers: {'Content-Type': 'application/json', 'Content-Length': '290', 'Connection': 'keep-alive'}
response.encoding: None
response.apparent_encoding: Windows-1254
response.text: {"result":[{"uri":"tv:isdbt?trip=32741.32741.1066&srvName=テレビ�日","source":"tv:isdbt","title":"テレビ�日","dispNum":"053","tripletStr":"32741.32741.1066","programTitle":"グッド�モーニング🈓","startDateTime":"2020-11-11T04:55:00+0900","durationSec":11100}],"id":1}
response.content: b'{"result":[{"uri":"tv:isdbt?trip=32741.32741.1066&srvName=\xe3\x83\x86\xe3\x83\xac\xe3\x83\x93\xe6\x9c\x9d\xe6\x97\xa5","source":"tv:isdbt","title":"\xe3\x83\x86\xe3\x83\xac\xe3\x83\x93\xe6\x9c\x9d\xe6\x97\xa5","dispNum":"053","tripletStr":"32741.32741.1066","programTitle":"\xe3\x82\xb0\xe3\x83\x83\xe3\x83\x89\xef\xbc\x81\xe3\x83\xa2\xe3\x83\xbc\xe3\x83\x8b\xe3\x83\xb3\xe3\x82\xb0\xf0\x9f\x88\x93","startDateTime":"2020-11-11T04:55:00+0900","durationSec":11100}],"id":1}'
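The observation that the body is already UTF-8 can be checked mechanically: a strict UTF-8 decode of response.content either succeeds or raises UnicodeDecodeError. The sketch below runs that check on the programTitle bytes from the "Garbled" log; is_valid_utf8 is a hypothetical helper. A clean decode of multi-byte text like this is strong evidence, not proof, that UTF-8 was intended.

```python
# programTitle bytes from the "Garbled" log (response.content) in this thread.
title_bytes = (
    b"\xe3\x82\xb0\xe3\x83\x83\xe3\x83\x89\xef\xbc\x81\xe3\x83\xa2"
    b"\xe3\x83\xbc\xe3\x83\x8b\xe3\x83\xb3\xe3\x82\xb0\xf0\x9f\x88\x93"
)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 with no errors at all."""
    try:
        data.decode("utf-8")  # strict mode: raises on any invalid sequence
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(title_bytes))  # True
print(is_valid_utf8(b"\x9d\x81"))  # False: starts with a bare continuation byte
```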

dcnielsen90 commented 3 years ago

@yantoz interesting, it looks like requests defaults to a different encoding depending on the OS. I guess I could have gotten fancy here and changed the default, but I don't think we've ever seen a case where it's not been UTF-8. I'll wrap this up and make a PR in Home Assistant now.