duckduckgo / p5-app-duckpan

DuckDuckHack OpenSource Development Application
http://metacpan.org/module/App::DuckPAN

utf8 issue in html return (maybe more) #21

Closed yegg closed 11 years ago

yegg commented 11 years ago

When UTF-8 is returned, at least in the html for a goodie as in https://github.com/duckduckgo/zeroclickinfo-goodies/pull/208, it is encoded incorrectly.

I checked by URI-encoding the return string and decoding it independently, and it was encoded correctly at that point, so it gets messed up somewhere after that.
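For anyone wanting to repeat that check, here is a minimal sketch using only core Perl. The sample string and variable names are made up for illustration, and this is not necessarily how the original check was done: it percent-encodes the UTF-8 bytes of a string, then reverses the process independently and compares the result.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Hypothetical return string containing one non-ASCII character (é).
my $html = "caf\x{e9}";

# Encode to UTF-8 bytes, then percent-encode every byte outside the
# URI "unreserved" set.
(my $escaped = encode_utf8($html)) =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;

# Decode independently: undo the percent-escapes, then decode the bytes.
(my $bytes = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
my $round_trip = decode_utf8($bytes);

print "$escaped\n";    # caf%C3%A9
print $round_trip eq $html ? "round trip ok\n" : "mismatch\n";
```

If the escaped form shows the expected UTF-8 byte sequence and the round trip matches, the string itself is fine and the corruption happens later in the pipeline.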

yegg commented 11 years ago

Some update:

-- Happens with 'answer' too.
-- It is already encoded wrong when it gets to https://github.com/duckduckgo/p5-app-duckpan/blob/master/lib/App/DuckPAN/Web.pm#L329; you can see it right in the source from duckpan server, printed wrong on the page.

moollaza commented 11 years ago

Fixed with 02d751fa7456795c76f431875548af750bdbd88f

nilnilnil commented 11 years ago

PR?

On Sunday, October 6, 2013, Zaahir Moolla wrote:

Fixed with 02d751f https://github.com/duckduckgo/p5-app-duckpan/commit/02d751fa7456795c76f431875548af750bdbd88f


moollaza commented 11 years ago

No PR, @yegg made a hotfix. We discussed it with @nospampleasemam on HipChat.

nilnilnil commented 11 years ago

Cool -- sounds good.


moollaza commented 11 years ago

Hmm, it looks like there is still some sort of UTF-8 bug to fix. After testing more with this commit applied, I was able to recreate the initial issue that Encode::_utf8_off($body) solved, which @jagtalon described last year: https://github.com/duckduckgo/zeroclickinfo-spice/pull/34#issuecomment-5887597

If you open duckpan server and try the query "g+ duckduckgo" you'll get a 500 from Google's API and get the same error as before: http://dl.dropboxusercontent.com/u/1358088/error.html

It seems we need to further investigate this issue...

majuscule commented 11 years ago

I think I got to the bottom of it. Plack wants a byte stream, so the UTF-8 api_response needs to be encoded to bytes before being pushed through the server. PR https://github.com/duckduckgo/p5-app-duckpan/pull/24.
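As a rough illustration of the idea (not the code from the PR), the sketch below encodes a character string to UTF-8 bytes before placing it in a PSGI-style response. The `$api_response` value is a made-up stand-in, and the response triple is built by hand rather than through any DuckPAN code.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical stand-in for a decoded (character) API response:
# "café" plus a snowman (U+2603).
my $api_response = "<p>caf\x{e9} \x{2603}</p>";

# PSGI response bodies must be bytes, so encode the characters explicitly.
my $body = encode_utf8($api_response);

# A minimal PSGI-style response triple: status, headers, body chunks.
my $response = [
    200,
    [
        'Content-Type'   => 'text/html; charset=utf-8',
        'Content-Length' => length($body),    # byte length after encoding
    ],
    [ $body ],
];

printf "%d characters, %d bytes\n", length($api_response), length($body);
# prints: 13 characters, 16 bytes
```

Taking the length after encoding matters: the character count and the byte count differ as soon as the body contains anything outside ASCII.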

Also worth noting is that we have two other occurrences of Encode::_utf8_off, in WebStatic and WebPublisher. Probably worth looking into so we can avoid them.
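To show why Encode::_utf8_off is fragile compared with an explicit encode, here is a small standalone sketch with made-up sample strings. _utf8_off only clears Perl's internal UTF-8 flag, so the result depends on how the string happens to be stored, whereas encode_utf8 always produces UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# "\x{e9}" (é) alone is stored as a single native byte with the flag off.
my $latin = "caf\x{e9}";
my $latin_flag_off = $latin;
Encode::_utf8_off($latin_flag_off);       # flag already off: nothing changes
my $latin_encoded = encode_utf8($latin);  # real UTF-8: é becomes 2 bytes

# "\x{2603}" (snowman) forces the internal UTF-8 representation, flag on.
my $wide = "\x{2603}";
my $wide_flag_off = $wide;
Encode::_utf8_off($wide_flag_off);        # exposes the internal UTF-8 bytes
my $wide_encoded = encode_utf8($wide);

printf "latin: flag off %d bytes, encoded %d bytes\n",
    length $latin_flag_off, length $latin_encoded;    # 4 and 5
printf "wide:  flag off %d bytes, encoded %d bytes\n",
    length $wide_flag_off, length $wide_encoded;      # 3 and 3
```

The two approaches agree only when the string is already stored internally as UTF-8; for the first string, clearing the flag leaves a Latin-1 byte in the output that is not valid UTF-8.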