Automattic / jetpack

Security, performance, marketing, and design tools — Jetpack is made by WordPress experts to make WP sites safer and faster, and help you grow your traffic.
https://jetpack.com/

Curly Quotes (and other non-UTF8 chars) break infinite scroll html content (causing html to be empty) #1446

Closed jtsternberg closed 9 years ago

jtsternberg commented 9 years ago

This issue is related to #1443. We were receiving an empty html string because when json_encode encounters invalid (i.e. non-UTF-8) characters, it fails on the whole string rather than just the offending characters. This is obviously not ideal: a single bad character loses {$posts_per_page} posts' worth of content. I'm looking into a way to encode a few offenders before running the content through json_encode, and will submit a PR to that effect.
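A minimal sketch of the failure described above, not the actual Jetpack code. The `html` key and the sample markup are assumptions for illustration; the bytes `\x93` and `\x94` are Windows-1252 curly quotes, which are not valid UTF-8 on their own.

```php
<?php
// Hypothetical payload shaped like an infinite scroll response (the key name is an assumption).
// "\x93" and "\x94" are Windows-1252 curly quotes: single bytes that are invalid as UTF-8.
$payload = array( 'html' => "<p>\x93Honor Killings\x94</p>" );

// On PHP 5.5 and later json_encode() returns false for the whole payload.
// Older PHP versions behaved differently (a null value in place of the string).
$json = json_encode( $payload );

// json_last_error() reports why the call failed.
$failed_on_utf8 = ( false === $json && JSON_ERROR_UTF8 === json_last_error() );

var_dump( $failed_on_utf8 );
```

Checking `json_last_error()` against `JSON_ERROR_UTF8` is one way to confirm that the empty response comes from the encoding step rather than from the query.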

georgestephanis commented 9 years ago

Is this breaking on things like multibyte emoji?

jtsternberg commented 9 years ago

Not sure about emoji, but it's definitely breaking on curly quotes. This post is one example that was breaking infinite scroll: http://crimefeed.com/2014/12/prosthetic-hand-stolen-veterans-truck/

jtsternberg commented 9 years ago

I'll be updating that PR, as it doesn't appear to have fixed the issue yet.

jtsternberg commented 9 years ago

OK, I needed to utf8_encode the html as well. This is now working as expected. The post I listed above has curly quotes in it. Before this patch, page 6 would not load the content from page 7. The next post after 'FBI Adds Father To Top Ten Most Wanted List For Alleged “Honor Killings” Of His Teen Daughters' SHOULD BE the (custom post format quote) '“I forgive you for stealing my stuff, but just give me back my prosthetic. I mean, it’s my hand.”', BUT INSTEAD the next post was the one 10 posts later, 'Convicted Child Molester Wins $3 Million Lottery Prize In Florida'. A whole page's worth of posts was missing because of the json_encode drop.
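A rough sketch of converting the html before encoding it. This is not the patch itself: the patch used utf8_encode, which treats its input as ISO-8859-1, while this sketch swaps in mb_convert_encoding with an assumed Windows-1252 source encoding so that curly-quote bytes map to real curly quotes. The function name `example_ensure_utf8` is hypothetical, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper (not part of Jetpack or WordPress).
// Assumes the mbstring extension and that non-UTF-8 input is Windows-1252.
function example_ensure_utf8( $string ) {
	// Leave strings that are already valid UTF-8 untouched.
	if ( mb_check_encoding( $string, 'UTF-8' ) ) {
		return $string;
	}
	// Otherwise reinterpret the bytes as Windows-1252 and convert them to UTF-8.
	return mb_convert_encoding( $string, 'UTF-8', 'Windows-1252' );
}

// Sample markup with Windows-1252 curly quotes (invalid as UTF-8).
$html = "<p>\x93I mean, it\x92s my hand.\x94</p>";

// After conversion the payload encodes instead of failing.
$json = json_encode( array( 'html' => example_ensure_utf8( $html ) ) );
```

Checking validity first matters: running an unconditional conversion over content that is already UTF-8 would double-encode the multibyte characters.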

jtsternberg commented 9 years ago

Please let me know if this is not clear or if you have questions. We'll be moving forward with this patch, so I need to know if it will not be rolled into the next version of Jetpack. Thanks, @georgestephanis.

georgestephanis commented 9 years ago

WordPress 4.1 adds a new function, wp_json_encode, that solves this. Thanks, @pento! Let's move to that, and include the function for backcompat.

pento commented 9 years ago

:+1:

jtsternberg commented 9 years ago

Yep, wp_json_encode works just fine for this. Keep in mind that the backcompat copy will also need _wp_json_convert_string and _wp_json_sanity_check. I'll submit another PR.
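One possible shape for the backcompat approach, sketched under assumptions rather than copied from the PR. `example_json_encode` and `example_to_utf8` are hypothetical names; the fallback is a simplified stand-in and does not reproduce what _wp_json_sanity_check and _wp_json_convert_string actually do. It assumes mbstring and Windows-1252 input.

```php
<?php
// Hypothetical helper: recursively convert non-UTF-8 strings inside the data.
// Assumes the mbstring extension and Windows-1252 source bytes.
function example_to_utf8( $value ) {
	if ( is_array( $value ) ) {
		// array_map() with a single array preserves the keys.
		return array_map( 'example_to_utf8', $value );
	}
	if ( is_string( $value ) && ! mb_check_encoding( $value, 'UTF-8' ) ) {
		return mb_convert_encoding( $value, 'UTF-8', 'Windows-1252' );
	}
	return $value;
}

// Hypothetical wrapper: prefer core's wp_json_encode() when it exists (WordPress 4.1+).
function example_json_encode( $data ) {
	if ( function_exists( 'wp_json_encode' ) ) {
		return wp_json_encode( $data );
	}
	// Fallback for older installs: try the plain encode first.
	$json = json_encode( $data );
	if ( false !== $json ) {
		return $json;
	}
	// Encoding failed (PHP 5.5+ signals this with false); clean the strings and retry.
	return json_encode( example_to_utf8( $data ) );
}

$json = example_json_encode( array( 'html' => "<p>\x93quoted\x94</p>" ) );
```

Trying the plain encode first keeps the common case cheap, since the recursive cleanup only runs when encoding has already failed.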

jtsternberg commented 9 years ago

And by submit another, I mean, update the existing PR, #1447.