Automattic / jetpack

Security, performance, marketing, and design tools — Jetpack is made by WordPress experts to make WP sites safer and faster, and help you grow your traffic.
https://jetpack.com/

Wrong OpenGraph Image URL Encoding of Non Latin Characters - In-Content Featured Image Only #38975

Open mxhassani opened 2 months ago

mxhassani commented 2 months ago

Impacted plugin

Social

Quick summary

If a post's featured image is taken from the content (not set explicitly in the editor) and the image title contains non-Latin characters (Ё, ﺡ, Ω, ש, 日, न, ...), the URL used for the OpenGraph image (og:image) gets encoded as Latin-1 while the file title keeps its UTF-8 encoding. The URL then returns a 404, and social media thumbnails come up empty.

The issue doesn't happen if the image is manually set as the featured image of the post; in that case the og:image URL remains in UTF-8.

Steps to reproduce

  1. Make sure Jetpack Social is On (Jetpack > Settings > Sharing)
  2. Create a new post
  3. Take an image and change its title to something containing non-Latin characters, such as: تجربه.png
  4. Add that image to the content of the post
  5. Publish the post
  6. Test by checking the HTML for og:image meta tag or by sharing the post.
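
For the last step, a minimal Python sketch of pulling the og:image value out of a page's HTML. It is illustrative only: the helper name `extract_og_images` and the example.com URL are hypothetical, and it assumes the page source has already been saved or fetched by other means.

```python
from html.parser import HTMLParser


class OGImageParser(HTMLParser):
    """Collects the content of every <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.og_images = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("property") == "og:image" and attributes.get("content"):
            self.og_images.append(attributes["content"])


def extract_og_images(html: str) -> list:
    """Hypothetical helper: return all og:image URLs found in the HTML."""
    parser = OGImageParser()
    parser.feed(html)
    return parser.og_images


# Hypothetical page source, standing in for the published post.
sample = '<head><meta property="og:image" content="https://example.com/uploads/test.png" /></head>'
print(extract_og_images(sample))
```

Comparing the printed URL with the file name that was actually uploaded shows whether the two still match.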

What you expected to happen

The image URL should remain UTF-8 encoded and link to the uploaded image.

What actually happened

The encoding is changed to Latin-1, so the image URL becomes a garbled Latin-1 rendering of the file name instead of .../تجربه.png and no longer points to the uploaded file.
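
The report does not say where in the code the conversion happens. As a rough illustration only, the Python sketch below assumes the garbling amounts to the UTF-8 bytes of the file name being read back as Latin-1. That mechanism is an assumption, not something confirmed in this thread, and the variable names are hypothetical.

```python
from urllib.parse import quote

filename = "تجربه.png"

# Assumption: the UTF-8 bytes of the name are reinterpreted as Latin-1 text.
garbled = filename.encode("utf-8").decode("latin-1")

# Percent-encode both names the way they would appear in a URL path.
correct_path = quote(filename)  # encodes the original UTF-8 bytes
garbled_path = quote(garbled)   # encodes the UTF-8 bytes of the garbled text

print(correct_path)
print(garbled_path)

# The two paths differ, so a request for the garbled one cannot find the file.
print(correct_path == garbled_path)
```

Under that assumption the file on disk keeps the first path while og:image carries the second, which would be consistent with the 404 described above.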

Impact

Some (< 50%)

Available workarounds?

Yes, easy to implement

If the above answer is "Yes...", outline the workaround.

Any one of the following:

  1. Rename all images used as thumbnails so their names contain only Latin characters.
  2. Set a featured image manually on all posts.
  3. Use a third-party tool, such as Yoast SEO, to generate the social thumbnails.
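
For the renaming workaround, a small Python sketch of how one might list the files that need attention before uploading them. The helper name `find_non_ascii` and the sample file names are hypothetical, and the check is stricter than "Latin characters": it flags anything outside plain ASCII, accented letters included.

```python
def find_non_ascii(filenames):
    """Hypothetical helper: return the names containing any non-ASCII character."""
    return [name for name in filenames if not name.isascii()]


# Sample names standing in for a folder of images about to be uploaded.
candidates = ["header.png", "تجربه.png", "café.jpg", "logo-2024.svg"]

for name in find_non_ascii(candidates):
    print(f"rename before upload: {name}")
```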

Platform (Simple and/or Atomic)

Atomic, Self-hosted

Logs or notes

The issue doesn't happen on Dotcom Simple sites because image names are converted to Latin-1 and then to hex on upload, which affects both the hosted file and the code.

8616758-zen

jeherve commented 2 months ago

Interestingly, I get different results depending on the platform:

Until we can figure out what is happening on the Atomic platform, I would recommend against using accented / special characters in filenames. This is bound to cause issues in one place or another. There are plugins to help you with that, like this one: https://wordpress.org/plugins/clean-image-filenames/

mxhassani commented 2 months ago

@jeherve

On self-hosted, the file name remains as it was, causing no issues.

I was able to replicate the issue on JN: https://whole-dromedary.jurassic.ninja/2024/08/20/test/ (you can register)

jeherve commented 2 months ago

> I was able to replicate the issue on JN: https://whole-dromedary.jurassic.ninja/2024/08/20/test/ (you can register)

That's indeed expected, since Jurassic Ninja sites also live on the Atomic infrastructure.