aaronpk / webmention.io

Easily enable webmentions on any web site
https://webmention.io/
Other
774 stars 38 forks

Emoji being replaced with ??? in webmentions #203

Open johnpeart opened 10 months ago

johnpeart commented 10 months ago

Issue

Webmentions sent to the www.webmention.io service containing emoji – like 🙏 and 🐒 – are sometimes replaced with ??? and ???? when www.webmention.io sends the Webmention to its destination site.

Expected behaviour

The Webmention should include the original emoji.

Further info

This may be a new bug; emoji were being sent via the service successfully in the past. When I integrated with the www.webmention.io service several weeks ago, emoji were not being replaced. This blog post from my personal site shows a Webmention that successfully includes the emoji. This Webmention originates from a Mastodon post, piped in via Brid.gy and works as expected.

In this more recent blog post, which features a range of Webmentions from people's personal blogs and from Mastodon posts piped in via Brid.gy, all of the emoji are stripped and replaced with '???' or '????'.

Looking at the raw data from the www.webmention.io API shows that the emoji are not present in the most recent API data (but are present on the older posts). Here's a link to the Webmentions page for my site, in case it helps to demonstrate the problem.

aaronpk commented 10 months ago

This is really strange, thanks for the details. I'm trying to track down when this started happening. It seems to me that anything before August has emoji, but after August that's not the case. I did do a bunch of maintenance work in August on this, in particular to make it accept Emoji in a URL itself. That's bizarre that it would have broken storing emoji tho! I will investigate more.

johnpeart commented 10 months ago

Thanks Aaron. 🙂

I did try skimming through the previous commits to see if I could identify anything obviously changed, but I think it's beyond my amateurish coding skills!

aaronpk commented 10 months ago

The only related commit I can figure out is this one, which sets a property on the database connection:

https://github.com/aaronpk/webmention.io/commit/1df6373363071c2d93e85d4d20b47fbe41424ed6

aaronpk commented 10 months ago

I reverted that commit, and emoji seem to be stored properly now. Thanks for catching that. I'm assuming there is now something else broken with emoji in URLs that this commit was supposed to fix, but that happens much less often than emoji in post contents, so we'll deal with that separately later.
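
One way to picture how a single emoji can turn into a run of question marks: if some step between receiving and storing the text cannot represent bytes outside ASCII and substitutes one `?` per byte, a four-byte UTF-8 emoji comes out as `????`. The Python sketch below only simulates that effect. It is an illustration under that assumption, not webmention.io's actual code, and `simulate_bytewise_replacement` is a hypothetical name.

```python
def simulate_bytewise_replacement(text: str) -> str:
    """Simulate a lossy step that turns every non-ASCII UTF-8 byte into '?'.

    Assumption for illustration only: the lossy step works byte by byte,
    so one character can become several question marks.
    """
    return "".join(
        chr(byte) if byte < 0x80 else "?"
        for byte in text.encode("utf-8")
    )


# U+1F642 (slightly smiling face) is four bytes in UTF-8.
print(simulate_bytewise_replacement("publishing the post \U0001F642"))
# prints: publishing the post ????
```

Under the same assumption, a character that needs three UTF-8 bytes would come out as `???`.
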

rknightuk commented 10 months ago

This appears to still be happening even for mentions created since the revert of that commit. Example from one on my site here:

{
  "type": "entry",
  "author": {
    "type": "card",
    "name": "Robb Knight",
    "photo": "https://webmention.io/avatar/media.social.lol/d2b5943b2e687ef31399f8241bd07d88b1140716a89c5e026038eaf8ec5341b7.jpg",
    "url": "https://social.lol/@robb"
  },
  "url": "https://social.lol/@robb/111416365374993541",
  "published": "2023-11-15T20:06:57+00:00",
  "wm-received": "2023-11-15T20:27:01Z",
  "wm-id": 1738813,
  "wm-source": "https://brid.gy/comment/mastodon/@robb@social.lol/111415917630210220/111416365374993541",
  "wm-target": "https://rknight.me/using-the-johnny-decimal-system/",
  "wm-protocol": "webmention",
  "content": {
    "html": "<p><span class=\"h-card\"><a href=\"https://hachyderm.io/@johnnydecimal\" class=\"u-url\">@<span>johnnydecimal</span></a></span> No better way to find a typo than by publishing the post ????</p><p>Will be fixed in the next few minutes.</p>",
    "text": "@johnnydecimal No better way to find a typo than by publishing the post ????Will be fixed in the next few minutes."
  },
  "in-reply-to": "https://rknight.me/using-the-johnny-decimal-system/",
  "wm-property": "in-reply-to",
  "wm-private": false
}
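
To find affected mentions in API data shaped like the entry above, a heuristic scan for runs of three or more question marks can help. This is a minimal sketch: `find_suspect_entries` and `SUSPECT_RUN` are hypothetical names, the input is assumed to be a list of entry objects already parsed from JSON, and genuine text such as "what???" will produce false positives.

```python
import re

# Heuristic: three or more consecutive question marks. The threshold is an
# assumption based on the '???' and '????' runs reported in this thread.
SUSPECT_RUN = re.compile(r"\?{3,}")


def find_suspect_entries(entries: list[dict]) -> list[tuple[object, str]]:
    """Return (wm-id, field name) pairs for fields containing a '???' run."""
    hits = []
    for entry in entries:
        author = entry.get("author") or {}
        content = entry.get("content") or {}
        fields = {
            "author.name": author.get("name"),
            "content.html": content.get("html"),
            "content.text": content.get("text"),
        }
        for field_name, value in fields.items():
            # Skip missing or non-string values instead of failing on them.
            if isinstance(value, str) and SUSPECT_RUN.search(value):
                hits.append((entry.get("wm-id"), field_name))
    return hits


sample = [
    {
        "wm-id": 1738813,
        "author": {"name": "Robb Knight"},
        "content": {
            "html": "<p>publishing the post ????</p>",
            "text": "publishing the post ????",
        },
    }
]
for wm_id, field_name in find_suspect_entries(sample):
    print(wm_id, field_name)
```

Checking `author.name` as well as the content fields is a design choice: it costs nothing and catches damage outside the post body.
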
ahmadalbakri commented 8 months ago

I can confirm this issue still happens, at least on my site: https://ahmad.build/shortupdate-25-12-2023/#webmentions

JoelOtter commented 7 months ago

Happening for me here too: https://www.joelotter.com/notes/2024/02/05-pacific-drive/

Interestingly it's happening for author names as well as the post content. @aaronpk could this be reopened?

jphastings commented 5 months ago

This is happening for me too. Is there anything I can provide from the places where I'm seeing it that'd make it easier to debug?

kristofzerbe commented 3 weeks ago

Same here ... Emojis in content.text and content.html, coming from brid.gy, are replaced by question marks:

  1. Mastodon: https://indieweb.social/@kiko/112779603915753548 (image)

  2. Bridgy: (image)

  3. Webmention.io API: https://webmention.io/api/mentions.html?target=https%3A%2F%2Fkiko.io%2Fnotes%2F2024%2FBlogroll-Feed-Heavyweight-Championship%2F (image)