HashandSalt / kirby3-twitter

A Twitter plugin forKirby
MIT License
8 stars 4 forks source link

linkify mangles hashtags with umlauts #1

Closed MaluNoPeleke closed 3 years ago

MaluNoPeleke commented 3 years ago

it seems as if the full_text isn't encoded in UTF8: Screenshot_20210811-225035

Could you please add that to the plugin?

MaluNoPeleke commented 3 years ago

I tried it with $tweet['text'] = utf8_decode($tweet['text']) but no luck

MaluNoPeleke commented 3 years ago

One additional note: The text itself renders correct but the tags don't work the same and links are cut off when there are umlauts.

MaluNoPeleke commented 3 years ago

Maybe it could be achieved with the addslashes() and stripslashes() function for the link generation part in the Twitter.php src file!?

MaluNoPeleke commented 3 years ago

@CohortHS Your comment isn't visible here, even if I have received it via email.

S1SYPHOS commented 3 years ago

@MaluNoPeleke would you mind pointing to a test repository / setup so I could reproduce this and see if I can provide a PR?

MaluNoPeleke commented 3 years ago

Thanks a lot for your offer, really appreciated. As this would require to expose my API key and secret I rather would not like to share it. If you have this plugin already running in your environment then you could test with this snippet:

<?= snippet('twitter/tweet', ['id' => '1403092300981489673', 'media' => true])?>

Does this help you?

S1SYPHOS commented 3 years ago

I'd have to check what it takes to get one of those .. Twitter Account thingies x) but I'll manage 😀

MaluNoPeleke commented 3 years ago

That would be great, thanks a lot in advance!

S1SYPHOS commented 3 years ago

I "applied" for developer access, which makes me think why you need a developer account "just" to display some tweets elsewhere - but I'm willing to dig into this rabbit hole.

In the meantime, may I suggest Mastodon - there it's as easy as "Add application" > name, scope > "there you go, here's your access token etc"

S1SYPHOS commented 3 years ago

image

Seems to me that you might be re-encoding the string as it seems to be UTF-8 already .. since I cannot review your actual code, it's hard to say what went wrong - see above screenshot taken from a fresh plainkit.

S1SYPHOS commented 3 years ago

From what I can see, this problems roots in the linkify() department, and is not an issue of text encoding, see this test:

image

S1SYPHOS commented 3 years ago

Conclusion: As this problem clearly doesn't add up to 'missing UTF-8 support in full_text', I suggest renaming the issue or closing it and opening a new one, so people don't get confused :fox_face:

.. to something like 'linkify mangles hashtags with umlauts' or something ..

MaluNoPeleke commented 3 years ago

Thanks a lot for your investigation. I changed the title of the issue. Do you have an idea how to fix it? It seems as this has already been fixed ten years ago: https://github.com/mozilla/bleach/commit/709376b4730d668ddc3ee26299b1721a1ab86f80

S1SYPHOS commented 3 years ago

?!

what's mozilla/bleach doing here?

That's where @HashandSalt (supposedly) got the function from:

https://gist.github.com/jasny/2000705#gistcomment-3888015

I commented there, we'll see what the regex voodoo people come up with :grin: In the meantime, I'll investigate further ..

MaluNoPeleke commented 3 years ago

Looks like someone else had the same issue four years ago and didn't receive an answer: https://gist.github.com/jasny/2000705#gistcomment-2123490

HashandSalt commented 3 years ago

This linkify stuff is some crazy regex. Beyond me. Thanks @S1SYPHOS for look into this

S1SYPHOS commented 3 years ago

this regex '~(?<!\w)[@#](\w++)~' doesn't detect the ä character ...

this does: '/#([\p{Pc}\p{N}\p{L}\p{Mn}]+)/u'

Source: https://stackoverflow.com/a/35498078

S1SYPHOS commented 3 years ago

test it, and behold the UNICODE

S1SYPHOS commented 3 years ago

@MaluNoPeleke

https://github.com/HashandSalt/kirby3-twitter/blob/main/src/classes/Twitter.php#L96 $value = preg_replace_callback('~(?<!\w)[@#](\w++)~' becomes $value = preg_replace_callback('/#([\p{Pc}\p{N}\p{L}\p{Mn}]+)/u'', please tell me if that's better for you

S1SYPHOS commented 3 years ago

this regex '~(?<!\w)[@#](\w++)~' doesn't detect the ä character ...

this does: '/#([\p{Pc}\p{N}\p{L}\p{Mn}]+)/u'

Source: https://stackoverflow.com/a/35498078

I'm no regex voodoo person, but I know a thing or two - and I know how to duck-duck-GO

S1SYPHOS commented 3 years ago

Need to test with tweets containing more than one hashtag tho

edit: yep, works fine

MaluNoPeleke commented 3 years ago

Thanks a lot but it fails for me:

syntax error, unexpected '', function ($match) use (&$li' (T_CONSTANT_ENCAPSED_STRING), expecting ')'

Maybe I copy pasted something wrong!?

S1SYPHOS commented 3 years ago

@MaluNoPeleke I guess, what about the version @HashandSalt just merged?

S1SYPHOS commented 3 years ago

I guess it'd be wise to publish a patch release, @HashandSalt what do you think?

MaluNoPeleke commented 3 years ago

It works for me as desired with composer require hashandsalt/kirby3-twitter:dev-main and I think it is fine to release it "officially". Thanks a lot @S1SYPHOS :)

S1SYPHOS commented 3 years ago

@MaluNoPeleke Es war mir ein Vergnügen, Herr Niedernolte!


Translation for @HashandSalt, you didn't learn german in school (why would anyone ..): "You're welcome, Mr. Niedernolte"