Link length not properly counted in status text

elk-zone / elk

A nimble Mastodon web client

https://elk.zone

MIT License

5.42k stars 558 forks source link

Link length not properly counted in status text #1880

Open zaidhaan opened 1 year ago

zaidhaan commented 1 year ago

Not to be overly pedantic but the URL matching overshoots a bit when compared to mastodon. Specifically, in this example, mastodon would not capture the right paren ):

foo (https://news.ycombinator.com/)

Mastodon counts 29 characters (500 - 471):

Elk counts 28:

This is an issue since now it's possible to achieve a elk-safe 500 character limit and then receive a 500 character limit exceeded error by Mastodon after submitting which would confuse the user.

So somewhere down the line we might need to update this: https://github.com/elk-zone/elk/blob/3732a2cc16b4de6cd60f4836151d4a3292c627fa/components/publish/PublishWidget.vue#L72

Since it includes the ending paren:

Mastodon themselves have their URL regex defined here: https://github.com/mastodon/mastodon/blob/1ed12d/app/javascript/mastodon/features/compose/util/url_regex.js (which relies on twitter-text)

stackblitz[bot] commented 1 year ago

Start a new pull request in StackBlitz Codeflow.

patak-dev commented 1 year ago

Maybe we should use the exact same regex from Mastodon here? Including the use of twitter-text. What do you think?

zaidhaan commented 1 year ago

Hmm, yes I think that's the best solution.

(Since sure, we could modify the regex for this specific case to handle the ending by appending validUrlQueryEndingChars, but that would only address this specific issue, and would not address the underlying larger issue which is all the 23-char limit rules should apply on valid URLs, and the current simple regex doesn't understand valid URLs so even https://invalid_url_without_even_a_tld gets truncated. So it's probably best to go with twitter-text.)

I think I'll leave this for someone else to pick up since I'm not sure how to get these imports to play nice with TypeScript:

danielroe commented 1 year ago

Here's the full evaluated regexp, which I would recommend if we want to do this, rather than including the import.

https://github.com/elk-zone/elk/pull/1660#issuecomment-1419917158

zaidhaan commented 1 year ago

Thanks @danielroe, wow that's one heck of a beast, even crashed VS Code when I tried naively pasting it thinking it wouldn't exceed one line with wrapping enabled.

It feels a little dirty throwing in what seems to be a 14KiB regular expression just for truncating URLs, but I suppose if that's the best way to do it then that's what should be done.

I'll leave it to someone else to make the fix, just in case there's a better way. Crashing VS Code can't be a good sign :p

danielroe commented 1 year ago

I'm not sure the benefit is worth it. Just mentioning as it's pretty hard to see the actual regexp in the mastodon source code given that it is constructed dynamically at runtime.

patak-dev commented 1 year ago

Incredible. I agree using the full regex isn't worth it. One possible intermediate solution would be to use the regex in Mastodon as a base, but relax it until it is more manageable. It looks like a big chunk of it comes from the checking the domain is valid. That ends up bringing a harcoded lists of domains at some point for example. Maybe relaxing a few of these regexes from twitter could end up working out. I'm actually surprised Twitter and Mastodon are running this in the client 😮

ByCzech commented 1 year ago

Can confirm this bug. When URL in text is shorter than 23 chars, then isn't right counted. So for example, if URL has 19 chars, then is counted 19 instead of 23. Longer URLs are counted properly.