Open zaidhaan opened 1 year ago
Start a new pull request in StackBlitz Codeflow.
Maybe we should use the exact same regex from Mastodon here? Including the use of twitter-text. What do you think?
Hmm, yes I think that's the best solution.
(Since sure, we could modify the regex for this specific case to handle the ending by appending validUrlQueryEndingChars
, but that would only address this specific issue, and would not address the underlying larger issue which is all the 23-char limit rules should apply on valid URLs, and the current simple regex doesn't understand valid URLs so even https://invalid_url_without_even_a_tld
gets truncated. So it's probably best to go with twitter-text
.)
I think I'll leave this for someone else to pick up since I'm not sure how to get these imports to play nice with TypeScript:
Here's the full evaluated regexp, which I would recommend if we want to do this, rather than including the import.
https://github.com/elk-zone/elk/pull/1660#issuecomment-1419917158
Thanks @danielroe, wow that's one heck of a beast, even crashed VS Code when I tried naively pasting it thinking it wouldn't exceed one line with wrapping enabled.
It feels a little dirty throwing in what seems to be a 14KiB regular expression just for truncating URLs, but I suppose if that's the best way to do it then that's what should be done.
I'll leave it to someone else to make the fix, just in case there's a better way. Crashing VS Code can't be a good sign :p
I'm not sure the benefit is worth it. Just mentioning as it's pretty hard to see the actual regexp in the mastodon source code given that it is constructed dynamically at runtime.
Incredible. I agree using the full regex isn't worth it. One possible intermediate solution would be to use the regex in Mastodon as a base, but relax it until it is more manageable. It looks like a big chunk of it comes from the checking the domain is valid. That ends up bringing a harcoded lists of domains at some point for example. Maybe relaxing a few of these regexes from twitter could end up working out. I'm actually surprised Twitter and Mastodon are running this in the client 😮
Can confirm this bug. When URL in text is shorter than 23 chars, then isn't right counted. So for example, if URL has 19 chars, then is counted 19 instead of 23. Longer URLs are counted properly.
Not to be overly pedantic but the URL matching overshoots a bit when compared to mastodon. Specifically, in this example, mastodon would not capture the right paren
)
:Mastodon counts 29 characters (500 - 471):
Elk counts 28:
This is an issue since now it's possible to achieve a elk-safe 500 character limit and then receive a 500 character limit exceeded error by Mastodon after submitting which would confuse the user.
So somewhere down the line we might need to update this: https://github.com/elk-zone/elk/blob/3732a2cc16b4de6cd60f4836151d4a3292c627fa/components/publish/PublishWidget.vue#L72
Since it includes the ending paren:
Mastodon themselves have their URL regex defined here: https://github.com/mastodon/mastodon/blob/1ed12d/app/javascript/mastodon/features/compose/util/url_regex.js (which relies on
twitter-text
)