Open Die4Ever opened 3 months ago
I don't see how we'd be able to do this, since there are infinitely many variations of these URLs, and different definitions of what constitutes a match and what doesn't.
It might also be good to normalize YouTube URLs, since they can end up in a lot of different formats and that breaks crosspost detection. Probably the only query params that need to be respected are the video ID, playlist, index, and timestamp? The `si` query parameter can be thrown away; I think it's just a tracking ID for sharing.
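As a rough illustration of the normalization described above, here is a minimal Python sketch (Python only for brevity; it is not Lemmy code). The function name `normalize_youtube_url`, the host list, and the mapping of "video ID, playlist, index, timestamp" to the parameters `v`, `list`, `index`, and `t` are assumptions made for this example, and it only covers `youtu.be` short links and `/watch` URLs.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

# Assumption: these are the only query parameters that change what the link shows
# (video ID, playlist, playlist index, timestamp). Everything else, e.g. `si`, is dropped.
KEPT_PARAMS = ("v", "list", "index", "t")
# Assumption: hosts treated as YouTube for this sketch.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def normalize_youtube_url(url):
    """Return a canonical watch URL, or the input unchanged if it is not recognized."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    query = parse_qs(parts.query)

    if host == "youtu.be":
        # Short links carry the video ID as the path.
        video_id = parts.path.strip("/")
    elif host in YOUTUBE_HOSTS and parts.path == "/watch":
        video_id = query.get("v", [""])[0]
    else:
        return url
    if not video_id:
        return url

    # Emit parameters in a fixed order so equivalent links compare equal.
    kept = [("v", video_id)]
    for name in KEPT_PARAMS[1:]:
        if name in query:
            kept.append((name, query[name][0]))
    return "https://www.youtube.com/watch?" + urlencode(kept)
```

With this sketch, `https://youtu.be/pd5iofvLrIU?si=QO2iK1Zw0Z7NDRHo` and `https://www.youtube.com/watch?v=pd5iofvLrIU` both map to the same string, so an exact-match crosspost check would treat them as one link. Other forms (embeds, shorts, and so on) are left untouched here.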
We could handle this by resolving the post URL and following redirects, e.g. `curl -Ls -o /dev/null -w %{url_effective} "https://youtu.be/pd5iofvLrIU?si=QO2iK1Zw0Z7NDRHo"`. This gives `https://www.youtube.com/watch?si=QO2iK1Zw0Z7NDRHo&v=pd5iofvLrIU&feature=youtu.be`, which isn't fully normalized, but the advantage is that it works for all websites.
Otherwise we would need a Rust library to normalize YouTube URLs, but I couldn't find one on crates.io.
That one would be handled when we eventually add the clearurls crate, as discussed in #4905
Requirements
Is your proposal related to a problem?
Currently, if I want to check whether something has already been posted, I have to search for the URL as an exact match, but URLs can vary in small ways.
YouTube example:
https://www.youtube.com/watch?v=pd5iofvLrIU
https://youtu.be/pd5iofvLrIU
https://youtu.be/pd5iofvLrIU?si=QO2iK1Zw0Z7NDRHo
Describe the solution you'd like.
If I could search by a part of the URL, like just `pd5iofvLrIU`, then it would find them all. I guess it would be a text index with word stops like `/`, `?`, `=`, and `&`. This could probably even happen automatically for known domains, shown as suggested or lower-ranked search results. This could also be used to search by domain name.
Also, when making a post, it currently searches by title, which helps reduce reposts or gets people to crosspost instead of reposting. It would be cool if it also did that automatic search by URL.
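To make the token-index idea concrete, here is a small Python sketch that splits URLs on the delimiters listed above and looks posts up by any single token. The helper names (`url_tokens`, `build_index`, `search`) are made up for this example, and splitting on `:` and `#` as well is an extra assumption so the scheme and fragment separate cleanly. A real implementation would more likely rely on the database's own text indexing.

```python
import re

# Delimiters from the proposal (/ ? = &), plus ':' and '#' as an assumption
# so that the scheme and any fragment become separate tokens too.
DELIMITERS = re.compile(r"[/?=&:#]+")


def url_tokens(url):
    """Split a URL into the set of non-empty tokens between delimiters."""
    return {tok for tok in DELIMITERS.split(url) if tok}


def build_index(urls):
    """Map each token to the set of URLs that contain it."""
    index = {}
    for url in urls:
        for tok in url_tokens(url):
            index.setdefault(tok, set()).add(url)
    return index


def search(index, term):
    """Return all indexed URLs containing `term` as a whole token, sorted."""
    return sorted(index.get(term, set()))
```

Indexing the three example URLs and searching for `pd5iofvLrIU` returns all of them, while searching for `youtu.be` returns only the two short links, which covers the search-by-domain case as well. Note that this matches whole tokens only, so a search for `youtube` alone would not match `www.youtube.com`.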
Describe alternatives you've considered.
This could maybe be a plugin?
Additional context
No response