Open mn7216 opened 5 months ago
I've also thought about this, mostly with the goal of reducing duplicate artist entries. My best guess for the most practical way to implement the changes below would be to run the code to standardize URLs 1) on the create artist page and 2) whenever you save a given entry. That way, a script could go through and just hit save on every artist to update their links, and then there wouldn't be a need to re-run the code on every single link in the db every time it checks for duplicates (hopefully I've phrased that in a way that makes sense).
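To make the save-time idea a bit more concrete, here is a rough TypeScript sketch. Everything in it (`Artist`, `saveArtist`, `findDuplicates`, and the placeholder `normalizeUrl`) is made up for illustration and is not taken from the actual codebase; the real normalizer would apply the per-site rules discussed in this issue.

```typescript
// Hypothetical entry shape; the real model will differ.
interface Artist {
  id: number;
  name: string;
  links: string[];
}

// Placeholder normalizer (assumption): trims whitespace, upgrades http to
// https, and drops trailing slashes. Real per-site rules would go here.
function normalizeUrl(url: string): string {
  return url.trim().replace(/^http:\/\//i, "https://").replace(/\/+$/, "");
}

// Runs on the create page and on every save, so stored links are always
// already in normalized form. Links that normalize to the same string
// are collapsed into one.
function saveArtist(store: Map<number, Artist>, artist: Artist): void {
  const links = Array.from(new Set(artist.links.map(normalizeUrl)));
  store.set(artist.id, { ...artist, links });
}

// Duplicate check: normalizes only the incoming link, then compares it
// against the stored links as plain strings.
function findDuplicates(store: Map<number, Artist>, url: string): Artist[] {
  const wanted = normalizeUrl(url);
  return Array.from(store.values()).filter((a) => a.links.includes(wanted));
}
```

With this shape, a one-off script that re-saves every artist would bring old links up to date, and the duplicate check never has to re-normalize what is already stored.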
Here is a non-exhaustive list of changes I'd like to see to fix things that don't get detected as duplicates. A lot of them are for removing query strings, but there are some other odd issues I've seen as well.
NND:
Twitter:
YouTube:
SoundCloud:
BiliBili:
Pixiv:
I'll add more if I run into any.
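For the query-string cases in particular, something along these lines might work. This is only a sketch: `stripQuery` and the `keepParams` allow-list are hypothetical names, and the YouTube entry is just an illustration of a site where some parameters identify the content and cannot simply be dropped.

```typescript
// Hypothetical allow-list: query parameters that are part of a link's
// identity for a given host. Everything not listed is dropped.
const keepParams: Record<string, string[]> = {
  "www.youtube.com": ["v", "list"],
};

// Drops the fragment and every query parameter that is not allow-listed
// for the URL's host. Input that does not parse as a URL is returned unchanged.
function stripQuery(raw: string): string {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return raw;
  }
  const allowed = keepParams[url.hostname] ?? [];
  const kept = new URLSearchParams();
  url.searchParams.forEach((value, key) => {
    if (allowed.includes(key)) kept.append(key, value);
  });
  url.search = kept.toString();
  url.hash = "";
  return url.toString();
}
```

An allow-list per host seems safer than a global block-list of tracking parameters, since unknown parameters get removed by default, but it does mean each site needs its identity parameters listed before the rule is enabled.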
@saturclay Thanks for the detailed examples.
We could host an even more detailed domain-to-regex mapping somewhere, even before the feature is implemented, since it would be useful for fixing the existing links based on the data dump.
Steps for creating the domain-to-regex mapping:
Links such as https://www.nicovideo.jp/user/50263010/mylist/53787559 are more complicated as those should be replaced with two new links:
Not sure if this is better as a new issue, but it's worth mentioning here as well: it would be nice to normalize Twitter/X URLs, as everything redirects to x.com now and that's what people end up copying/pasting into entries. It seems x.com was already added as an external link match (#1763). Not sure what the right answer is for standardizing existing entries (Wikipedia is still fighting over this), but it feels like it'd be worth doing.
I've been working on this and was wondering: what language should I write this in? Would TypeScript be preferred?
Also @bitbybyte I think the one concern there would be people who changed/deactivated their accounts before the switch. So if you have user xyz who has an inactive Twitter link, changing it to X would make it so it no longer goes to the right archived page. Of course, we could just not run this on links marked as inactive, but then there's still the concern that there could be links that are inactive, but aren't yet marked as inactive. I think it's a good idea for currently active links, we'd just need to exercise some caution.
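A cautious version of that migration could look roughly like the sketch below. The `WebLink` shape and its `inactive` flag are assumptions standing in for however links are actually stored and marked, and `migrateTwitterLink` is a made-up name.

```typescript
// Hypothetical link shape; `inactive` stands in for however the site
// marks links as no longer working.
interface WebLink {
  url: string;
  inactive: boolean;
}

// Matches twitter.com, www.twitter.com and mobile.twitter.com over http(s).
const twitterHost = /^https?:\/\/(?:www\.|mobile\.)?twitter\.com\//i;

// Rewrites active Twitter links to x.com and leaves inactive ones alone,
// so archived copies of old twitter.com pages stay reachable.
function migrateTwitterLink(link: WebLink): WebLink {
  if (link.inactive || !twitterHost.test(link.url)) return link;
  return { ...link, url: link.url.replace(twitterHost, "https://x.com/") };
}
```

This does nothing about links that are dead but not yet marked inactive, so a dry run that only reports what would change is probably worth doing before anything is written back.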
Is your feature request related to a problem? Please describe. (required)
Yes, most URLs with tracking elements or other irrelevant elements do not function with artist auto-adding.
Describe the solution you'd like. (required)
Use regular expressions to remove extraneous URL elements
Example:
sp.nicovideo.jp (mobile NicoNico; does not work with auto-add or link recognition)
Example URL: https://sp.nicovideo.jp/watch/sm2154380
Include Pattern: ^https://sp\.nicovideo\.jp/(.*)$
Output: https://nicovideo.jp/$1
NicoNico tracking elements can probably be stripped by removing any non-numeric characters after user/ or video/, but I don't remember the exact URL structure off the top of my head.
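As a rough illustration of how an include pattern and output template could be applied, here is a small TypeScript sketch. `RewriteRule`, `rules`, and `applyRules` are hypothetical names; the only rule included is the mobile NicoNico one from the example above, written as a JavaScript regex literal.

```typescript
// One rewrite rule: a regex to match against and a replacement template
// that may reference capture groups as $1, $2, ...
interface RewriteRule {
  pattern: RegExp;
  output: string;
}

// The mobile NicoNico rule from the example above.
const rules: RewriteRule[] = [
  { pattern: /^https:\/\/sp\.nicovideo\.jp\/(.*)$/, output: "https://nicovideo.jp/$1" },
];

// Applies the first matching rule; URLs that match nothing are returned unchanged.
function applyRules(url: string): string {
  for (const rule of rules) {
    if (rule.pattern.test(url)) return url.replace(rule.pattern, rule.output);
  }
  return url;
}
```

For the example URL, `applyRules("https://sp.nicovideo.jp/watch/sm2154380")` gives `https://nicovideo.jp/watch/sm2154380`. Further sites would just be more entries in `rules`.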
Checklist (required)
Fill out the checklist.