evazion / translate-pixiv-tags


Use better sorting for artist urls in the tooltip #72

Closed hdk5 closed 1 year ago

hdk5 commented 1 year ago

i.e. similar to https://github.com/danbooru/danbooru/commit/7bbe6e9d228985cd80026a9a0e1ed3362c77de7b

7nik commented 1 year ago
  1. It's better to rename SITES to SITE_ORDER.
  2. There seems to be some "dead" code: e.g., it checks for "Adobe Portfolio" links, but that site isn't present in SITES and nothing sorts by site name alone. Also, some domains and hosts, like pximg.net and o.twimg.com, don't seem to be used in artist urls, or are normalized to another domain.
  3. It looks like most cases just check the host or domain, and I think they can all be moved to an object map, like
    const SITENAME_MAP = {
        "pximg.net": "Pixiv",
        "pixiv.net": "Pixiv",
        "pixiv.me": "Pixiv",
        "pixiv.cc": "Pixiv",
        "booth.pm": "Booth",
        "booth.pximg.net": "Booth",
        ...

    or move them directly to SITES:

    const SITES = [
        // Pixiv
        "pximg.net", "pixiv.net", "pixiv.me", "pixiv.cc",
        // Twitter
        "twitter.com", "twimg.com", "t.co",
        // Anifty
        "anifty.jp", "anifty.imgix.net", { host: "storage.googleapis.com", path: "/anifty-media/" },
        // ArtStation
        "artstation.com",
        ...
        // Fanbox
        "fanbox.cc", "fanbox.pixiv.net", { domain: "pixiv.net", path: "/fanbox/" }, { domain: "pximg.net", path: "/fanbox/" },
    ];

    function getSitePriority (siteUrl) {
        let { host, pathname } = new URL(siteUrl);
        let { domain, subdomain } = psl.parse(host);
        // there can be multiple matches for sites like pixiv.net and pximg.net
        if (["pixiv.net", "pximg.net"].includes(domain)
            && !["fanbox", "sketch", "img-sketch", "booth"].includes(subdomain)
            && !pathname.startsWith("/fanbox/")
        ) {
            return 1;
        }
        if (host === 'o.twimg.com') domain = 'twitpic.com';
        const index = SITES.findIndex((site) => {
            if (typeof site === "string") {
                return site === host || site === domain;
            }
            return (!site.host || site.host === host)
                && (!site.domain || site.domain === domain)
                && (!site.path || pathname.includes(site.path));
        });
        return index < 0 ? 1000 : index;
    }
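A priority function like this would typically feed a comparator when sorting the artist urls. The following is only a minimal sketch under stated assumptions: `SITE_ORDER` is a hypothetical three-entry list, `sitePriority` is a simplified stand-in that matches by host suffix instead of calling psl, and `sortArtistUrls` is an illustrative name that does not come from the userscript.

```javascript
// Hypothetical, shortened order list; the real one is much longer.
const SITE_ORDER = ["pixiv.net", "twitter.com", "artstation.com"];

// Simplified stand-in for getSitePriority: a site matches when the hostname
// equals it or is one of its subdomains (no psl parsing, no path rules).
function sitePriority(siteUrl) {
    const { hostname } = new URL(siteUrl);
    const index = SITE_ORDER.findIndex(
        (site) => hostname === site || hostname.endsWith(`.${site}`),
    );
    // Unknown sites go to the end, as in the function above.
    return index < 0 ? 1000 : index;
}

// Returns a sorted copy; Array.prototype.sort is stable, so urls with the
// same priority keep their original relative order.
function sortArtistUrls(urls) {
    return [...urls].sort((a, b) => sitePriority(a) - sitePriority(b));
}

const sorted = sortArtistUrls([
    "https://example.com/artist",
    "https://twitter.com/artist",
    "https://www.pixiv.net/users/1",
]);
console.log(sorted);
```

Sorting a copy keeps the original array intact, which is convenient if the unsorted list is still needed elsewhere.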

hdk5 commented 1 year ago

@7nik

  1. It's better to rename SITES to SITE_ORDER

Ok.

  2. There seems to be some "dead" code: e.g., it checks for "Adobe Portfolio" links, but that site isn't present in SITES and nothing sorts by site name alone. Also, some domains and hosts, like pximg.net and o.twimg.com, don't seem to be used in artist urls, or are normalized to another domain.

I just took all the rules straight from danbooru, and don't really feel like inspecting which are actually used for the artist urls, so...

  3. It looks like most cases just check the host or domain, and I think they can all be moved to an object map, like

Ok.

7nik commented 1 year ago

After matching the list against SITE_ORDER and checking it with the https://danbooru.donmai.us/artist_urls.json?search%5Burl_matches%5D= endpoint, the list became almost four times shorter.
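For reference, a query url for that endpoint could be built as sketched below. The helper name `buildArtistUrlQuery` is hypothetical, and the `*` wildcard syntax in the pattern is an assumption that should be checked against the Danbooru API documentation; only the endpoint and the `search[url_matches]` parameter come from the url above.

```javascript
// Builds a query url for the artist_urls endpoint mentioned above.
// Assumption: the url_matches value accepts "*" wildcards.
function buildArtistUrlQuery(pattern) {
    const url = new URL("https://danbooru.donmai.us/artist_urls.json");
    // searchParams percent-encodes the brackets in the parameter name.
    url.searchParams.set("search[url_matches]", pattern);
    return url.href;
}

const queryUrl = buildArtistUrlQuery("*pixiv.cc*");
console.log(queryUrl);
```

An empty result for a pattern would suggest that no artist url uses that domain, so the matching rule is a candidate for removal.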

the list
[
    { name: "Fanbox", domain: "fanbox.cc" },
    { name: "Fanbox", domain: "pixiv.net", pathname: "/fanbox/" },
    { name: "Pixiv Sketch", host: "sketch.pixiv.net" },
    { name: "Booth", domain: "booth.pm" },
    { name: "Pixiv", domain: "pixiv.net" },
    { name: "Pixiv", domain: "pixiv.me" },
    { name: "Pixiv", domain: "pixiv.cc" },
    { name: "Twitter", domain: "twitter.com" },
    { name: "Twitter", domain: "t.co" },
    { name: "Anifty", domain: "anifty.jp" },
    { name: "ArtStation", domain: "artstation.com" },
    { name: "Bilibili", domain: "bilibili.com" },
    { name: "Deviant Art", domain: "deviantart.com" },
    { name: "Deviant Art", domain: "fav.me" },
    { name: "Deviant Art", domain: "sta.sh" },
    { name: "Fantia", domain: "fantia.jp" },
    { name: "FC2", domain: "fc2.com" },
    { name: "FC2", domain: "fc2blog.net" },
    { name: "FC2", domain: "fc2blog.us" },
    { name: "Foundation", domain: "foundation.app" },
    { name: "Furaffinity", domain: "furaffinity.net" },
    { name: "Gumroad", domain: "gumroad.com" },
    { name: "Gumroad", domain: "gum.co" },
    { name: "Hentai Foundry", domain: "hentai-foundry.com" },
    { name: "Instagram", domain: "instagram.com" },
    { name: "Lofter", domain: "lofter.com" },
    { name: "Lofter", domain: "127.net" },
    { name: "Pawoo", domain: "pawoo.net" },
    { name: "Baraag", domain: "baraag.net" },
    { name: "Misskey.io", domain: "misskey.io" },
    { name: "Misskey.art", domain: "misskey.art" },
    { name: "Misskey.design", domain: "misskey.design" },
    { name: "Newgrounds", domain: "newgrounds.com" },
    { name: "Nico Seiga", domain: "nicovideo.jp" },
    { name: "Nico Seiga", domain: "nico.ms" },
    { name: "Nijie", domain: "nijie.info" },
    { name: "Plurk", domain: "plurk.com" },
    { name: "Reddit", domain: "reddit.com" },
    { name: "Reddit", domain: "redd.it" },
    { name: "Skeb", domain: "skeb.jp" },
    { name: "Tinami", domain: "tinami.com" },
    { name: "Tinami", domain: "tinami.jp" },
    { name: "Tumblr", domain: "tumblr.com" },
    { name: "Weibo", domain: "weibo.com" },
    { name: "Weibo", domain: "sinaimg.cn" },
    { name: "Ask.fm", domain: "ask.fm" },
    { name: "BCY", domain: "bcy.net" },
    { name: "Circle.ms", domain: "circle.ms" },
    { name: "DLSite", domain: "dlsite.com" },
    { name: "DLSite", domain: "dlsite.net" },
    { name: "DLSite", domain: "dlsite.jp" },
    { name: "Doujinshi.org", domain: "doujinshi.org" },
    { name: "Doujinshi.org", host: "doujinshi.mugimugi.org" },
    { name: "Facebook", domain: "facebook.com" },
    { name: "Facebook", domain: "fbcdn.net" },
    { name: "Livedoor", domain: "livedoor.jp" },
    { name: "Livedoor", host: "livedoor.blogimg.jp" },
    { name: "Livedoor", domain: "blog.jp" },
    { name: "Livedoor", domain: "diary.to" },
    { name: "Livedoor", domain: "doorblog.jp" },
    { name: "Livedoor", domain: "dreamlog.jp" },
    { name: "Livedoor", domain: "gger.jp" },
    { name: "Livedoor", domain: "ldblog.jp" },
    { name: "Livedoor", domain: "livedoor.biz" },
    { name: "Livedoor", domain: "officialblog.jp" },
    { name: "Livedoor", domain: "publog.jp" },
    { name: "Livedoor", domain: "weblog.to" },
    { name: "Livedoor", domain: "xxxblog.jp" },
    { name: "Ko-fi", domain: "ko-fi.com" },
    { name: "Mixi.jp", domain: "mixi.jp" },
    { name: "Patreon", domain: "patreon.com" },
    { name: "Piapro.jp", domain: "piapro.jp" },
    { name: "Sakura.ne.jp", domain: "sakura.ne.jp" },
    { name: "Youtube", domain: "youtu.be" },
    { name: "Youtube", domain: "youtube.com" },
]

Also, there are no regexps or subdomain checks anymore, and I added a youtube.com case.
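A lookup over such a rule list could look roughly like the sketch below. It uses only an excerpt of the list, and the matching semantics are assumptions rather than the script's actual behavior: `domain` is taken to match the host or any of its subdomains, `host` to match exactly, and `pathname` to be a path prefix. `SITE_RULES`, `matchesRule`, and `getSiteName` are illustrative names.

```javascript
// Excerpt of the list above; order matters because the first matching rule wins.
const SITE_RULES = [
    { name: "Fanbox", domain: "fanbox.cc" },
    { name: "Fanbox", domain: "pixiv.net", pathname: "/fanbox/" },
    { name: "Pixiv Sketch", host: "sketch.pixiv.net" },
    { name: "Pixiv", domain: "pixiv.net" },
    { name: "Twitter", domain: "twitter.com" },
];

// Assumed semantics: every field present on the rule must match.
function matchesRule(rule, url) {
    const { hostname, pathname } = url;
    if (rule.host && rule.host !== hostname) return false;
    if (rule.domain
        && hostname !== rule.domain
        && !hostname.endsWith(`.${rule.domain}`)) return false;
    if (rule.pathname && !pathname.startsWith(rule.pathname)) return false;
    return true;
}

// Returns the site name of the first matching rule, or null if none matches.
function getSiteName(siteUrl) {
    const url = new URL(siteUrl);
    const rule = SITE_RULES.find((r) => matchesRule(r, url));
    return rule ? rule.name : null;
}

console.log(getSiteName("https://www.pixiv.net/fanbox/creator/1"));
console.log(getSiteName("https://sketch.pixiv.net/@user"));
```

Because more specific rules (the Fanbox path and the Pixiv Sketch host) come before the generic pixiv.net rule, no separate exclusion check is needed for pixiv.net.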

hdk5 commented 1 year ago

I removed rules that don't match any of the artist urls, but left those that are not in SITE_ORDER - I want to also add site icons later (https://github.com/danbooru/danbooru/blob/f955718672a31e9c19afc5381eceeb0c2e7653d6/app/helpers/icon_helper.rb#L8).

The youtube.com rule is not needed, as it falls under the fallback rule.