ArchiveTeam / ArchiveBot

ArchiveBot, an IRC bot for archiving websites
http://www.archiveteam.org/index.php?title=ArchiveBot
MIT License

Remove www. from domain names (at least in viewer) #532

Open willsheppard opened 2 years ago

willsheppard commented 2 years ago

When browsing for an archived website here: https://archive.fart.website/archivebot/viewer/domains/ -- I didn't expect "fanfiction.net" to be under "www.fanfiction.net".

www is the equivalent of "the" in a title -- it shouldn't be used in an index as it's not really part of the website's canonical name. Is it possible this might be implemented in the viewer only, with no other code changes required? Or whichever solution is best.

TheTechRobo commented 2 years ago

Can't you use the search box, though?

TheTechRobo commented 2 years ago

Never mind, saw the context in #archiveteam-bs.

The problem with this is that the www. host sometimes serves different content from the bare domain. It's not common, but I've seen it before, although I can't remember which website it was (some type of forum). The non-www host was a home page linking to both the English site and the www one (I don't remember what language the www one was in, but it wasn't English). Something like that, at least.

But, if you are just suggesting that www.fanfiction.net should also be under F in addition to W, I think that would be OK. Maybe a maintainer can say whether this is doable. :-)
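
For illustration, here is a rough Python sketch of listing a www. domain under both letters. It assumes the viewer groups domains by their first character, which may not match how the viewer actually builds its index. The names `index_letters` and `build_index` are made up for this example and are not taken from the ArchiveBot code.

```python
from collections import defaultdict


def index_letters(domain):
    """Return the set of index letters a domain could be listed under.

    Hypothetical helper: the real viewer may group domains differently.
    """
    domain = domain.lower()
    if not domain:
        return set()
    letters = {domain[0]}
    # Also list "www.example.com" under the first letter after "www.".
    if domain.startswith("www.") and len(domain) > 4:
        letters.add(domain[4])
    return letters


def build_index(domains):
    """Group domains by index letter; a domain may appear under several."""
    index = defaultdict(list)
    for domain in sorted(domains):
        for letter in index_letters(domain):
            index[letter].append(domain)
    return dict(index)
```

With this approach the stored domain name is left untouched, so www.fanfiction.net and fanfiction.net stay separate entries and only the browsing index changes.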

systwi-again commented 1 year ago

The problem with this is that www. is sometimes different from "normal". It's not common, but I've seen it before, although I can't remember what website it was (some type of forum).

Working protocol/subdomain difference examples (as of 2022-12-10 20:38:17 UTC):

I know there are other instances of this, some returning vastly different pages or redirecting to other domains, but no further examples come to mind at the moment. I will update this message when I come across more (if I remember to do so, that is, haha).

To add further insult to injury, I have come across instances of prefixes such as www2., www3., www49., etc. Should these appear with the results as well? I assume, however, it may be rather unlikely one is looking for these variations without specifying them beforehand.
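
If the numbered prefixes were to be handled too, one pattern could cover both cases. The following is only a sketch: `WWW_PREFIX` and `strip_www` are hypothetical names, and whether www2. and friends should be folded in at all is exactly the open question above.

```python
import re

# Matches a leading "www." or "wwwN." label (www2., www49., ...).
# The lookahead requires at least two labels to remain, so a domain
# such as "www.com" is left alone.
WWW_PREFIX = re.compile(r"^www\d*\.(?=[^.]+\.)", re.IGNORECASE)


def strip_www(domain):
    """Return the domain without a leading www/wwwN label, for display or sorting only."""
    return WWW_PREFIX.sub("", domain, count=1)
```

For example, `strip_www("www49.example.org")` gives `"example.org"`, while `strip_www("wwwfoo.example.org")` is returned unchanged because the label is not just www plus digits.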

…if you are just suggesting that www.fanfiction.net should also be under F in addition to W, I think that would be OK.

I, too, agree with this implementation.