WikiTeam / wikiteam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2024, WikiTeam has preserved more than 600,000 wikis.
https://github.com/WikiTeam
GNU General Public License v3.0
729 stars, 149 forks

WikiApiary wiki lists #116

Open plexish opened 10 years ago

plexish commented 10 years ago

Most are probably on various other lists you guys have, but I'm sure there are some worth grabbing. I've set up some queries to export all the wikis on WikiApiary that you have not marked as archived (note that, due to a bug, not all of the wikis that are actually archived are marked as such).

Standalone: 4548 wikis (87 defunct); Farm: 840 wikis (14 defunct)

I'm not sure what you would want to do with them, maybe add to the list of wikis collection?

emijrp commented 10 years ago

Hello etesp, thanks for this. We need to extract the API URLs from your files, compare them with our lists and exclude the ones we already have. With a bit of grep, sort, uniq and diff, I think anyone can produce a de-duplicated list.
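The grep/sort/uniq/diff pipeline amounts to a set difference between two URL lists. A minimal Python sketch, assuming both inputs are plain-text files with one API URL per line, and assuming that light normalization (trimming whitespace and a trailing slash, lowercasing scheme and host) is enough to match the same wiki across lists:

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Trim whitespace and a trailing slash; lowercase scheme and host.

    This normalization is an assumption; real lists may need more
    (e.g. http vs https, or index.php vs api.php variants).
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    # Drop the fragment; keep the query string unchanged.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def new_wikis(candidates: list[str], known: list[str]) -> list[str]:
    """Return sorted, de-duplicated candidate URLs that are absent from `known`."""
    known_set = {normalize(u) for u in known if u.strip()}
    fresh = {normalize(u) for u in candidates if u.strip()}
    return sorted(fresh - known_set)


def read_lines(path: str) -> list[str]:
    """Read a one-URL-per-line text file."""
    with open(path, encoding="utf-8") as fh:
        return fh.read().splitlines()
```

For example, `new_wikis(read_lines("wikiapiary_api.txt"), read_lines("wikiteam_known.txt"))`, where both file names are hypothetical placeholders.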

plexish commented 10 years ago

I can give you lists of just API URLs if that makes things easier. Also, we have a field for an alternate API URL, used mostly for cases where you have a different API URL from ours for the same wiki. Would you like those too, and if so, as a separate list or in the same one?
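With an alternate API URL available, a wiki can be treated as already known if either of its URLs appears on the WikiTeam side. A sketch, assuming each WikiApiary record can be reduced to a (primary, alternate) pair with an empty string when no alternate is set; this record layout is an illustration, not WikiApiary's actual export format:

```python
def unknown_wikis(records: list[tuple[str, str]], known: set[str]) -> list[str]:
    """Return primary API URLs of records where neither URL is in `known`.

    Each record is an assumed (primary_api_url, alternate_api_url) pair;
    the alternate may be an empty string.
    """
    result = []
    for primary, alternate in records:
        # Collect the non-empty URLs for this wiki.
        urls = {u.strip() for u in (primary, alternate) if u.strip()}
        if urls.isdisjoint(known):
            result.append(primary.strip())
    return result
```

Keeping both URLs in one record (rather than two separate lists) avoids counting the same wiki twice when only its alternate URL matches.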

emijrp commented 10 years ago

@nemobis Are you using these lists? Do you know how to trim everything but the API URL?
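Trimming an export down to its API URL column takes a few lines with Python's `csv` module. A sketch, assuming the export is CSV with a header row; the column name `"API URL"` is a hypothetical placeholder, so substitute whatever header the actual WikiApiary query output uses:

```python
import csv
import io


def api_urls_from_csv(text: str, column: str = "API URL") -> list[str]:
    """Keep only the API URL column of a CSV export.

    The default column name is hypothetical; rows with an empty or
    missing value in that column are skipped.
    """
    reader = csv.DictReader(io.StringIO(text))
    urls = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            urls.append(value)
    return urls
```

Joining the result with newlines gives the one-URL-per-line format that grep, sort, uniq and diff expect.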

Perhaps we could start using GitHub Gist to work on and share wiki lists instead of a directory inside the repo.

nemobis commented 10 years ago

Emilio J. Rodríguez-Posada, 27/06/2014 12:55:

> @nemobis Are you using these lists? Do you know how to trim all but the API url?

I use all the lists I made, both index.php and api.php. I'm not sure what the question is; as I said, my most current list of wikis to work on is batchdownload/taskforce/mediawikis_pavlo.alive.filtered.todo.txt (2500 wikis), and I think WikiApiary currently has fewer than 500 wikis we don't know about. Now that we're fixing some bugs, I'll rerun over that list to see how many of those work.
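Before rerunning over a long list, one could pre-check which api.php endpoints still answer. This is a hypothetical standalone sketch, not part of WikiTeam's tooling; it uses the standard MediaWiki `action=query&meta=siteinfo` request and assumes each list entry is a bare api.php URL without its own query string:

```python
import http.client
import json
import urllib.parse
import urllib.request


def siteinfo_url(api_url: str) -> str:
    """Build a MediaWiki siteinfo query for a bare api.php URL."""
    params = urllib.parse.urlencode(
        {"action": "query", "meta": "siteinfo", "format": "json"}
    )
    return f"{api_url}?{params}"


def looks_like_mediawiki(body: str) -> bool:
    """True if the body parses as JSON with a query.general block."""
    try:
        data = json.loads(body)
    except ValueError:
        return False
    if not isinstance(data, dict):
        return False
    query = data.get("query")
    return isinstance(query, dict) and "general" in query


def is_alive(api_url: str, timeout: float = 20.0) -> bool:
    """Fetch siteinfo; any network, HTTP or parse failure counts as not alive."""
    try:
        with urllib.request.urlopen(siteinfo_url(api_url), timeout=timeout) as resp:
            return looks_like_mediawiki(resp.read().decode("utf-8", "replace"))
    except (OSError, ValueError, http.client.HTTPException):
        return False
```

Counting live entries is then `sum(is_alive(u) for u in urls)`; this runs sequentially, so a list of 2500 wikis would likely benefit from a thread pool.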

> Perhaps we can start using GitHub GIST to work/share wiki lists instead of a directory inside repo.

I don't see how that would be an improvement. I often use version control features for our lists.

nemobis commented 10 years ago

This issue will become rather trivial once https://github.com/WikiApiary/WikiApiary/issues/172 and https://github.com/WikiApiary/WikiApiary/issues/130 are fixed; otherwise it needs some manual cleanup. Should I commit my uploader.py logs to the taskforce directory, i.e. the lines confirming that a dump has been uploaded and "verified"?

nemobis commented 4 years ago

Asked about the status at https://lists.wikimedia.org/pipermail/wikiapiary/2020-February/000039.html