WikiTeam / wikiteam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to tiniest wikis. As of 2023, WikiTeam has preserved more than 350,000 wikis.
https://github.com/WikiTeam
GNU General Public License v3.0
705 stars, 147 forks

Is there a way to create a MediaWiki XML dump from HTML pages on web.archive.org? #482

Closed: trenkert closed this issue 5 days ago.

trenkert commented 6 days ago

There are wikis preserved on archive.org that are no longer accessible on their original servers. Is there any way to download those wikis (mainly MediaWiki installations) in full, import them into a fresh MediaWiki installation and run them again locally?

nemobis commented 5 days ago

On 29/06/24 21:03, Thomas Renkert wrote:

There are wikis preserved on archive.org which are no longer accessible on their original servers.

Yes, thousands of them.

Is there any way to download those wikis (mainly mediawiki installations) in full to import them into a fresh mediawiki installation and run them again locally?

Yes, just click the relevant download button on the sidebar or click "show all" and then copy the download URL for use with your preferred download manager (like wget).
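For a scripted download instead of a download manager, a minimal Python sketch follows. It assumes the file is reachable through the usual `https://archive.org/download/<identifier>/<filename>` pattern; the identifier and file name used here are hypothetical placeholders, to be replaced with the ones shown on the item page. If the file is compressed, extract it before importing.

```python
import shutil
import urllib.parse
import urllib.request


def item_file_url(identifier: str, filename: str) -> str:
    """Build the public download URL for one file inside an archive.org item."""
    return "https://archive.org/download/{}/{}".format(
        urllib.parse.quote(identifier), urllib.parse.quote(filename)
    )


def download(url: str, dest: str) -> None:
    """Stream the file to disk rather than loading it into memory (dumps can be large)."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


if __name__ == "__main__":
    # Hypothetical identifier and file name: copy the real ones from the item page.
    url = item_file_url("wiki-example.org", "examplewiki-20240101-history.xml.7z")
    print(url)
    # download(url, "examplewiki-20240101-history.xml.7z")
```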

Then see https://www.mediawiki.org/wiki/Manual:Importing_XML_dumps
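As a rough sketch of scripting that step, the Python snippet below assumes a local MediaWiki installation at a hypothetical path and `php` on the PATH. It runs `maintenance/importDump.php` on the dump and then `maintenance/rebuildrecentchanges.php`, which the manual suggests after an import so that Special:RecentChanges reflects the imported pages. Newer MediaWiki releases may also offer a `maintenance/run.php` wrapper, so check the manual for the version in use.

```python
import subprocess
from pathlib import Path


def import_commands(mw_root: str, dump_path: str) -> list[list[str]]:
    """Return the maintenance commands to run, in order (paths are assumptions)."""
    root = Path(mw_root)
    return [
        # Import the pages and revisions contained in the XML dump.
        ["php", str(root / "maintenance" / "importDump.php"), dump_path],
        # Rebuild Special:RecentChanges afterwards, as the import manual suggests.
        ["php", str(root / "maintenance" / "rebuildrecentchanges.php")],
    ]


def run_import(mw_root: str, dump_path: str) -> None:
    """Run each command from the MediaWiki root; stop on the first failure."""
    for cmd in import_commands(mw_root, dump_path):
        subprocess.run(cmd, cwd=mw_root, check=True)


if __name__ == "__main__":
    # Hypothetical installation path and dump file name.
    for cmd in import_commands("/var/www/mediawiki", "examplewiki-history.xml"):
        print(" ".join(cmd))
```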

trenkert commented 5 days ago

Thank you. I did not mean archive.org as in archived wiki XML dumps, but the Wayback Machine with the captured pages of a wiki. The XML dump does not exist on archive.org, but the Wayback Machine has captured the pages. Is it possible to reconstruct an XML dump from pages captured on the Wayback Machine?

nemobis commented 5 days ago

Not really. You'll need an HTML crawler customised for MediaWiki purposes and then a script to convert the HTML back to wikitext. There are some partial solutions of this kind in https://www.mediawiki.org/wiki/Category:Import/Export . Page history cannot realistically be reconstructed.
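To illustrate the mechanical parts of that pipeline, here is a rough Python sketch, not a working converter. It uses the Wayback Machine CDX API to list captured URLs under a prefix, builds `id_` URLs that return the archived HTML without the Wayback toolbar, and wraps already-converted wikitext in a minimal MediaWiki export XML (schema 0.10) with a single revision per page, since the history is not recoverable. The HTML-to-wikitext conversion, the hard part, is deliberately left out. The URL prefix, the contributor name `Wayback import`, and putting every page in namespace 0 are assumptions; some MediaWiki versions may expect additional elements such as `<siteinfo>`.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

CDX = "https://web.archive.org/cdx/search/cdx"
EXPORT_NS = "http://www.mediawiki.org/xml/export-0.10/"


def cdx_query_url(prefix: str) -> str:
    """CDX query: successful captures under `prefix`, one row per distinct URL."""
    params = {
        "url": prefix,
        "matchType": "prefix",
        "output": "json",
        "fl": "timestamp,original",
        "filter": "statuscode:200",
        "collapse": "urlkey",
    }
    return CDX + "?" + urllib.parse.urlencode(params)


def parse_cdx(rows: list) -> list:
    """JSON output starts with a header row; return (timestamp, url) pairs."""
    return [(r[0], r[1]) for r in rows[1:]]


def list_captures(prefix: str) -> list:
    with urllib.request.urlopen(cdx_query_url(prefix)) as resp:
        return parse_cdx(json.load(resp))


def raw_capture_url(timestamp: str, original: str) -> str:
    # "id_" asks for the archived bytes without Wayback's toolbar and link rewriting.
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


def wayback_to_iso(ts: str) -> str:
    # CDX timestamps are YYYYMMDDhhmmss; export dumps use ISO 8601 in UTC.
    return f"{ts[0:4]}-{ts[4:6]}-{ts[6:8]}T{ts[8:10]}:{ts[10:12]}:{ts[12:14]}Z"


def build_dump(pages) -> str:
    """pages: iterable of (title, wikitext, iso_timestamp); one revision each."""
    root = ET.Element(
        "mediawiki", {"xmlns": EXPORT_NS, "version": "0.10", "xml:lang": "en"}
    )
    for page_id, (title, text, ts) in enumerate(pages, start=1):
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "title").text = title
        ET.SubElement(page, "ns").text = "0"  # assumption: main namespace only
        ET.SubElement(page, "id").text = str(page_id)
        rev = ET.SubElement(page, "revision")
        ET.SubElement(rev, "id").text = str(page_id)
        ET.SubElement(rev, "timestamp").text = ts
        contrib = ET.SubElement(rev, "contributor")
        ET.SubElement(contrib, "username").text = "Wayback import"  # hypothetical
        ET.SubElement(rev, "model").text = "wikitext"
        ET.SubElement(rev, "format").text = "text/x-wiki"
        ET.SubElement(rev, "text", {"xml:space": "preserve"}).text = text
    return ET.tostring(root, encoding="unicode")
```

Each capture's HTML would still have to be fetched from `raw_capture_url(...)` and converted to wikitext by a separate tool before being passed to `build_dump`.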

If the wiki has fewer than a thousand pages, it's probably easier to copy and paste the pages one by one with the VisualEditor.

trenkert commented 5 days ago

Thanks!