Mattschillinger / wikiteam

Automatically exported from code.google.com/p/wikiteam

Don't try to download private wikis unless --cookies is given #71

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
While downloading wikkii wikis, the dump stopped on this one; it turns out 
that's just because the wiki requires login.

Checking index.php... http://aquapedia.wikkii.com/w/index.php
index.php is OK
Analysing http://aquapedia.wikkii.com/w/index.php
Loading config file...
Resuming previous dump process...
Title list was completed in the previous session
XML is corrupt? Regenerating...
Retrieving the XML for every page from "start"
    XML for "Main_Page" is wrong. Waiting 20 seconds and reloading...

Original issue reported on code.google.com by nemow...@gmail.com on 6 Nov 2013 at 8:14

GoogleCodeExporter commented 8 years ago
I committed what I'm using for now: r861 (it checks only for the English warning).
A proper fix would, I guess, be checking the API for usergroup rights: 
https://office.wikimedia.org/w/api.php?action=query&meta=siteinfo&siprop=usergroups
If we get <error code="readapidenied" info="You need read permission to 
use this module" xml:space="preserve"> even for that, we know we can't do more. 
:)
However, wikkii has the API disabled and older wikis don't have an API at all.
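
A rough sketch of that API check, for illustration only. It assumes the api.php response was requested as XML and has already been fetched into a string; `api_read_denied` is a hypothetical helper name, not something in dumpgenerator.py. When the API is disabled or missing, the response will not parse and the helper just reports False, so another check is still needed for those wikis.

```python
import xml.etree.ElementTree as ET


def api_read_denied(response_text):
    """Return True if an api.php XML response carries a readapidenied error.

    Hypothetical helper: the caller is assumed to have fetched the
    siteinfo query already and to pass the raw response body here.
    """
    try:
        root = ET.fromstring(response_text)
    except ET.ParseError:
        # Not XML (API disabled, missing, or an HTML error page):
        # nothing can be concluded from this response.
        return False
    # iter() also visits the root element, so a bare <error> works too.
    error = next(root.iter("error"), None)
    return error is not None and error.get("code") == "readapidenied"


# Error element copied from the comment above, wrapped in an assumed <api> root.
denied_response = (
    '<api><error code="readapidenied" '
    'info="You need read permission to use this module" '
    'xml:space="preserve"></error></api>'
)
print(api_read_denied(denied_response))
```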

Original comment by nemow...@gmail.com on 7 Nov 2013 at 11:02

GoogleCodeExporter commented 8 years ago

Original comment by ad...@alphacorp.tk on 11 Nov 2013 at 6:14

GoogleCodeExporter commented 8 years ago

Original comment by ad...@alphacorp.tk on 12 Nov 2013 at 6:38

GoogleCodeExporter commented 8 years ago
I changed the regular expression check to search for instances of 
"Special:Badtitle" in Special:Version. It seems to work with non-English 
languages, though I am not entirely sure whether it works with older versions 
of MediaWiki.

What is left is to implement a check for api.php, though in DumpGenerator 2.0 
it has already been implemented.
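
For reference, the check described above could look roughly like this. It is only a sketch: `mentions_badtitle` is a hypothetical name, the HTML of Special:Version is assumed to be fetched elsewhere, and the sample strings are made up for the example rather than taken from a real wiki.

```python
import re

# Assumption: the HTML of Special:Version has already been downloaded.
BADTITLE_RE = re.compile(r"Special:Badtitle")


def mentions_badtitle(special_version_html):
    """Return True if the page source contains "Special:Badtitle".

    Hypothetical helper; the match is language-independent because it
    looks for the canonical page name rather than a translated warning.
    """
    return BADTITLE_RE.search(special_version_html) is not None


# Made-up page fragments, for illustration only.
private_page = '<a href="/w/index.php?title=Special:UserLogin&returnto=Special:Badtitle">log in</a>'
public_page = '<h1 class="firstHeading">Version</h1><p>MediaWiki</p>'

print(mentions_badtitle(private_page), mentions_badtitle(public_page))
```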

Original comment by ad...@alphacorp.tk on 15 Nov 2013 at 7:46

GoogleCodeExporter commented 8 years ago
No idea what to do in cases such as this (also no idea what release it is):

<h1 class="firstHeading">Login Required</h1>
        <div id="bodyContent">
            <h3 id="siteSub">From Railroad Depot</h3>
            <div id="contentSub"></div>
                                    <div id="jump-to-nav">Jump to: <a href="#column-one">navigation</a>, <a href="#searchInput">search</a></div>            <!-- start content -->
            <p>You must <a href="/content/index.php?title=Special:Userlogin&returnto=Main_Page" title="Special:Userlogin">log in</a> to view other pages.
</p>
<!--Main_Page--><div class="printfooter">

Maybe an ugly multiline regex to match from <!-- start content --> to <div 
class="printfooter"> with a single line in between containing a link with 
returnto...
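
Something along these lines, perhaps. This is an untested-in-the-wild sketch: `LOGIN_WALL_RE` and `is_login_wall` are hypothetical names, and the pattern has only been checked against the snippet pasted above, so other skins or releases may well need adjustments.

```python
import re

# Matches a content area holding nothing but one paragraph with a
# "returnto" link, optionally followed by an HTML comment, right before
# the printfooter div.
LOGIN_WALL_RE = re.compile(
    r'<!-- start content -->\s*'
    r'<p>[^\n]*<a href="[^"\n]*returnto=[^"\n]*"[^\n]*\s*</p>\s*'
    r'(?:<!--[^\n]*?-->)?\s*<div class="printfooter">'
)


def is_login_wall(html):
    """Hypothetical helper: True if the page body is only a login prompt."""
    return LOGIN_WALL_RE.search(html) is not None


# Body of the page quoted in this comment.
login_page = '''            <!-- start content -->
            <p>You must <a href="/content/index.php?title=Special:Userlogin&returnto=Main_Page" title="Special:Userlogin">log in</a> to view other pages.
</p>
<!--Main_Page--><div class="printfooter">
'''

# Made-up readable page: real content sits between the two markers.
normal_page = '''<!-- start content -->
<p>Welcome to the wiki.</p>
<p>More text here.</p>
<div class="printfooter">
'''

print(is_login_wall(login_page), is_login_wall(normal_page))
```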

Original comment by nemow...@gmail.com on 2 Feb 2014 at 11:05

GoogleCodeExporter commented 8 years ago
That's a MediaWiki 1.9.3 wiki.

Original comment by nemow...@gmail.com on 2 Feb 2014 at 11:08