99moorem / wikiteam

Automatically exported from code.google.com/p/wikiteam

dumpgenerator.py stuck with files requiring authentication #89

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Example: the http://qed.princeton.edu wiki. The script reaches the first file
requiring authentication and asks for a username and password. Entering bogus
credentials doesn't help, because it just keeps asking. I also have no idea how
to deal with such a case, because the web server apparently returns HTTP 302,
not HTTP 401. This needs some debugging/research on how to detect these cases
and skip the files in question.

$ wget -S http://qed.princeton.edu/images/1/10/JR_HMG050409.tif
--2014-01-31 16:03:26--  http://qed.princeton.edu/images/1/10/JR_HMG050409.tif
Resolving qed.princeton.edu (qed.princeton.edu)... 128.112.131.223
Connecting to qed.princeton.edu (qed.princeton.edu)|128.112.131.223|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 302 Found
  Date: Fri, 31 Jan 2014 15:03:26 GMT
  Server: Apache/2.2.3 (Oracle)
  Location: http://qed.princeton.edu/getfile.php?f=JR_HMG050409.tif
  Content-Length: 321
  Keep-Alive: timeout=15, max=100
  Connection: Keep-Alive
  Content-Type: text/html; charset=iso-8859-1
Location: http://qed.princeton.edu/getfile.php?f=JR_HMG050409.tif [following]
--2014-01-31 16:03:26--  http://qed.princeton.edu/getfile.php?f=JR_HMG050409.tif
Reusing existing connection to qed.princeton.edu:80.
HTTP request sent, awaiting response...
  HTTP/1.1 302 Found
  Date: Fri, 31 Jan 2014 15:03:26 GMT
  Server: Apache/2.2.3 (Oracle)
  X-Powered-By: PHP/5.1.6
  Location: /index.php/QED:Restricted_File
  Content-Length: 0
  Keep-Alive: timeout=15, max=99
  Connection: Keep-Alive
  Content-Type: text/html; charset=ISO-8859-1
Location: /index.php/QED:Restricted_File [following]
--2014-01-31 16:03:27--  http://qed.princeton.edu/index.php/QED:Restricted_File
Reusing existing connection to qed.princeton.edu:80.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Fri, 31 Jan 2014 15:03:27 GMT
  Server: Apache/2.2.3 (Oracle)
  X-Powered-By: PHP/5.1.6
  Content-language: en
  Vary: Accept-Encoding,Cookie
  X-Vary-Options: Cookie;string-contains=TigerWeb_tw_UserID;string-contains=TigerWeb_tw__session,Accept-Encoding;list-contains=gzip
  Expires: Thu, 01 Jan 1970 00:00:00 GMT
  Cache-Control: private, must-revalidate, max-age=0
  Last-modified: Wed, 27 Aug 2008 18:14:18 GMT
  Keep-Alive: timeout=15, max=98
  Connection: Keep-Alive
  Transfer-Encoding: chunked
  Content-Type: text/html; charset=utf-8
Length: unspecified [text/html]
Saving to: "JR_HMG050409.tif"
$ file JR_HMG050409.tif
JR_HMG050409.tif: HTML document, ASCII text, with very long lines
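
One possible way to detect the case shown in the wget trace above is to compare where the request ended up with what was asked for: the server answers 302 rather than 401, and the final response is an HTML page saved under the image's filename. The following is a minimal Python 3 sketch, not the actual dumpgenerator.py code. `looks_like_restricted_page` and `fetch_or_skip` are hypothetical names, and the sketch assumes that a file request which gets redirected to an HTML page should simply be skipped.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit

HTML_TYPES = ("text/html", "application/xhtml+xml")


def looks_like_restricted_page(requested_url, final_url, content_type):
    """Heuristic: a file request that was redirected to an HTML page.

    Assumption: media files are never legitimately served as HTML, so a
    redirect that ends on an HTML page means the file was not delivered.
    """
    redirected = urlsplit(requested_url).path != urlsplit(final_url).path
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    return redirected and media_type in HTML_TYPES


def fetch_or_skip(url):
    """Return the file bytes, or None if the file should be skipped."""
    try:
        # urlopen follows 302 redirects and does not prompt for credentials.
        with urllib.request.urlopen(url) as response:
            content_type = response.headers.get("Content-Type", "")
            if looks_like_restricted_page(url, response.geturl(), content_type):
                return None
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            return None  # explicit authentication failure: skip as well
        raise
```

With the URLs from the trace, the check flags the final `QED:Restricted_File` page because the path changed and the content type is `text/html`. A wiki that redirects image URLs to another path but still serves image data would not be flagged.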

Original issue reported on code.google.com by nemow...@gmail.com on 31 Jan 2014 at 3:07

GoogleCodeExporter commented 8 years ago

Original comment by nemow...@gmail.com on 31 Jan 2014 at 3:08

GoogleCodeExporter commented 8 years ago
That is:

Checking index.php... http://qed.princeton.edu/index.php
index.php is OK
Analysing http://qed.princeton.edu/index.php
Loading config file...
Resuming previous dump process...
Title list was completed in the previous session
XML dump was completed in the previous session
Image list was completed in the previous session
2224 images were found in the directory from a previous session
Retrieving images from "MG©Colonial Distribution of the World 1914.jpg"
Enter username for access to restricted images (Princeton University netid) at 
qed.princeton.edu:
Enter password for  in access to restricted images (Princeton University netid) 
at qed.princeton.edu:
Enter username for access to restricted images (Princeton University netid) at 
qed.princeton.edu:
Enter password for  in access to restricted images (Princeton University netid) 
at qed.princeton.edu:
Enter username for access to restricted images (Princeton University netid) at 
qed.princeton.edu:
Enter password for  in access to restricted images (Princeton University netid) 
at qed.princeton.edu:

...
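
To avoid the endless prompt loop shown above, the image download loop could try each file once and record the ones that need authentication instead of asking again. This is only a sketch under assumptions: `download_all` and the `fetch` callback are hypothetical, they are not functions from dumpgenerator.py, and the sketch assumes the fetcher signals a restricted file by raising `PermissionError`.

```python
def download_all(filenames, fetch):
    """Try every file once; skip restricted files instead of re-prompting.

    `fetch` is a hypothetical callable that returns the file bytes or
    raises PermissionError when the file requires authentication.
    """
    saved, skipped = [], []
    for name in filenames:
        try:
            data = fetch(name)
        except PermissionError as exc:
            # Remember the file and the reason, then move on to the next one.
            skipped.append((name, str(exc)))
            continue
        saved.append((name, len(data)))
    return saved, skipped


def fake_fetch(name):
    """Stand-in fetcher for illustration only; no network access."""
    if name == "JR_HMG050409.tif":
        raise PermissionError("redirected to QED:Restricted_File")
    return b"image-bytes"


saved, skipped = download_all(["public.png", "JR_HMG050409.tif"], fake_fetch)
```

The skipped list could be written to a log file at the end of the run, so the restricted files are known without blocking the rest of the dump.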

Original comment by nemow...@gmail.com on 2 Feb 2014 at 10:48