Mattschillinger / wikiteam

Automatically exported from code.google.com/p/wikiteam

get file list needs to use API #22

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
getImageFilenamesURL exclusively uses index.php, when api.php would be a better 
way to get the file list.

See also:
* Issue 21
* Issue 19
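
For illustration, here is a minimal sketch of what listing files through api.php could look like. It is not the project's code: `fetch_json` and `list_image_names` are hypothetical names, the sketch is written for Python 3, and it assumes the wiki exposes the standard `list=allimages` query module and reports continuation either under `query-continue` or under `continue`.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(api_url, params):
    """GET api_url with the given query parameters and decode the JSON reply."""
    url = api_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def list_image_names(api_url, fetch=fetch_json, batch=500):
    """Collect file names from list=allimages, following continuation.

    `fetch` is injectable so the loop can be exercised without a network.
    """
    params = {
        "action": "query",
        "list": "allimages",
        "ailimit": batch,
        "format": "json",
    }
    names = []
    while True:
        data = fetch(api_url, params)
        images = data.get("query", {}).get("allimages", [])
        names.extend(img["name"] for img in images)
        # Continuation parameters, if any, are merged into the next request.
        cont = data.get("query-continue", {}).get("allimages") or data.get("continue")
        if not cont:
            return names
        params.update(cont)
```

Calling `list_image_names("http://example.org/w/api.php")` would then return the names batch by batch instead of scraping the HTML of the special page.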

Original issue reported on code.google.com by griffin....@gmail.com on 9 Jul 2011 at 11:50

GoogleCodeExporter commented 8 years ago
Yep. Are you interested in patching this?

Original comment by emi...@gmail.com on 12 Jul 2011 at 4:06

GoogleCodeExporter commented 8 years ago
By the way, this issue is the same as Issue 1.

Original comment by emi...@gmail.com on 12 Jul 2011 at 4:09

GoogleCodeExporter commented 8 years ago
Ah, oops -- didn't notice Issue 1. I am still learning Python (and scripting), 
so if I work on it, it may take a while to figure out. I would be glad to try; 
just know that I am not very fast.

Original comment by griffin....@gmail.com on 12 Jul 2011 at 4:19

GoogleCodeExporter commented 8 years ago
Screen-scraping image lists seems particularly difficult. Sometimes I had to 
kill the script because it was using all my CPU and looked frozen, when it only 
needed a few more hours (!), for instance 
http://p.defau.lt/?JteF_scEUmTWRk2uwGU55g
In other cases the list can't be generated at all, e.g. 
http://www.wikinfo.org/ (probably because Special:ListFiles appears to be 
disabled), and the tool gets stuck in "Retrieving image filenames" forever.

I imagine that issue 21 would be resolved as well.

Original comment by nemow...@gmail.com on 15 Jul 2011 at 7:05

GoogleCodeExporter commented 8 years ago
Moreover, Special:ListFiles sometimes works only up to a certain batch size 
and otherwise produces memory errors. See the following example, a wiki 
where the special page works with 50 or 100 files but not with 250 or more (and 
the script probably requests 500).

Analysing http://tmbw.net/wiki/api.php
Loading config file...
Resuming previous dump process...
Image list is incomplete. Reloading...
Retrieving image filenames
<br />
Fatal error:  Allowed memory size of 83886080 bytes exhausted (tried to 
allocate 27000000 bytes) in 
/home/tmbwnet/public_html/wiki/includes/media/Bitmap.php on line 362<br />

This wiki doesn't use marks to split contain
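
A possible workaround for wikis like this one, sketched under assumptions: detect the PHP fatal error in the reply and retry with a smaller batch. `looks_like_php_fatal`, `fetch_with_fallback` and the `fetch_page` callable are hypothetical names for illustration, not functions from the script.

```python
def looks_like_php_fatal(body):
    """True when the reply is a PHP out-of-memory error rather than a file list."""
    return "Fatal error" in body and "Allowed memory size" in body


def fetch_with_fallback(fetch_page, limits=(500, 250, 100, 50)):
    """Return (limit, body) for the first batch size that does not fail.

    fetch_page is a hypothetical callable that takes the batch size and
    returns the raw response body as text.
    """
    for limit in limits:
        body = fetch_page(limit)
        if not looks_like_php_fatal(body):
            return limit, body
    raise RuntimeError("every batch size produced a memory error")
```

On a wiki that behaves as described above, this would settle on 100 after the requests for 500 and 250 fail.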

Original comment by nemow...@gmail.com on 28 Aug 2011 at 10:43

GoogleCodeExporter commented 8 years ago
Another example: http://wikicafe.metacafe.com/en/Main_Page
The script fails in an unexpected way (the output doesn't say much) and takes 
some time to reproduce. It probably gets killed for excessive memory usage, 
because I saw it climb to at least 700 MiB even before the first batch of 500 
file names.

Original comment by nemow...@gmail.com on 13 Dec 2011 at 9:34

GoogleCodeExporter commented 8 years ago
Issue 1 has been merged into this issue.

Original comment by nemow...@gmail.com on 29 Feb 2012 at 11:01

GoogleCodeExporter commented 8 years ago
Issue 15 has been merged into this issue.

Original comment by nemow...@gmail.com on 29 Feb 2012 at 11:24

GoogleCodeExporter commented 8 years ago

Original comment by nemow...@gmail.com on 29 Feb 2012 at 11:41

GoogleCodeExporter commented 8 years ago

Original comment by ad...@alphacorp.tk on 22 Jun 2012 at 10:02

GoogleCodeExporter commented 8 years ago
Well, I did this in r668 some time ago and so far nobody has complained.
wikinfo was downloaded successfully and without problems 
(https://archive.org/details/wiki-wikinfoorg_w), though I 
didn't check resource consumption closely; if there are issues they can be 
filed separately.

Original comment by nemow...@gmail.com on 7 Nov 2013 at 10:53

GoogleCodeExporter commented 8 years ago
Reopening: I tried on TMBW.net and remembered what's broken. It made a list of 
the first 500 images only.

Original comment by nemow...@gmail.com on 8 Nov 2013 at 6:54

GoogleCodeExporter commented 8 years ago
The wiki is on 1.20 and the format of the API changed (the docs were wrong); 
fixed in r862 (tested). There are some separate reports for specific sites; 
let's continue there.
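
To illustrate how a format change can cut the list off after the first batch, here is a hedged sketch. It assumes the change in question is the name of the continuation key in `query-continue` (for example `aicontinue` where older releases returned `aifrom`); the thread does not spell this out, and `next_continuation` is a hypothetical helper, not the code from r862.

```python
def next_continuation(data):
    """Return the parameters for the next allimages request, or None when done.

    Accepts either continuation key, so a wiki that answers with the newer
    one does not look as if it had no further batches.
    """
    cont = data.get("query-continue", {}).get("allimages", {})
    for key in ("aicontinue", "aifrom"):
        if key in cont:
            return {key: cont[key]}
    return None
```

A client that looks only for the older key would get `None` here on a newer wiki and stop after one batch of 500, which matches the symptom reported above.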

Original comment by nemow...@gmail.com on 8 Nov 2013 at 10:09

GoogleCodeExporter commented 8 years ago
Issue 15 has been merged into this issue.

Original comment by nemow...@gmail.com on 31 Jan 2014 at 12:43