Open GoogleCodeExporter opened 9 years ago
> If de.tudarmstadt.ukp.wikipedia.api.Wikipedia.getPages(PageQuery) returned an
> unmodifiable collection instead of an iterable, it would be possible to get the
> number of pages using size(). That would be helpful, e.g., to display progress
> information (x of y pages processed). Inheriting from AbstractCollection may be
> helpful.
getPages(PageQuery) is _very_ slow. In its current form, it was never intended
for production use.
Are you going to use that?
If not, I am not sure whether it's worth the effort.
Original comment by torsten....@gmail.com
on 21 Sep 2010 at 4:10
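
The request quoted above could be sketched roughly as follows. This is a minimal illustration, not the Wikipedia API's actual code: `SizedLazyCollection` is a hypothetical helper, and the two suppliers stand in for whatever would open the page query and count its results. It assumes Java 8 or later, where `Iterator.remove()` throws `UnsupportedOperationException` by default.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.function.IntSupplier;
import java.util.function.Supplier;

// Hypothetical helper (not part of the Wikipedia API): a read-only collection
// that still iterates lazily but can also report how many elements it has.
final class SizedLazyCollection<T> extends AbstractCollection<T> {
    private final Supplier<Iterator<T>> iteratorSource; // assumed: opens a fresh cursor over the query result
    private final IntSupplier sizeSource;               // assumed: counts the matching pages, e.g. via a COUNT query
    private int cachedSize = -1;                        // -1 means "not computed yet"

    SizedLazyCollection(Supplier<Iterator<T>> iteratorSource, IntSupplier sizeSource) {
        this.iteratorSource = iteratorSource;
        this.sizeSource = sizeSource;
    }

    @Override
    public Iterator<T> iterator() {
        final Iterator<T> delegate = iteratorSource.get();
        // Wrap the delegate so remove() falls back to the default
        // implementation, which throws UnsupportedOperationException.
        return new Iterator<T>() {
            @Override public boolean hasNext() { return delegate.hasNext(); }
            @Override public T next() { return delegate.next(); }
        };
    }

    @Override
    public int size() {
        // Compute the count once and reuse it, since counting may be expensive.
        if (cachedSize < 0) {
            cachedSize = sizeSource.getAsInt();
        }
        return cachedSize;
    }
}
```

Because `AbstractCollection.add` throws `UnsupportedOperationException` unless overridden, and the wrapped iterator does not support `remove()`, callers cannot modify such a collection, while `size()` becomes available without loading every page first.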
The ExtendedWikipediaReader uses that to iterate over the pages. Since this
process takes quite long, it is convenient to know the total number of pages,
so that a progress meter can show in percent how much has already been
processed and approximately how long it will still take to complete. The
progress currently displayed there uses the number of pages from the wiki
metadata, which does not seem to be the correct place to look. The metadata
reported about 900k pages, but the query returned only about 500k.
I can change it in the Wikipedia API if there is no objection.
Original comment by torsten....@gmail.com
on 21 Sep 2010 at 4:11
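
To illustrate the progress meter described above, here is a small sketch of how a reader might use `size()` once it is available. `ProgressDemo` and its methods are hypothetical names, the list of strings merely stands in for the page collection, and the time estimate assumes every page takes roughly the same time to process.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical illustration; none of these names exist in the Wikipedia API.
final class ProgressDemo {

    // Formats a line such as "2 of 4 pages processed (50%)".
    static String progressLine(int done, int total) {
        int percent = (total == 0) ? 100 : (int) (100L * done / total);
        return done + " of " + total + " pages processed (" + percent + "%)";
    }

    // Linear extrapolation of the remaining time from the elapsed time.
    // Returns -1 while no page has been processed yet.
    static long estimateRemainingMillis(int done, int total, long elapsedMillis) {
        if (done == 0) {
            return -1;
        }
        return elapsedMillis * (total - done) / done;
    }

    public static void main(String[] args) {
        // Stand-in for the page collection; a real run would iterate over pages.
        Collection<String> pages = Arrays.asList("A", "B", "C", "D");
        int total = pages.size();
        long start = System.currentTimeMillis();
        int done = 0;
        for (String page : pages) {
            // ... process the page here ...
            done++;
            long elapsed = System.currentTimeMillis() - start;
            long remaining = estimateRemainingMillis(done, total, elapsed);
            System.out.println(progressLine(done, total)
                    + ", about " + remaining + " ms remaining");
        }
    }
}
```

Taking the total from the same query that drives the iteration would avoid the mismatch with the metadata count mentioned above.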
Is this bug still open?
Original comment by oliver.ferschke
on 1 Jun 2011 at 6:06
I think I was the person who originally requested that feature. Well, I suppose
if getPages() still returns an iterable, the issue is still open.
Original comment by richard.eckart
on 2 Jun 2011 at 9:29
Original issue reported on code.google.com by
torsten....@gmail.com
on 21 Sep 2010 at 4:10