Current behavior:
Regular users: getPageInfo(String[]) splits the page list into chunks of 50 titles. Works fine.
High-limit users: getPageInfo(String[]) processes batches of 500 titles and throws a StringIndexOutOfBoundsException while parsing the next <page> element.
Due to a bug in the MW API, the 50-result limit was being applied to bot accounts while regular users could query 500 results at once. It should be the other way around. With the testactions parameter, a bot could not query more than 7 titles in one batch (8 actions * 7 titles = 56 > 50), so a continuation parameter is generated and any result beyond the 7th is returned by the API as <page ... /> instead of a full <page>...props...</page> element. Wiki.java is not prepared for continuation queries in this case, nor for missing </page> tags; the latter causes the exception at line 1638.
The bug was fixed in patch 460886 and will be deployed to production WMF wikis soon.
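The arithmetic above can be sketched as a small helper. This is only an illustration under an assumed counting model inferred from the "8 actions * 7 titles > 50" reasoning: every title costs one result per tested action, and the API stops answering in full once the running total is strictly greater than the limit. The class and method names are hypothetical and are not part of Wiki.java or the MW API.

```java
final class BatchLimit {
    /**
     * Number of titles answered in full before the result limit is exceeded.
     * Assumption: each title costs actionsPerTitle results, and the API only
     * stops once the running total is strictly greater than resultLimit.
     */
    static int titlesBeforeLimit(int resultLimit, int actionsPerTitle) {
        if (resultLimit < 0 || actionsPerTitle <= 0) {
            throw new IllegalArgumentException("limit must be >= 0 and actions > 0");
        }
        // Smallest n such that n * actionsPerTitle > resultLimit.
        return resultLimit / actionsPerTitle + 1;
    }
}
```

With a limit of 50 and 8 tested actions, `BatchLimit.titlesBeforeLimit(50, 8)` gives 7, which matches the batch size observed above.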
New behavior (once MW master branch hits production):
Regular users: getPageInfo(String[]) will throw an exception if more than 7 pages are passed on to this method. See reasons above.
Bot users: an exception will be thrown for input arrays of 64+ page titles (8 actions * 63 titles = 504 > 500).
Proposed solutions:
Handle continuation parameters in getPageInfo(String[]).
Remove testactions from getPageInfo(String[]), but keep it for single page queries (getPageInfo(String)).
Factor out testactions into a separate method that will use makeListQuery.
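As a rough illustration of the first option, the sketch below finds the titles whose <page> element came back empty, so that they could be re-queried in a follow-up request. It is not taken from Wiki.java: the class and method names are hypothetical, it uses the JDK DOM parser rather than the string parsing in Wiki.java, and it assumes that each <page> element carries a title attribute and that an incomplete result has no child nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class IncompletePages {
    /**
     * Returns the titles of all <page> elements that have no child nodes,
     * i.e. results returned as <page ... /> rather than <page>...</page>.
     * Assumption: the title is available as a "title" attribute.
     */
    static List<String> incompleteTitles(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList pages = doc.getElementsByTagName("page");
            List<String> pending = new ArrayList<>();
            for (int i = 0; i < pages.getLength(); i++) {
                Element page = (Element) pages.item(i);
                if (!page.hasChildNodes()) {
                    pending.add(page.getAttribute("title"));
                }
            }
            return pending;
        } catch (Exception e) {
            // Parser configuration, I/O and malformed XML all end up here.
            throw new IllegalArgumentException("Could not parse API response", e);
        }
    }
}
```

The returned titles could then be fed into another request until the list is empty, which amounts to following the continuation by hand.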
IMO testactions is not suitable for vectorized queries. Being able to query no more than 7 titles at once instead of 50 is a severe drawback wrt the previous implementation of this method.
Per Wiki.getPageInfo(String[] pages): https://github.com/MER-C/wiki-java/blob/fccfba522432949d45a4abe169042c9c815a4d76/src/org/wikipedia/Wiki.java#L1617-L1620
The testactions parameter is a recent addition (https://github.com/MER-C/wiki-java/commit/ebc1c82949260afe9480c735347e48d56e4d8c59). It turns out to be regarded as an expensive query by the MW API, since the total number of tested actions counts towards the API slow limit (i.e. 50 results for regular users, 500 for bots).