MER-C / wiki-java

A MediaWiki bot framework in Java
GNU Affero General Public License v3.0

Query limit inconsistency and required continuation in Wiki.getPageInfo #161

Closed by PeterBowman 6 years ago

PeterBowman commented 6 years ago

Per Wiki.getPageInfo(String[] pages):

https://github.com/MER-C/wiki-java/blob/fccfba522432949d45a4abe169042c9c815a4d76/src/org/wikipedia/Wiki.java#L1617-L1620

The testactions parameter is a recent addition (https://github.com/MER-C/wiki-java/commit/ebc1c82949260afe9480c735347e48d56e4d8c59). The MW API turns out to treat it as an expensive query, since the total number of tested actions counts towards the API slow limit (i.e. 50 results for regular users, 500 for bots).

Current behavior:

Due to a bug in the MW API, the 50-result limit was being applied to bot accounts while regular users could query 500 results at once. It should be the other way around. With the testactions parameter, a bot could not query more than 7 titles in one batch (8 actions * 7 titles = 56 > 50), hence a continuation parameter is generated and any result beyond the 7th is returned by the API as <page ... /> instead of a full <page>...props...</page> element. Incidentally, Wiki.java is not prepared for continuation queries in this case, nor for missing </page> tags. The latter causes an exception at line 1638.
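The arithmetic above can be sketched in a few lines of Java. This is only an illustration of the numbers quoted in this report, not code from Wiki.java or the MW API: the class TestActionsBudget and its methods are hypothetical names, and the assumption that the API stops at the first title where the running total exceeds the limit is taken from the observation above, not verified here.

```java
/** Hypothetical helper mirroring the limit arithmetic described in this issue. */
final class TestActionsBudget {
    private TestActionsBudget() {}

    /** Total number of tested actions a batch consumes. */
    static int cost(int titles, int actionsPerTitle) {
        return titles * actionsPerTitle;
    }

    /**
     * Smallest number of titles whose total cost exceeds the limit.
     * Assumption: this is the point where a continuation is generated.
     */
    static int titlesUntilLimitExceeded(int limit, int actionsPerTitle) {
        if (limit < 0 || actionsPerTitle <= 0) {
            throw new IllegalArgumentException("limit must be >= 0 and actionsPerTitle > 0");
        }
        // n * actionsPerTitle > limit  <=>  n > limit / actionsPerTitle
        return limit / actionsPerTitle + 1;
    }

    public static void main(String[] args) {
        int actions = 8; // number of tested actions per title, as stated in the issue
        for (int limit : new int[] {50, 500}) {
            int n = titlesUntilLimitExceeded(limit, actions);
            System.out.println("limit " + limit + ": " + n + " titles cost "
                    + cost(n, actions) + " tested actions");
        }
    }
}
```

With the 500-result limit the same formula gives a far larger batch, which is why the affected accounts matter so much here.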

The bug was fixed in patch 460886 and will be deployed to production WMF wikis soon.

New behavior (once MW master branch hits production):

Proposed solutions:

  1. Handle continuation parameters in getPageInfo(String[]).
  2. Remove testactions from getPageInfo(String[]), but keep it for single page queries (getPageInfo(String)).
  3. Factor out testactions into a separate method that will use makeListQuery.
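To make option 1 more concrete, here is a minimal sketch of a continuation loop. It does not use any Wiki.java internals: ContinuationSketch, Response, fetchAll and fakeApi are all hypothetical names, the continuation token is modelled as a plain string, and fakeApi is a stand-in that simply serves a fixed number of titles per request. The real API response format and the way getPageInfo(String[]) would parse it are not covered.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of merging results across continued requests. */
final class ContinuationSketch {
    private ContinuationSketch() {}

    /** One response: the complete per-title results plus a token (null when done). */
    static final class Response {
        final Map<String, String> completeResults;
        final String continueToken;

        Response(Map<String, String> completeResults, String continueToken) {
            this.completeResults = completeResults;
            this.continueToken = continueToken;
        }
    }

    /** Repeats the request until no continuation token is returned. */
    static Map<String, String> fetchAll(Function<String, Response> request) {
        Map<String, String> merged = new LinkedHashMap<>();
        String token = null; // null means "first request"
        do {
            Response response = request.apply(token);
            merged.putAll(response.completeResults);
            token = response.continueToken;
        } while (token != null);
        return merged;
    }

    /** Stand-in for the API (assumption): returns perRequest titles per call. */
    static Function<String, Response> fakeApi(List<String> titles, int perRequest) {
        if (perRequest <= 0) {
            throw new IllegalArgumentException("perRequest must be positive");
        }
        return token -> {
            int start = (token == null) ? 0 : Integer.parseInt(token);
            int end = Math.min(start + perRequest, titles.size());
            Map<String, String> results = new LinkedHashMap<>();
            for (String title : titles.subList(start, end)) {
                results.put(title, "info for " + title);
            }
            // Hand back a token only if some titles are still unanswered.
            String next = (end < titles.size()) ? String.valueOf(end) : null;
            return new Response(results, next);
        };
    }
}
```

Calling fetchAll(fakeApi(titles, 7)) with 10 titles issues two requests and returns all 10 entries in their original order. A real implementation would also have to skip the self-closed <page ... /> elements in each response instead of trying to parse them.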

IMO testactions is not suitable for vectorized queries. Being able to query no more than 7 titles at once instead of 50 is a severe drawback wrt the previous implementation of this method.