clarin-eric / VLO

Virtual Language Observatory
GNU General Public License v3.0

Skip collapsing/de-duplication for statistics and sitemap generation #207

Closed twagoo closed 5 years ago

twagoo commented 5 years ago

The VLO statistics generator derives its numbers from queries that assume no result collapsing. With the default response handler, this assumption no longer holds (see #113). Probably the best way to fix this (and maybe even gain some performance) is to use the fast handler instead.

The same applies to retrieving all document IDs for sitemap generation.
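
For illustration, a minimal sketch of what the two kinds of request could look like when sent to a dedicated handler rather than the default one. The handler path `fast`, the core URL and the `id` field name are assumptions made for this example and are not taken from the VLO configuration. Whether that handler actually skips collapsing depends on how it is defined in `solrconfig.xml`. `rows`, `fl`, `sort` and `cursorMark` are standard Solr query parameters.

```python
from urllib.parse import urlencode


def solr_url(base_url, handler, params):
    """Join a Solr core URL, a request handler path and query parameters."""
    return f"{base_url.rstrip('/')}/{handler.strip('/')}?{urlencode(params)}"


def count_url(base_url, handler="fast", query="*:*"):
    """URL for a statistics query; 'fast' is an assumed handler name."""
    # rows=0: only the total hit count (numFound) is needed for statistics.
    return solr_url(base_url, handler, {"q": query, "rows": 0, "wt": "json"})


def id_page_url(base_url, handler="fast", cursor_mark="*", rows=1000, id_field="id"):
    """URL for one page of document IDs, e.g. for sitemap generation."""
    # Cursor-based paging requires a sort that includes the unique key field.
    return solr_url(base_url, handler, {
        "q": "*:*",
        "fl": id_field,
        "sort": f"{id_field} asc",
        "rows": rows,
        "cursorMark": cursor_mark,
        "wt": "json",
    })


if __name__ == "__main__":
    core = "http://localhost:8983/solr/vlo-index"  # hypothetical core URL
    print(count_url(core))
    print(id_page_url(core))
```

With a non-collapsing handler, the `numFound` value returned for the first URL should correspond to the number of imported records rather than the number of collapsed groups.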

twagoo commented 5 years ago

Illustration of the problem (actual number of records after import is ~894k):

[Screenshot, 2018-12-17 at 14:09:33]