Data extracted from Wikivoyage, the free travel guide at http://wikivoyage.org. Leverage Wikivoyage listings on your smartphone, or in your own mashups.
This PR adds a new BulkValidator interface and its first implementation, WikidataBulkValidator, which replaces WikidataValidator. WikidataBulkValidator validates QIDs in batches of 200 through the SPARQL service, checking that each QID actually exists and is not a redirect.
The PR also replaces the uses of WikidataValidator with WikidataBulkValidator in Main, ValidationReport, and ValidationTests.
I experimented with the SPARQL service and found that it can process around 400 QIDs in one request; beyond that the server returns 413 (Request Entity Too Large). Unfortunately this limit is not documented and could change, so I suggest we use a conservative limit of 200, which seems like plenty to me.
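To make the batching concrete, here is a minimal sketch of splitting QIDs into groups of at most 200 and building one query per group. The class and method names (`QidBatcher`, `partition`, `buildQuery`) are hypothetical and not taken from the diff. The query shape is also an assumption: it relies on the Wikidata Query Service predeclaring the `wd:` and `schema:` prefixes, and on real items carrying a `schema:version` triple while redirects and missing IDs do not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class QidBatcher {
    // Conservative batch size; the observed server limit was around 400.
    static final int BATCH_SIZE = 200;

    // Only well-formed QIDs are ever concatenated into a query.
    private static final Pattern QID = Pattern.compile("Q[1-9][0-9]*");

    /** Splits the QIDs into consecutive batches of at most batchSize elements. */
    static List<List<String>> partition(List<String> qids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < qids.size(); i += batchSize) {
            int end = Math.min(i + batchSize, qids.size());
            batches.add(new ArrayList<>(qids.subList(i, end)));
        }
        return batches;
    }

    /**
     * Builds a SPARQL query that returns only those items of the batch that
     * have a schema:version triple (assumed to hold for existing,
     * non-redirect items only).
     */
    static String buildQuery(List<String> batch) {
        StringBuilder values = new StringBuilder();
        for (String qid : batch) {
            if (!QID.matcher(qid).matches()) {
                throw new IllegalArgumentException("Malformed QID: " + qid);
            }
            values.append(" wd:").append(qid);
        }
        return "SELECT ?item WHERE { VALUES ?item {" + values
                + " } ?item schema:version [] . }";
    }
}
```

Any QID in a batch that is absent from the query result would then be reported as invalid. Keeping the batch size in one constant makes it easy to lower if the server limit changes.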
The most interesting part was deciding on the design of the BulkValidator interface. The code that uses validators is designed to validate Listings one at a time and process the results immediately, which does not fit easily with the concept of bulk validation.
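To illustrate the tension, here is a rough sketch of one way a bulk validator can sit behind one-at-a-time callers: items are buffered as they arrive and checked together on an explicit flush. All names and signatures here are hypothetical and may differ from what is in the diff. The remote SPARQL call is replaced by a plain function so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical interface: collect items now, validate them together later. */
interface BulkValidator<T> {
    /** Queues an item; no validation happens yet. */
    void add(T item);

    /** Validates everything queued since the last flush; maps each invalid item to an error message. */
    Map<T, String> flush();
}

/** Hypothetical implementation that buffers QIDs and checks them in one lookup. */
class BufferingQidValidator implements BulkValidator<String> {
    private final List<String> pending = new ArrayList<>();
    // Stand-in for the remote call: given a batch, returns the QIDs that are valid.
    private final Function<List<String>, Set<String>> lookup;

    BufferingQidValidator(Function<List<String>, Set<String>> lookup) {
        this.lookup = lookup;
    }

    @Override
    public void add(String qid) {
        pending.add(qid);
    }

    @Override
    public Map<String, String> flush() {
        Map<String, String> errors = new LinkedHashMap<>();
        if (pending.isEmpty()) {
            return errors;
        }
        Set<String> valid = lookup.apply(new ArrayList<>(pending));
        for (String qid : pending) {
            if (!valid.contains(qid)) {
                errors.put(qid, "QID does not exist or is a redirect");
            }
        }
        pending.clear();
        return errors;
    }
}
```

The trade-off in a design like this is that errors surface only at flush time, so the caller has to keep enough context (for example, which Listing each QID came from) to report them afterwards.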
Let me know if this solution is appropriate, and we can discuss and tweak it if necessary.