UUDigitalHumanitieslab / EDPOP

A virtual research environment (VRE) that lets you collect, align and annotate bibliographical and biographical records from several online catalogs.
BSD 3-Clause "New" or "Revised" License

Use a SPARQL update to remove old records #173

Closed by jgonggrijp 3 months ago

jgonggrijp commented 3 months ago

This fixes #166. The update query relies on the fact that records currently have only one level of nested resources. If the number of nesting levels ever increases, it should still be possible to adjust the query accordingly.

Despite autocommit being off, the old code was slow because it retrieved nested resources from each record separately. This required a separate request for each record and each field.
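To illustrate the idea of replacing many per-record requests with one update, here is a minimal sketch of how such a query could be assembled. It is not the query from this PR: the function name, the named-graph layout, the record class IRI, and the assumption that nested resources are blank nodes exactly one level below the record are all hypothetical.

```python
from textwrap import dedent


def build_delete_records_update(graph_iri: str, record_class_iri: str) -> str:
    """Build one SPARQL update that deletes records and their nested resources.

    Assumptions (not taken from the EDPOP code): records live in a named
    graph, are typed with `record_class_iri`, and nested resources are
    blank nodes exactly one level below the record.
    """
    return dedent(f"""\
        DELETE {{
          GRAPH <{graph_iri}> {{
            ?record ?field ?value .
            ?value ?nestedField ?nestedValue .
          }}
        }}
        WHERE {{
          GRAPH <{graph_iri}> {{
            ?record a <{record_class_iri}> ;
                    ?field ?value .
            # One level of nesting only; deeper nesting would need
            # further OPTIONAL patterns along the same lines.
            OPTIONAL {{
              ?value ?nestedField ?nestedValue .
              FILTER(isBlank(?value))
            }}
          }}
        }}
        """)


if __name__ == "__main__":
    # Example IRIs are placeholders for illustration only.
    print(build_delete_records_update(
        "http://example.org/graph/records",
        "http://example.org/ns#Record",
    ))
```

The resulting string can be sent to the triple store in a single request, so the record triples and the triples of their nested resources are removed together instead of being fetched and deleted per record and per field.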

The new query should work; I tested it in the Blazegraph web UI before implementing it. That said, it could probably be tested more thoroughly.

I gratefully used the kb-boek example query from #166 to assess the performance impact. I obtained the following numbers on my own laptop:

- Old code, first retrieval: 3.7 seconds.
- Old code, subsequent retrievals: 16-17 seconds.
- New code: 0.3-0.5 seconds.

Requesting review from @lukavdplas, but CC @tijmenbaarda FYI.