NaturalHistoryMuseum / ckanext-versioned-datastore

A CKAN extension providing a versioned datastore using MongoDB and Elasticsearch.
GNU General Public License v3.0

datastore_get_resource_versions action is slow #117

Closed: jrdh closed this 1 week ago

jrdh commented 1 year ago

It's so slow that it's timing out, which stops this sheet we use from updating: https://docs.google.com/spreadsheets/d/1n_LopSOKN3LJNdyTfrdDYp652QLl6hg-WDx9jRkQvok/edit?usp=sharing.

The endpoint being called is: https://data.nhm.ac.uk/api/3/action/datastore_get_resource_versions?resource_id=05ff2255-c38a-40c9-b657-4ccb55ab2feb. Pretty sure this uses some Splitgill functionality too, so changes may be needed there as well to make it more efficient. Needs some investigation.

jrdh commented 1 week ago

Sped up by removing the counts, so this action now just returns the available versions, e.g.: https://data-nlb-stg-01.nhm.ac.uk/api/3/action/vds_version_resource?resource_id=05ff2255-c38a-40c9-b657-4ccb55ab2feb. The reasoning is that as the number of versions on a resource increases, a call that also computes the counts is just going to get slower and slower. We could maybe look at paging the aggregation, but the other way of dealing with it is to make the user first get the list of resource versions and then make one call per version to get the count. This could be changed in the future, but I think the only user of this action at present is a script running on a Google Sheet which shows digitisation progress, so the only user impacted is us.
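
For anyone adapting a client to the two-step flow described above, here is a rough sketch in Python. It is an illustration under assumptions, not the extension's documented usage: it assumes `vds_version_resource` returns a flat list of version numbers in the standard CKAN `result` field, and the per-version counting action is not named in this issue, so `count_action` is a hypothetical placeholder that is assumed to accept `resource_id` and `version` and return an integer. The host is also only an example taken from the URLs in this thread.

```python
import json
from typing import Any, Callable, Dict
from urllib.parse import urlencode
from urllib.request import urlopen

# Host taken from the issue text; swap in whichever instance exposes the new action.
API_BASE = "https://data.nhm.ac.uk/api/3/action"


def call_action(action: str, params: Dict[str, Any], timeout: float = 30.0) -> Any:
    """Call a CKAN action over HTTP GET and return its "result" payload."""
    url = f"{API_BASE}/{action}?{urlencode(params)}"
    with urlopen(url, timeout=timeout) as response:
        body = json.load(response)
    # CKAN action responses wrap the payload as {"success": ..., "result": ...}.
    if not body.get("success"):
        raise RuntimeError(f"{action} failed: {body.get('error')}")
    return body["result"]


def counts_per_version(
    resource_id: str,
    count_action: str,
    fetch: Callable[[str, Dict[str, Any]], Any] = call_action,
) -> Dict[Any, Any]:
    """List the versions first, then ask for one count per version."""
    # Step 1: the cheap call that only lists versions.
    # ASSUMPTION: the result is a flat list of version numbers.
    versions = fetch("vds_version_resource", {"resource_id": resource_id})

    counts: Dict[Any, Any] = {}
    for version in versions:
        # Step 2: one small request per version.
        # ASSUMPTION: count_action is a hypothetical action name that takes
        # resource_id and version and returns the record count.
        counts[version] = fetch(
            count_action, {"resource_id": resource_id, "version": version}
        )
    return counts
```

Each request stays small, so a resource with many versions costs more round trips rather than one long aggregation that can time out. The `fetch` parameter is only there so the HTTP call can be swapped out, for example for a stub when testing.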