The-Academic-Observatory / observatory-platform

Observatory Platform Package
https://docs.observatory.academy
Apache License 2.0

Telescope implementation of minimal API call scheduling in WoS and SCOPUS #277

Closed tuanchien closed 3 years ago

tuanchien commented 4 years ago

Web API based data sources normally have quota limits on the number of API calls you can make in a given period.

For SCOPUS this means:

For WoS:

The optimisation problem for SCOPUS is well defined and observatory_platform.utils.ScheduleOptimiser solves for those constraints.

On the face of it, one solution to the WoS optimisation problem is to make a single API call to fetch the entire record set. A test fetch indicates that a reasonable amount of data can be pulled in a single query: for example, it allowed me to pull almost 4 years' worth of Curtin records (2017-1-1 to 2020-10-1, between 14,400 and 14,500 records). However, a single query is not optimal when there are many records and the download is interrupted, since the whole query would have to be re-run. So perhaps it makes sense to solve the same minimisation problem for WoS as for SCOPUS. On the other hand, if the preferred strategy is simply to do bulk downloads over larger periods, e.g., a few years at a time, minimising API calls becomes less useful: the larger the period ranges, the smaller the possible benefit as a proportion of total calls.
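As a rough illustration of the trade-off between one large query and several smaller ones, the sketch below splits a date range into fixed-size chunks, one API call per chunk, so that an interrupted download only requires re-fetching the affected chunk. The helper names `add_months` and `split_period` are hypothetical and are not part of observatory_platform; the sketch also ignores any per-call record limits.

```python
from datetime import date, timedelta
from typing import List, Tuple


def add_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` after the month of `d`."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def split_period(start: date, end: date, months_per_chunk: int) -> List[Tuple[date, date]]:
    """Split [start, end] (inclusive) into chunks of at most `months_per_chunk` months.

    Hypothetical helper: each returned (chunk_start, chunk_end) pair stands
    for one API call covering that date range.
    """
    if months_per_chunk < 1:
        raise ValueError("months_per_chunk must be at least 1")

    chunks = []
    chunk_start = start
    while chunk_start <= end:
        # The next chunk starts on the first day of a later month.
        next_start = add_months(chunk_start, months_per_chunk)
        # The current chunk ends the day before, clipped to the overall end date.
        chunk_end = min(next_start - timedelta(days=1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = next_start
    return chunks


if __name__ == "__main__":
    for chunk in split_period(date(2017, 1, 1), date(2020, 10, 1), 12):
        print(chunk[0].isoformat(), "to", chunk[1].isoformat())
```

For the 2017-1-1 to 2020-10-1 test range, yearly chunks need 4 calls and monthly chunks need 46, so the coarser the chunks, the less there is left to save by optimising the schedule.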

Timing of DAG runs

Perhaps it makes sense to pull only full months' worth of records to make optimisation more consistent. For example, if a DAG run was on 15 Oct, it would treat 30 Sept as the end date for the record pull (at least in the SCOPUS situation).
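The end-date rule can be sketched as follows. The function name `last_full_month_end` is hypothetical, and the sketch assumes the month containing the run date is never treated as complete, even when the run falls on its last day.

```python
from datetime import date, timedelta


def last_full_month_end(run_date: date) -> date:
    """Return the last day of the most recent complete month before `run_date`.

    Hypothetical helper: the month containing `run_date` is always treated
    as incomplete, so the result is the day before the first of that month.
    """
    first_of_run_month = run_date.replace(day=1)
    return first_of_run_month - timedelta(days=1)


if __name__ == "__main__":
    # A run on 15 Oct 2020 pulls records up to the end of September.
    print(last_full_month_end(date(2020, 10, 15)).isoformat())
```

Subtracting one day from the first of the month avoids hard-coding month lengths and handles the year boundary, so a run in early January resolves to 31 December of the previous year.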

What needs doing

For each telescope,

Blocking issue

Database component is missing. From the discussions it sounds like it is earmarked for development in the near future by @jdddog.

tuanchien commented 4 years ago

@rhosking, @jdddog, @aroelo