In general, the whole pipeline runs faster the more resources we process in each page.
The bigger the page, though, the more likely we are to hit a query timeout.
At the moment we need to set :ook.etl/select-page-size to the lowest value that works for all resource pages. The number of solutions involved varies by both resource type and page (e.g. some observations are bigger than others).
It would be nice to have the page size adapt to the context over the course of the pipeline run:
If the requests succeed, the size could be increased.
If the requests fail (503 or Unexpected EOF), the size should be decreased and the failing request repeated.
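
The two rules above could be sketched roughly as follows. This is only an illustration in Python, not the pipeline's actual code: `fetch_page`, `PageFetchError`, `fetch_all`, the starting size, the bounds, and the grow/shrink factors are all hypothetical names and values chosen for the example. It assumes a failed page can simply be re-requested at the same offset with a smaller size.

```python
from typing import Callable, Iterator, List


class PageFetchError(Exception):
    """Hypothetical error standing in for a 503 or an unexpected EOF."""


def fetch_all(
    fetch_page: Callable[[int, int], List[int]],
    initial_size: int = 1000,
    min_size: int = 1,
    max_size: int = 50_000,
) -> Iterator[List[int]]:
    """Yield pages from fetch_page(offset, size), adapting size as we go.

    fetch_page is assumed to return up to `size` items starting at `offset`,
    an empty list once the data is exhausted, and to raise PageFetchError
    when the query times out.
    """
    offset = 0
    size = initial_size
    while True:
        try:
            page = fetch_page(offset, size)
        except PageFetchError:
            if size <= min_size:
                # Cannot shrink any further: give up and surface the error.
                raise
            # Halve the page size and repeat the failing request.
            size = max(min_size, size // 2)
            continue
        if not page:
            return
        yield page
        offset += len(page)
        # Grow gently (by a quarter) after each success, up to max_size.
        size = min(max_size, size + max(1, size // 4))
```

Shrinking faster than growing is a deliberate choice here: a timeout costs a whole wasted request, so the sketch backs off quickly and probes upwards slowly. The factors would need tuning against real resource pages.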