Public-Health-Scotland / phsopendata

Functions to extract and interact with data from the Scottish Health and Social Care Open Data platform.
https://public-health-scotland.github.io/phsopendata/

Batch downloads #6

Closed daikman closed 7 months ago

daikman commented 3 years ago

I've added an "offset" argument to get_resource() to enable batch downloading. I also created get_resource_batched() to wrap around get_resource() to easily download a resource in batches.
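
A minimal sketch of the batching idea, written in Python rather than the package's R and not taken from this PR. `fetch_page(offset, limit)` is a hypothetical stand-in for a `get_resource()` call with an offset, and the in-memory table replaces the real API so the example runs on its own:

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def fetch_in_batches(
    fetch_page: Callable[[int, int], List[Row]],
    batch_size: int = 1000,
) -> List[Row]:
    """Collect all rows by calling fetch_page(offset, limit) until a short page."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    rows: List[Row] = []
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        rows.extend(page)
        # A page shorter than batch_size means the resource is exhausted.
        if len(page) < batch_size:
            break
        offset += batch_size
    return rows


# Hypothetical in-memory "resource" standing in for the open data API.
FAKE_TABLE: List[Row] = [{"id": i} for i in range(2500)]


def fake_fetch_page(offset: int, limit: int) -> List[Row]:
    """Return up to `limit` rows starting at `offset`, like a paged API would."""
    return FAKE_TABLE[offset:offset + limit]


all_rows = fetch_in_batches(fake_fetch_page, batch_size=1000)
print(len(all_rows))  # 2500
```

Stopping on the first short page avoids needing the total row count up front, at the cost of one extra (empty) request when the total is an exact multiple of the batch size.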

I haven't written tests for the new function yet, but get_resource() still passes its tests despite the changes made to it.

csillasch commented 3 years ago

The new batched wrapper function around get_resource() looks great! A couple of initial thoughts:

Moohan commented 3 years ago

Agree with Csilla's comments.

I would suggest calling the offset parameter skip, skip_n, skip_rows or something like that, as I think that is more intuitive. The name offset really only makes sense in the context of the batch function.

I wonder if you could run some tests for speed and number of timeouts to determine a good default batch size. That default could then also be baked into the get_resource function so that, for example, a request for more than 10,000 rows falls back to the batch function with n_rows = 2,000.
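
A rough sketch of that fallback, again in Python rather than the package's R. The function name `plan_requests` is hypothetical, and the 10,000 and 2,000 values are only the example figures from the comment above, not measured defaults:

```python
from typing import List, Tuple


def plan_requests(
    n_rows: int,
    threshold: int = 10_000,   # example figure from the comment, not benchmarked
    batch_size: int = 2_000,   # example figure from the comment, not benchmarked
) -> List[Tuple[int, int]]:
    """Return the (offset, limit) pairs needed to fetch n_rows rows."""
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")

    # Small requests go through as a single call.
    if n_rows <= threshold:
        return [(0, n_rows)]

    # Larger requests are split into batches; the last one may be shorter.
    return [
        (offset, min(batch_size, n_rows - offset))
        for offset in range(0, n_rows, batch_size)
    ]


print(plan_requests(5_000))        # [(0, 5000)]
print(len(plan_requests(25_000)))  # 13
```

Keeping the plan separate from the fetching makes it easy to time different `batch_size` values against the same request before settling on a default.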

csillasch commented 7 months ago

Closing this, as using the dump endpoint for requests of more than 99,999 rows is efficient enough (we can revisit in future if needed).