Public-Health-Scotland / phsopendata

Functions to extract and interact with data from the Scottish Health and Social Care Open Data platform.
https://public-health-scotland.github.io/phsopendata/
9 stars 3 forks source link

add deeper query options and check_connectivity function #7

Closed daikman closed 2 years ago

daikman commented 2 years ago

I've added the ability to filter rows and columns using the 'q' and 'fields' parameters of the datastore_search endpoint, respectively.

I also added a function that checks the connection to CKAN in order to give better error messages from get_resource and get_dataset. And for get_resource I added some extra error handling.

Made small changes to documentation of functions too.

csillasch commented 2 years ago

I don`t think we should set the query limit to 100 if there is a filtering query with a larger response. As a user I would find it tedious to get a warning message instructing me to repeat the query specifying the number of rows, rather than getting that response in the first place - less clunky and frustrating from user perspective. Did you have specific reasoning to include the limit?

csillasch commented 2 years ago

Also looked at the failed checks - think these will be resolved by updating the tests to expect the new error messages, but if not may need a bit more attention

daikman commented 2 years ago

I don`t think we should set the query limit to 100 if there is a filtering query with a larger response. As a user I would find it tedious to get a warning message instructing me to repeat the query specifying the number of rows, rather than getting that response in the first place - less clunky and frustrating from user perspective. Did you have specific reasoning to include the limit?

If there is no limit set in a datastore_search query, then it will return 100 results by default. I thought it would be good to make that explicit. But you are right that it would be annoying for users!

Instead, I've changed it so that if they don't set a limit themselves, it becomes 99999. If a query matches more rows than that, the warning tells them to download the whole resource and apply filters afterwards. If their query matches 99999 rows or less, then the user won't notice that this limit has been set.

Also looked at the failed checks - think these will be resolved by updating the tests to expect the new error messages, but if not may need a bit more attention

This is a good point about tests! Even if this isn't the reason for failed checks I should update them anyway.

csillasch commented 2 years ago

If there is no limit set in a datastore_search query, then it will return 100 results by default. I thought it would be good to make that explicit. But you are right that it would be annoying for users!

Instead, I've changed it so that if they don't set a limit themselves, it becomes 99999. If a query matches more rows than that, the warning tells them to download the whole resource and apply filters afterwards. If their query matches 99999 rows or less, then the user won't notice that this limit has been set.

Good point - that sounds like a sensible solution. I think previously we used a single-row call first to determine the exact row number to return.

daikman commented 2 years ago

I've made three major changes to this pull request:

Refactoring the way the package makes API calls in order to better handle errors and reduce unnecessary API calls. Using {cli} to throw and format error messages. I also separated all functions into their own files and added tests for each.

Forgot to add any of your suggested changes @Moohan before I made these changes! But thanks for your comments they were very helpful

csillasch commented 2 years ago

Would it be helpful to have a message also when dump is being used for row>99999 requests but there is no query/filtering? Just to avoid any confusion when the returned number of rows doesn`t match the rows requested.

daikman commented 2 years ago

Would it be helpful to have a message also when dump is being used for row>99999 requests but there is no query/filtering? Just to avoid any confusion when the returned number of rows doesn`t match the rows requested.

Good point! I've added another if statement to use_dump_check() that will give this warning