Closed Moohan closed 1 day ago
@Moohan Good idea to add col_select!
And thanks for spotting this problem with the column data types. In this case it is actually a problem with the data rather than the function, as they should be the same type. However, just to be safe I've now added a fix that coerces columns with inconsistent data types to character, and warns the user about it.
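For anyone following along, the coercion described above can be sketched in base R. This is only an illustration under assumptions, not the code on the branch: coerce_mismatched() and the toy data frames are hypothetical names made up for this example.

```r
# Hypothetical helper (not the phsopendata implementation): given a list of
# data frames, coerce any column whose type differs between them to character
# and warn about it.
coerce_mismatched <- function(dfs) {
  # Every column name that appears in at least one data frame
  all_cols <- unique(unlist(lapply(dfs, names)))
  for (col in all_cols) {
    # Which data frames contain this column, and what class it has in each
    has_col <- vapply(dfs, function(d) col %in% names(d), logical(1))
    classes <- vapply(dfs[has_col], function(d) class(d[[col]])[1], character(1))
    if (length(unique(classes)) > 1) {
      warning("Column '", col, "' has inconsistent types (",
              paste(unique(classes), collapse = ", "),
              "); coercing to character.", call. = FALSE)
      dfs[has_col] <- lapply(dfs[has_col], function(d) {
        d[[col]] <- as.character(d[[col]])
        d
      })
    }
  }
  dfs
}

# Toy stand-ins for two resources: HB is a string in one and a number in the other
res_a <- data.frame(HB = c("S08000015", "S08000016"), n = c(1, 2),
                    stringsAsFactors = FALSE)
res_b <- data.frame(HB = c(8000015, 8000016), n = c(3, 4))

fixed <- coerce_mismatched(list(res_a, res_b))  # warns about HB
combined <- do.call(rbind, fixed)
```

After coercion the two toy resources can be row-bound, because HB is character in both; columns that already agree (n here) are left untouched.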
There's now a branch for this issue
Thanks, nice work with the column fix - you could probably use that to loop through all the datasets and spit out a list of datasets where certain resources have variable types which could possibly be fixed!
I couldn't resist doing that so I ran some code and pulled a list of the datasets with column mismatch issues:
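library(phsopendata)
library(purrr)  # also provides the %>% pipe used below
# List every dataset name using the package's internal API helper
content <- phsopendata:::phs_GET("package_list", "")
dataset_names <- unlist(content$result)
# Wrap get_dataset so that a failure returns NA instead of stopping the loop
check_data_set <- purrr::possibly(get_dataset, NA)
check <- purrr::map(dataset_names, ~ check_data_set(.x, rows = 1)) %>%
  purrr::set_names(dataset_names)
# Names of the datasets that failed to load
names(check[is.na(check)])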
Ones with issues:
[1] "annual-cancer-incidence"
[2] "births-in-scottish-hospitals"
[3] "cancer-mortality"
[4] "care-home-census"
[5] "covid-19-positive-cases-in-pregnancy-in-scotland"
[6] "dental-practices-and-patient-registrations"
[7] "geography-codes-and-labels"
[8] "gp-practice-contact-details-and-list-sizes"
[9] "learning-disability-inpatient-activity"
[10] "long-acting-reversible-methods-of-contraception-larc-in-scotland"
[11] "mental-health-inpatient-activity"
[12] "nhsscotland-payments-to-general-practice"
[13] "primary-1-body-mass-index-bmi-statistics"
[14] "scottish-suicide-information-database-contact-with-unscheduled-care-services-prior-to-death"
[15] "teenage-pregnancy"
[16] "termination-of-pregnancy-in-scotland"
[17] "weekly-covid-19-statistical-data-in-scotland"
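The check above relies on turning errors into NA and then picking out the NA entries. A self-contained base R sketch of that pattern is below; fake_get_dataset() is a hypothetical stand-in for get_dataset() so the example runs offline. One caveat worth keeping in mind: any error (a timeout, say) would also produce NA, so a list built this way is a set of candidates rather than confirmed type mismatches.

```r
# Hypothetical stand-in for get_dataset(): errors for one dataset name
fake_get_dataset <- function(name) {
  if (name == "bad-dataset") stop("column types do not match")
  data.frame(dataset = name)
}

# Base R counterpart of purrr::possibly(f, NA): return NA on any error
safely_get <- function(name) {
  tryCatch(fake_get_dataset(name), error = function(e) NA)
}

dataset_names <- c("good-dataset", "bad-dataset")
check <- lapply(dataset_names, safely_get)
names(check) <- dataset_names

# Keep the names whose result is exactly NA, i.e. the calls that failed
is_failed <- vapply(check, function(x) identical(x, NA), logical(1))
failed <- names(check)[is_failed]
```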
@daikman This looks ok to me, could you open a PR, or were there some other changes you wanted to make?
@Moohan I've opened a pull request for this now. Happy to merge whenever.
Could get_dataset be amended to use the new col_select argument? A lot of datasets have resources that are all basically the same, so this would lend itself to column selection really nicely, e.g. gp-practice-contact-details-and-list-sizes.
There would be some issues with datasets where each of the resources is different, e.g. geography-codes-and-labels, but I think it could still be useful if implemented well. A separate but related issue is that get_dataset("gp-practice-contact-details-and-list-sizes") currently doesn't work, because some resources store HB as a string but others as a number.
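To make the column-selection idea concrete, here is a rough base R sketch of how selection could behave across the resources of one dataset, including the case where a resource lacks a requested column. It is an assumption-laden illustration rather than a proposal for the actual get_dataset internals: select_cols(), the toy resources, and their column names are all hypothetical.

```r
# Hypothetical helper: keep only the requested columns from one resource,
# adding any that are absent as NA so every resource ends up the same shape.
select_cols <- function(df, col_select) {
  missing_cols <- setdiff(col_select, names(df))
  if (length(missing_cols) > 0) {
    warning("Columns not found and filled with NA: ",
            paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  for (col in missing_cols) {
    df[[col]] <- rep(NA, nrow(df))
  }
  df[, col_select, drop = FALSE]
}

# Toy stand-ins for two resources of one dataset (column names are made up)
res_1 <- data.frame(PracticeCode = c(10002, 10017),
                    HB = c("S08000030", "S08000030"),
                    ListSize = c(7500, 6200),
                    stringsAsFactors = FALSE)
res_2 <- data.frame(PracticeCode = c(10002, 10017),
                    ListSize = c(7600, 6100))  # no HB column here

wanted <- c("PracticeCode", "HB")
selected <- lapply(list(res_1, res_2), select_cols, col_select = wanted)
combined <- do.call(rbind, selected)
```

Filling absent columns with NA is just one possible design; erroring, or silently dropping the column, would be equally defensible for datasets such as geography-codes-and-labels where the resources differ.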