Closed reiniervlinschoten closed 9 months ago
@martijnkersloot, the getStudyData function throws warnings when adjusting types to R. Specifically this piece of code: https://github.com/castoredc/castoRedc/blob/c0bf07b929e220683491f2584db5de5df0a6a2f7/R/CastorData.R#L718-L723
When fields with user missings (e.g. "##USER_MISSING_98##") are cast to numeric, they become NA. Do you want to: 1) Keep this transformation of USER_MISSING to NA, and keep the warning 2) Keep the transformation of USER_MISSING to NA, and drop the warning 3) Create a new category of missings in R to explicitly handle user missings? I use -99 to -95 in the Python package
@martijnkersloot , the decision was made to be verbose in some functions, and it is unclear to me why. For example here https://github.com/castoredc/castoRedc/blob/c0bf07b929e220683491f2584db5de5df0a6a2f7/R/CastorData.R#L755
and here https://github.com/castoredc/castoRedc/blob/c0bf07b929e220683491f2584db5de5df0a6a2f7/R/utils.R#L6-L9
For now I have commented this out, because I do not think this is relevant information for the user. Is that okay?
@martijnkersloot, repeating data instances and survey instances are now saved as an attribute of the studyData that is returned by getStudyData
study_data <- castor_api$getStudyData(creds$example_study, repeating_data_instances = T, survey_instances = T)
Moreover, all repeating data instances are combined in the same dataframe, meaning that each repeating data instance has a column for each field that exists in any repeating data instance. So if you have 5 different repeating data forms, each row has its own fields, but also columns for the fields of the four others. This is quite unclear and leads people to have to clean up this dataframe themselves. Is it okay if I refactor this to a list of lists, as I have done in the R wrapper for my Python package?
Resulting in
list("StudyData" = study_data_df,
"repeatingInstanceData" = list("Name1" = repeating_instance_one_df, "Name2" = repeating_instance_two_df,
"surveyInstanceData" = list("Name1" = survey_instance_one_df, "Name2" = survey_instance_two_df)
Hi @reiniervlinschoten,
When fields with user missings (e.g. "##USER_MISSING_98##") are cast to numeric, they become NA. Do you want to:
- Keep this transformation of USER_MISSING to NA, and keep the warning
- Keep the transformation of USER_MISSING to NA, and drop the warning
- Create a new category of missings in R to explicitly handle user missings? I use -99 to -95 in the Python package
How would option no. 3 look like? I would say that that is the most elegant solution.
For now I have commented this out, because I do not think this is relevant information for the user. Is that okay?
Yes, of course! I think these were only used for debugging purposes.
Is it okay if I refactor this to a list of lists, as I have done in the R wrapper for my Python package?
Absolutely, that is how it was supposed to work from the beginning! Please make sure that this is listed as a breaking change though.
Hi @reiniervlinschoten,
When fields with user missings (e.g. "##USER_MISSING_98##") are cast to numeric, they become NA. Do you want to:
- Keep this transformation of USER_MISSING to NA, and keep the warning
- Keep the transformation of USER_MISSING to NA, and drop the warning
- Create a new category of missings in R to explicitly handle user missings? I use -99 to -95 in the Python package
How would option no. 3 look like? I would say that that is the most elegant solution.
For numeric variables, they would just take have values -99 to -95, and I would add to the manual that those are the user missing values. For character values, I would just keep them as a level of the factor.
Is fixed in v2.0.0
Prints
https://github.com/castoredc/castoRedc/blob/c0bf07b929e220683491f2584db5de5df0a6a2f7/R/CastorData.R#L303