bruigtp / REDCapDM

R package to perform data management of REDCap datasets in R
https://bruigtp.github.io/REDCapDM/
Other
4 stars 0 forks source link

Friction log on trying to use the package #6

Closed llrs closed 5 months ago

llrs commented 5 months ago

Hi João, I am trying to use the package to read the data and I found several friction points I describe below.

It seems that checks are a bit inconsistent or at least the documentation could be improved.

> rdm <- REDCapDM::redcap_data(uri = "https://redcap.igtp.cat/api/")
Error in REDCapDM::redcap_data(uri = "https://redcap.igtp.cat/api/") : 
  object 'data_def' not found

In the first case the data definition is requested before connecting to the database (which it couldn't get in as it lacked the token). It would be nice if first the required arguments are checked whether they are present.

> rdm <- REDCapDM::redcap_data(uri = "https://redcap.igtp.cat/api/",
+                              token = "mytoken")
Importing in progress...
Error in curl::curl_fetch_memory(url, handle = handle) : 
  SSL peer certificate or SSH remote key was not OK: [redcap.igtp.cat] SSL certificate problem: unable to get local issuer certificate

When I add the token there is an error that was not "graciously handled" (in CRAN terms). It would be nice if the user got a nice message about what is wrong and how to fix it. I think in this case is a server side problem but you might have more experience with this. (Any help appreciated)

Additionally there are no examples on the function that might help to know how to use the function. I found a public database in the vignette, but it fails:

> dataset_api <- redcap_data(uri = "https://redcap.idibell.cat/api/",
+                            token = "55E5C3D1E83213ADA2182A4BFDEA")
Importing in progress...
Error in sanitize_token(token) : 
  The token is not a valid 32-character hexademical value.

When I try it with one public database provided by REDCapR it doesn't work:

> uri   <- "https://bbmc.ouhsc.edu/redcap/api/"
> token <- "9A81268476645C4E5F03428B8AC3AA7B" 
> rdm <- REDCapDM::redcap_data(uri = uri, token = token)
Importing in progress...
Error in `dplyr::rename()`:                                                                                                                   
! Can't rename columns that don't exist.
✖ Column `Record ID` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.
Warning messages:
1: In result$status : partial match of 'status' to 'status_code'
2: In result$status : partial match of 'status' to 'status_code'
3: In result$status : partial match of 'status' to 'status_code'

See also that there are some partial matching which might induce to errors (not sure if this is in your code or in some dependency code).

Many thanks for developing this package, I hope this helps to make it even better.

packageVersion("REDCapDM")
[1] '0.9.7'
llrs commented 5 months ago

I finally managed to read my data, after reducing the security in the connection. But I face another problem:

rdmt <- rd_transform(rdm)
Transformation in progress...
Error in `mutate()`:
ℹ In argument: `c3_start_date_bsl = (function (x, ...) ...`.
Caused by error in `charToDate()`:

The problem is that one field named c3_start_date_bsl has a value of "UNK". Probably the date is unknown and this is the shortening used to store it in the database.

When I tried to omit it by using the delete_vars argument I couldn't get pass the same line: Columns are deleted after transformation.

I would recommend to first delete and then transform the data, as this would reduce processing time to some columns that are later on not wanted.

Looking at the source code I also observed that delete_vars variable is passed to grepl() as a pattern, it could happen that an unwanted column is part of other names: "disease1", "disease1_date", "disease1_type", ... I might want to omit the disease1 column but keep the disease1date or other columns with the disease1* pattern.

jcarmezim commented 5 months ago

Good morning Lluís,

Firstly, thank you so much for using the package and for your feedback.

To address the initial error of using the URI without specifying the token, we will implement a check to prevent this from occurring.

The second error regarding the "SSL peer certificate" does indeed appear to be a server-side issue and it comes from the code of one of the dependencies. However, we will try to solve this.

While creating the package's vignette, we made the decision not to include the exact token of the incorporated database. But it is true that we do not mention this in the vignette, so we will add a note to the vignette informing users of the unavailability of the token.

The error you encountered when using the public database has already been fixed, and we will include it in the next version of the package along with the other necessary fixes. The partial matching warnings are also originating from dependencies' code, but we will look into them to see if there is a possible fix.

Regarding the 'delete_vars' argument, we totally agree with your suggestion. We will modify the order of tasks so that the function first eliminates the desired variables and then performs the transformation. To address the issue of using a pattern to eliminate variables, we believe the best solution is to introduce two separate arguments: one that eliminates all variables with a specific pattern and another that eliminates only the variables specified by the user.

We hope our response has addressed each of your points clearly, and we will reach out to you as soon as all the issues are resolved.

Thank you once again for using this package and for highlighting areas for improvement. We are committed to improving the package significantly with your input.

llrs commented 5 months ago

Glad it helps. For the server error "just" wrapping the connection step in a tryCatch call and provide an informative error/message might be enough.

jcarmezim commented 5 months ago

Good morning @llrs,

We have updated the package on both CRAN and GitHub. It should be available for download on CRAN by tomorrow. Here are the changes in this new version based on your feedback:

If you encounter any more bugs, please do not hesitate to contact us.

Thank you once again.

llrs commented 5 months ago

Thanks for the quick responses and the improvements.

Recommending users to not check the ssl certificate is not a good idea in my opinion. I would recommend instead of: httr::set_config(httr::config(ssl_verifypeer = FALSE)) to use httr::with_config(httr::config(ssl_verifypeer = FALSE), rd <- readcap_data(...)). This will disable this check only for the expression inside with_config.

It is good software practice when renaming an argument to provide a way to maintain compatibility with old scripts: internally handle delete_vars as delete_pattern (perhaps with a warning) or similar approaches. This helps your users to keep scripts working without dedicating many efforts to maintenance.

While checking the function, I realized that readcap_data is very complex with a cyclomatic complexity of ~31. See cyclocomp::cyclocomp_package to check the package. This, together with other issues of the package, like the lack of tests, lead me to use other packages.