Question About Data Transforms

JBarsotti commented 2 months ago

First off, this is a great library; using it makes data extraction way easier! I have a question about rd_transform(final_format = "by_event"). For one REDCap database it works fine, but for a different database I get the error:

Error: There're more variables in the dictionary than in the data base so it's not possible to split by event. Transformation stops.

I'm not totally sure what this means and would like to ask about it.

Thanks,

John

JBarsotti commented 2 months ago

Just as a follow-up to my previous question: I found that a few of the fields in the REDCap project were not exported to the "data" dataframe but were exported to the "dictionary" dataframe when I called the redcap_data function. I'm not totally sure why. There were four such fields, all radio buttons with only a single choice option.

jcarmezim commented 2 months ago

Good morning, John.

The error you are experiencing comes from a safeguard within the rd_transform function, which is triggered when there are more variables in the dictionary than in the data. In that case it is not possible to guarantee that the data will be split properly by event, so the transformation stops.
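One way to see which fields trip this safeguard is to list the dictionary entries that have no matching column in the data. This is a minimal base-R sketch using hypothetical toy data frames (the field names are invented for illustration); it is only a diagnostic, not the check that rd_transform itself performs.

```r
# Hypothetical toy objects standing in for the "data" and "dictionary"
# data frames returned by redcap_data().
data <- data.frame(
  record_id = c(1, 2),
  redcap_event_name = c("baseline_arm_1", "follow_up_arm_1"),
  age = c(34, 51),
  stringsAsFactors = FALSE
)
dic <- data.frame(
  field_name = c("record_id", "age", "consent_complete", "screening_complete"),
  field_type = c("text", "text", "radio", "radio"),
  stringsAsFactors = FALSE
)

# Dictionary fields with no matching column in the data: when this is
# non-empty, the dictionary has "more variables" than the data.
missing_in_data <- setdiff(dic$field_name, names(data))
print(missing_in_data)
```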

Could you please tell us whether you are using the API connection or files exported from REDCap to import your data into R? And could you also confirm whether the names of those 4 variables, by any chance, end with "_complete"?

We will investigate it and attempt to resolve it as soon as possible.

Thank you for your message,

João

JBarsotti commented 2 months ago

Thanks for the reply! I am using the API, and yes, they all end with "_complete"!

jcarmezim commented 2 months ago

We have identified the problem. There was an error in the code: the redcap_data function removed variables ending in "_complete". We originally did this to drop the variables that REDCap creates by default to track the completion of each instrument, but we later turned this step into an argument of the rd_transform function (delete_pattern). We are very sorry for the inconvenience; we are updating the package on GitHub and, soon, on CRAN. You can install the new version with: remotes::install_github('bruigtp/REDCapDM')

Thank you for reporting this issue and helping us improve the package.

Please try the new version, and if you can run the rd_transform() function without any problems, we would appreciate it if you closed this issue.

JBarsotti commented 2 months ago

Thank you so much for the help! Unfortunately, that fix does not seem to work; I still get the same error.

JBarsotti commented 2 months ago

To get it to work, I had to delete the variables in both the data and the data dictionary that contained "_complete" anywhere in the name. Then the transform would run.
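The workaround described here, removing every name that contains "_complete" from both objects so they stay in sync, can be sketched in base R. drop_matching() and the toy data frames are hypothetical and only for illustration; in practice data and dic would come from redcap_data().

```r
# Hypothetical helper: drop every column/field whose name contains
# `pattern` anywhere, from both the data and the dictionary.
drop_matching <- function(data, dic, pattern) {
  keep_cols <- !grepl(pattern, names(data), fixed = TRUE)
  keep_rows <- !grepl(pattern, dic$field_name, fixed = TRUE)
  list(
    data = data[, keep_cols, drop = FALSE],
    dic  = dic[keep_rows, , drop = FALSE]
  )
}

# Hypothetical toy objects for illustration
data <- data.frame(
  record_id = 1:2,
  age = c(34, 51),
  consent_complete = c(2, 2),
  stringsAsFactors = FALSE
)
dic <- data.frame(
  field_name = c("record_id", "age", "screening_complete"),
  field_type = c("text", "text", "radio"),
  stringsAsFactors = FALSE
)

res <- drop_matching(data, dic, "_complete")
print(res$dic$field_name)
# Every remaining dictionary field now has a matching data column
print(all(res$dic$field_name %in% names(res$data)))
```

Because this is a plain substring test, it would also drop an unrelated field such as a hypothetical task_completed; matching only at the end of the name is narrower.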

JBarsotti commented 2 months ago

An alternative that also works is to edit the function directly. I edited the code of the function rd_transform so that it reads like this:

```r
# Excerpt from the body of rd_transform(): drop every variable that matches
# a delete_pattern from BOTH the data and the dictionary.
if (!is.null(delete_pattern)) {
  for (i in seq_along(delete_pattern)) {
    if (delete_pattern[i] == "_complete") {
      # Remove the raw and labelled (".factor") completion columns
      data <- data %>%
        dplyr::select(!tidyselect::contains(c("_complete", "_complete.factor")))
      dic <- dic %>%
        dplyr::filter(!grepl(delete_pattern[i], .data$field_name))
    } else if (delete_pattern[i] == "_timestamp") {
      data <- data %>%
        dplyr::select(!tidyselect::contains(c("_timestamp", "timestamp.factor")))
      dic <- dic %>%
        dplyr::filter(!grepl(delete_pattern[i], .data$field_name))
    } else {
      # Any other pattern: substring match on column and field names
      data <- data %>% dplyr::select(!tidyselect::contains(delete_pattern[i]))
      dic <- dic %>% dplyr::filter(!grepl(delete_pattern[i], .data$field_name))
    }
  }
}
```

jcarmezim commented 2 months ago

Good morning John,

You are absolutely right: we had only removed the variables with the _complete pattern from the data, not from the dictionary. We have applied your alternative to the version of the package on GitHub with a slight modification:

```r
if (!is.null(delete_pattern)) {
  for (i in seq_along(delete_pattern)) {
    if (delete_pattern[i] == "_complete") {
      # Anchored match: only names ENDING in "_complete" (or "_complete.factor")
      data <- data %>%
        dplyr::select(!tidyselect::ends_with(c("_complete", "_complete.factor")))
      dic <- dic %>%
        dplyr::filter(!grepl("_complete$", .data$field_name))
    } else if (delete_pattern[i] == "_timestamp") {
      # Same anchored logic for the survey timestamp variables
      data <- data %>%
        dplyr::select(!tidyselect::ends_with(c("_timestamp", "timestamp.factor")))
      dic <- dic %>%
        dplyr::filter(!grepl("_timestamp$", .data$field_name))
    } else {
      # Any other pattern is still matched anywhere in the name
      data <- data %>%
        dplyr::select(!tidyselect::contains(delete_pattern[i]))
      dic <- dic %>%
        dplyr::filter(!grepl(delete_pattern[i], .data$field_name))
    }
  }
}
```

Now it eliminates all variables ending in _complete or _timestamp from both the dictionary and the data. Thank you so much for your contribution!
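The suffix-anchored behaviour can be checked outside the package with a small base-R sketch. drop_suffixes() and the toy data are hypothetical; the sketch assumes the suffixes contain no regular-expression metacharacters and mirrors, rather than reproduces, the dplyr code in the package.

```r
# Hypothetical helper: drop columns/fields whose names END with any of
# `suffixes`, from both the data and the dictionary. Assumes the suffixes
# contain no regular-expression metacharacters.
drop_suffixes <- function(data, dic, suffixes) {
  for (s in suffixes) {
    # Data can hold the raw column plus a labelled ".factor" copy
    data_regex <- paste0(s, "(\\.factor)?$")
    data <- data[, !grepl(data_regex, names(data)), drop = FALSE]
    # The dictionary lists only the raw field name
    dic <- dic[!grepl(paste0(s, "$"), dic$field_name), , drop = FALSE]
  }
  list(data = data, dic = dic)
}

# Hypothetical toy objects for illustration
data <- data.frame(
  record_id = 1:2,
  task_completed = c(1, 0),
  survey_timestamp = c("2024-01-01 10:00", "2024-01-02 11:00"),
  consent_complete = c(2, 2),
  consent_complete.factor = c("Complete", "Complete"),
  stringsAsFactors = FALSE
)
dic <- data.frame(
  field_name = c("record_id", "task_completed", "consent_complete"),
  field_type = c("text", "yesno", "radio"),
  stringsAsFactors = FALSE
)

res <- drop_suffixes(data, dic, c("_complete", "_timestamp"))
print(names(res$data))
print(res$dic$field_name)
```

Anchoring with $ is what keeps task_completed while removing consent_complete, its .factor copy, and survey_timestamp.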

bruigtp/REDCapDM, issue #8: Question About Data Transforms