OuhscBbmc / REDCapR

R utilities for interacting with REDCap
https://ouhscbbmc.github.io/REDCapR
Other
112 stars 46 forks source link

Export of checkbox items and `_complete` fields #420

Open januz opened 2 years ago

januz commented 2 years ago

Hi @wibeasley, I assume this is not an {REDCapR}-specific problem but a general problem with the API but I hope that you know ways how to properly deal with these issues:

  1. When exporting a checkbox field that does not have a selection, all associated ___[1-n] fields are set to 0; this is also true if the complete form the checkbox items is part of, is empty:
    • This is quite misleading because it looks as though the item is not incomplete.
    • This also causes me headaches when exporting data for our study for sharing (thousands of fields, quite a few of them are checkbox fields) because I'd rather not export data for cases that did not provide data for this item or the complete form/table that the items is a part of.
  2. When exporting a form that does not have a completion status for a given record, the _complete field is set to 0:
    • This has recently caused issues for me because I imported data that I had assembled using {REDCapR} into another system using the file import functionality and set forms that should have remained with no completion status to "incomplete" (0)

I feel like that in both cases the "proper" export would be to have NAs instead of 0s. That would make it easy to differentiate cases

Thanks for your input!

wibeasley commented 2 years ago

I agree these two aspects aren't ideal. I think there is an accepted hack for each one though:

  1. I think what some people do is create a hidden datetime field (in each form) that is automatically populated with @NOW. So if the form hasn't been touched, this datetime field will be null/blank.
  2. Does #386 help? It's been a few days since I did that. I think a gray circle is equivalent to a null. Double-check me though.
januz commented 2 years ago

Thanks so much, @wibeasley, the exportBlankForGrayFormStatus API option should solve issue 2 (only problem for me personally is that it is only available in newer REDCap versions).

As for 1, the method you describe works well for forms that are completely empty (using the exportBlankForGrayFormStatus option and checking for NAs in the _complete field would also work).

But there are cases where just the checkbox field is missing but the rest of the form has data. In those, one would not be able to easily differentiate between cases that did or did not fill in information without having to consider all ___[1-n] columns associated with a given checkbox field and summing them up (if 0, the field is "missing"). It would be ideal / more appropriate to just have NAs for all the ___[1-n] columns. I wonder why the API was designed in this way...

Thanks for your help!!

wibeasley commented 2 years ago

I wonder why the API was designed in this way...

Do you have access to the database, specifically the redcap_data table? If not, I think you can determine the same thing with the API playground. See if that record exists for the checkboxes. If it doesn't, I think we have a window.

If the record doesn't exist in the table, we could extract the data as an eav. Essentially REDCap's PHP code doesn't widen/pivot it --REDCapR's R code does. I've been working off and on with this semi-hidden function: https://ouhscbbmc.github.io/REDCapR/reference/redcap_read_oneshot_eav.html

Try this: go to REDCap's API playground and select a single record (where you'd like the checkboxes to be null, not zero). Also make sure "eav" is selected instead of "flat".

image

januz commented 2 years ago

@wibeasley Thanks so much, I didn't know about the eav option for API exports. When I try it out in the API playground, the output looks like one would expect, i.e., no rows if no options were selected for a given checkbox.

I cloned the development version of this repository but my first attempts with redcap_read_oneshot_eav() have been unsuccessful. It seems to pull the data correctly (the printed raw text output is as expected) but the postprocessing into a tibble fails (error: "The REDCap read failed")

wibeasley commented 2 years ago

Oh good. If the rows aren't there, that makes it easier for the code to decide if they should be false or NA. Will you please create & attache a test project (dictionary and dataset) that demonstrates the problem & desired behavior? Maybe ~5 rows with ~4 checkboxes and a comment field. The comment field includes a statement that reinforces your desired outcome (eg, "ideally these checkboxes are returned as NAs" and "the checkboxes are returned as TRUE/FALEs".

Do you want me to work on the EAV side, or do you want another crack? This is good timing because the EAV pivoting will really benefit from the latest REDCapR function, `redcap_metadata_coltypes() (#405).

januz commented 2 years ago

Hi @wibeasley Thanks so much for working with me! I put together the project that you described and am attaching here

https://drive.google.com/file/d/17RyeCofnVPrWr5RgQkklkLdVhvDUmD_H/view?usp=sharing

In addition to the comments I left in the descriptive field, here the summary

It would be fantastic if you could steer the development. I played around with your current version a little bit yesterday but did not make any progress on improving it. In addition, I came down with COVID today so I won't be at my best physically and cognitively for a bit :/ The new function that you mentioned sounds great. From my testing yesterday, setting the correct data types seems one of the more difficult and very time consuming parts of the function, especially with a data dictionary as huge as ours (>38,000 variables). I guess that was one of your motivations for developing it? Thanks so much!

wibeasley commented 2 years ago

Sorry about covid. Don't worry about responding soon. I'm writing some of this to remind myself in the future.

Your description above is perfect and helped me see the weakness with the REDCap extract process (independent of the API or R layers) I was expecting the exported csv to have zeros (instead of nulls/blanks). But the xml does too (thanks for sending it, btw.

When I saw the xml, I thought I'd have to delete that record and recreate it (without touching the 2nd form). And I'm surprised that the playground (loaded from the xml) doesn't have those rows.

image

image

januz commented 2 years ago

@wibeasley it's indeed surprising that the XML also contains the 0s! Let me know if you need further input of if I can help with testing or anything.

januz commented 1 year ago

@wibeasley I am happy to see that you have a few commits associated with this issue. Did you make progress on the implementation? Is there something / a branch that I could try out and/or something I could help with? I will soon have to export data from our humongous REDCap project and the EAV functionality (and associated correct export, or non-export, of checkbox item data) would be amazing to have. Thank you!

wibeasley commented 1 year ago

I forgot exactly where I left it last week. Whatever it was should be on the dev branch. I think I had finished augmenting the metadata function so that it returned info the eav function needed to pivot.you might want to poke around if your deadline is soon. I'm at REDCapCon for another two days. I hope to get back to this early next week.

januz commented 1 year ago

Ah cool, have a great time at REDCapCon! I'll try to check it out your current implementation soon.

wibeasley commented 1 year ago

@januz, I'm working on this now. I did some things in other functions (#427) that should make this easier eventually, but it would be hard for someone else to pick it up right now. Gimme a few days.

januz commented 1 year ago

@wibeasley Great, thank you so much!! I'm looking forward to testing it.

januz commented 1 year ago

@wibeasley Sorry to bother you again with this but it looks as though you made some progress with this issue? Fortunately, our data release has been pushed out so I will have some extra time before I need this functionality but I'm very interested in using it as I feel like the current export of checkbox items is misleading... Thanks for all your work on this!

wibeasley commented 1 year ago

note: when @januz & I met yesterday, we decided that when coalescing checkboxes, the form's *_complete value should be considered. If it's missing, then the checkbox should not be coalesced w/ false.

https://github.com/OuhscBbmc/REDCapR/blob/e267521f513790744b81bc01a0010ae4a22236b1/playgrounds/eav-playground-2.R#L117-L118

I think the tricky part would be to account for the project's & the form's value set of record_id, redcap_event_name, redcap_repeat_instrument, and redcap_repeat_instance. We're hoping to do this entirely with metadata-like calls, and not rely on the data itself.

januz commented 1 year ago

Hey @wibeasley, I finally got around to working on this. I think that I have figured out a way that takes into account the different project settings (longitudinal, repeating, combination of both, or neither). I wasn't able to do so completely independently of the data. For example, the repeat status of a project does not determine whether the redcap_repeat_... columns are included in the eav export but the requested data does. But I think that my solution could be used to develop a general export function that allows to export checkbox data in a more correct format.

On that note, I think that I previously didn't communicate the goal clearly enough (or this might have gotten mixed up because when I initiated this issue I was in addition talking about the _complete fields):

The standard API exports sets all ___ fields to 0/FALSE, independent of whether a box has been checked for that item or whether it was even seen by a participant. What I would like to achieve is that

I feel like this is a more meaningful representation of the data that doesn't suggest one can assume that each of the ___ fields should be considered 0/FALSE for that participant but shows it as missing data.

I tried to push my changes (I create a new playground file and, based on one of your function, added a function that gets the repeating forms and event associations from the API) but I don't have the privilege to. I will send you an email with those two files and tokens for 4 different REDCap projects I set up for testing

I set up all projects in a way that they use the same data dictionary with a setup form with the record_id and three other forms one of which contains checkbox items. Each has data for 4 records

At the bottom of the playground script I use compareDF to show the difference between the standard export and the processed eav export. For example, this shows the difference for the project that is neither longitudinal nor repeating:

image

This shows the differences between the different export styles:

I hope this is useful. It would be great if you could test on your end and let me know what you think. Thanks so much!

januz commented 1 year ago

@wibeasley Just an update as I've been using the above laid out approach and a variant of the code that I sent you to export data from our massive REDCap project (for example, I added some batching code that divides up the returned data into batches of around 1000 rows each, based on the requested participants events fields).

Overall, it works great but I've run into a few issues

lz100 commented 8 months ago

Hi @wibeasley, @januz and I put together a working function for checkboxes. Since we are not familiar with your code base and internal functions, it will be better for you to take a look, and I hope it will be helpful. The idea is to take this function as your internal function and give an option to users to let them choose what to do with the checkbox in the redcap_read functions.