ctsit / redcapcustodian

Simplified, automated data management on REDCap systems

Ignore columns in `sync_table_2` #88

Open pbchase opened 1 year ago

pbchase commented 1 year ago

Ignore specified columns in the `update_records` data frame of `sync_table_2` at https://github.com/ctsit/redcapcustodian/blob/c4293d224669be4f05330aa59dc79ec857c53416/R/write_data.R#L226

The problem stems from adding columns to a data frame to time-stamp record creation and update. The added columns are, by definition, not in the data source, and their values are new on every run of the script. If run N reads data A and run N+1 reads data A plus data B, every row of A will be rewritten with a new timestamp, because the timestamp columns make each row of A look different from what was written on run N.
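To make the failure concrete, here is a minimal sketch in plain Python rather than the package's R code. The `anti_join` and `stamp` helpers, the `updated_at` column, and the sample records are hypothetical stand-ins for dplyr's `anti_join` and the script's timestamp columns.

```python
def anti_join(left, right, by):
    """Rows of `left` with no row in `right` that matches on every column in `by`."""
    right_keys = {tuple(row[col] for col in by) for row in right}
    return [row for row in left if tuple(row[col] for col in by) not in right_keys]


def stamp(rows, when):
    """Hypothetical: add a per-run timestamp column, as the calling script does."""
    return [{**row, "updated_at": when} for row in rows]


data_a = [{"record_id": 1, "value": "x"}, {"record_id": 2, "value": "y"}]
data_b = [{"record_id": 3, "value": "z"}]

# Run N wrote data A; run N+1 reads data A + data B and stamps it again.
target = stamp(data_a, "run N")
source = stamp(data_a + data_b, "run N+1")

# Comparing on every column, including the novel timestamp, flags all of A.
changed = anti_join(source, target, by=list(source[0]))
print([row["record_id"] for row in changed])  # [1, 2, 3]
```

Only record 3 is new, yet records 1 and 2 come back as well, so they would be rewritten with the run N+1 timestamp.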

I think the fix is to allow the caller to specify a vector of column names that should be ignored in the `anti_join` that removes already-known data. This new parameter might be named `columns_to_ignore`, `ignore_in_update`, `novel_columns`, or some better name than these examples I am throwing at the wall.
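A sketch of the proposed fix, again in plain Python rather than the package's R: drop the ignored columns from the comparison key before the anti-join, but keep them in the returned rows so records that really changed still carry fresh timestamps. `changed_rows` is a hypothetical helper and `columns_to_ignore` is just one of the candidate names above; the real `sync_table_2` signature may differ. In dplyr terms this would roughly correspond to passing `by = setdiff(names(source), columns_to_ignore)` to `anti_join`.

```python
def anti_join(left, right, by):
    """Rows of `left` with no row in `right` that matches on every column in `by`."""
    right_keys = {tuple(row[col] for col in by) for row in right}
    return [row for row in left if tuple(row[col] for col in by) not in right_keys]


def changed_rows(source, target, columns_to_ignore=()):
    """Hypothetical helper: rows of `source` that are new or differ from `target`
    on any column other than those listed in `columns_to_ignore`."""
    if not source:
        return []
    ignored = set(columns_to_ignore)
    # Compare on every column except the ignored (e.g. timestamp) ones.
    by = [col for col in source[0] if col not in ignored]
    # Returned rows keep the ignored columns, so real changes still get new stamps.
    return anti_join(source, target, by)


# Run N wrote records 1 and 2; run N+1 sees record 2 changed and record 3 new.
target = [
    {"record_id": 1, "value": "x", "updated_at": "run N"},
    {"record_id": 2, "value": "y", "updated_at": "run N"},
]
source = [
    {"record_id": 1, "value": "x", "updated_at": "run N+1"},
    {"record_id": 2, "value": "y2", "updated_at": "run N+1"},
    {"record_id": 3, "value": "z", "updated_at": "run N+1"},
]

print([r["record_id"] for r in changed_rows(source, target)])  # [1, 2, 3]
print([r["record_id"] for r in changed_rows(source, target, ["updated_at"])])  # [2, 3]
```

With the timestamp ignored, the unchanged record 1 is no longer rewritten, while the edited record 2 and the new record 3 still pass through.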