Closed pbchase closed 6 years ago
The logging, if we’re talking about the same one that’s in REDCap itself, should contain which user exported the data, the time the export was completed, and all the variable names that were exported.
@tlstoffs, would it be enough to say "all variables were exported" or do we need to enumerate the variable names? Of course we'd more likely say "All records and fields were exported".
Well, where it gets tricky is when someone exports de-identified data: the log needs to reflect that somehow, either by saying the ‘data was de-identified’ or by listing all fields exported. However, you wouldn’t be able to prove the data export contained no PHI unless all the variables were listed, because someone may not have flagged every PHI field as PHI in the project.
Currently all variable names that were exported get listed on the logging page:
report_id: ALL, export_format: CSV, rawOrLabel: raw, fields: "record_id, phlebotomist_name, blood_time, sst1_filled, sst1_volume, sst1_quest_call_time, sst2_filled, sst2_volume, edta1_filled, edta1_volume_2, edta1_quest_call_time, edta2_filled, edta2_volume, processor_name, process_time, freezer_box, vial_1_serum, vial_2_serum, vial_3_serum, vial_4_serum, vial_5_serum, vial_6_serum, vial_7_serum, vial_8_serum, vial_1_edta, vial_2_edta, vial_3_edta, vial_4_edta, vial_5_edta, vial_6_edta, vial_7_edta, vial_8_edta, blood_notes, blood_sample_processing_complete, meter_instructions, walk_mins, walk_secs, walk_meters, walk_cane, meter_walk_complete, vitals_avg_systol, vitals_avg_diastol, vitals_weight, vitals_complete, waist_cm1, waist_cm2, waist_q3, waist_cm3, height, waist_q5, waist_circumference_height_complete, cowa_date, cowa_total, cowa_correction, cowa_complete, grip_date, grip_q1, grip_q1a, grip_q2, grip_q2a, grip_q3, grip_q4, grip_rh_excluded, grip_rh_trial1, grip_rh_trial1_refused, grip_rh_trial2, grip_rh_trial2_refused, grip_rh_trial3, grip_rh_trial3_refused, grip_lh_excluded, grip_lh_trial1, grip_lh_trial1_refused, grip_lh_trial2, grip_lh_trial2_refused, grip_lh_trial3, grip_lh_trial3_refused, grip_strength_complete, knee_q1, knee_q2, knee_q3, knee_q4, knee_q5, knee_q5a, knee_q6, knee_q6a, knee_q7, knee_q8a, knee_q8b, knee_q8c, knee_q8d, knee_q8e, knee_q8f, knee_range, knee_trial1_torque_awy, knee_trial1_torque_twd, knee_trial1_power_awy, knee_trial1_power_twd, knee_trial1_cv_awy, knee_trial1_cv_twd, knee_trial2_torque_awy, knee_trial2_torque_twd, knee_trial2_power_awy, knee_trial2_power_twd, knee_trial2_cv_awy, knee_trial2_cv_twd, knee_trial3_torque_awy, knee_trial3_torque_twd, knee_trial3_power_awy, knee_trial3_power_twd, knee_trial3_cv_awy, knee_trial3_cv_twd, knee_q9, knee_q10, knee_q11, knee_extensionflexion_complete, ma_date, ma_q1, ma_q2, ma_q3, ma_q4, ma_provided, ma_returned, ma_expected, ma_taken, ma_adherence, medication_adherence_complete, 
ae_type, ae_other, adverse_event_report_complete, sae_type, serious_adverse_event_report_complete, rand36_date, rand36_1, rand36_2, rand36_3, rand36_4, rand36_5, rand36_6, rand36_7, rand36_8, rand36_9, rand36_10, rand36_11, rand36_12, rand36_13, rand36_14, rand36_15, rand36_16, rand36_17, rand36_18, rand36_19, rand36_20, rand36_21, rand36_22, rand36_23, rand36_24, rand36_25, rand36_26, rand36_27, rand36_28, rand36_29, rand36_30, rand36_31, rand36_32, rand36_33, rand36_34, rand36_35, rand36_36, sf36_complete, sppb_balance, sppb_gait, sppb_chair, sppb_total, sppb_complete"
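To illustrate why the enumerated field list matters for auditing, here is a minimal Python sketch that parses a logging entry shaped like the one above and reports which exported fields appear in a list of known identifiers. The function names `parse_export_log` and `exported_identifiers` and the identifier list are hypothetical, not part of REDCap or the module; the only assumption about the format is the `key: value, ..., fields: "..."` layout shown in the sample.

```python
import re


def parse_export_log(detail: str) -> dict:
    """Split a logging detail string into its key/value parts.

    Assumes the layout shown in the sample entry:
    key: value, key: value, fields: "a, b, c"
    """
    entry = {}
    fields = []
    # The quoted field list contains commas, so extract it first.
    match = re.search(r'fields:\s*"([^"]*)"', detail)
    if match:
        fields = [n.strip() for n in match.group(1).split(",") if n.strip()]
        detail = detail[:match.start()]
    # What remains is a simple comma-separated list of key: value pairs.
    for part in detail.split(","):
        if ":" in part:
            key, value = part.split(":", 1)
            entry[key.strip()] = value.strip()
    entry["fields"] = fields
    return entry


def exported_identifiers(entry: dict, identifier_fields: set) -> list:
    """Return the exported fields that are on the (hypothetical) identifier list."""
    return sorted(set(entry["fields"]) & set(identifier_fields))


# Shortened version of the sample entry, for illustration only.
sample = (
    'report_id: ALL, export_format: CSV, rawOrLabel: raw, '
    'fields: "record_id, phlebotomist_name, blood_time"'
)
entry = parse_export_log(sample)
flagged = exported_identifiers(entry, {"phlebotomist_name", "date_of_birth"})
```

A check like this only works when every exported variable is listed. A log line that says "all variables were exported" would have to be matched against the project's data dictionary as it stood at export time.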
@tlstoffs, you make an excellent point about the identifiable data. Were you aware that the export large data module provides no accommodation for subsetting or de-identifying the data? It exports everything in one large CSV. Is this too much of a limitation for the module to be useful?
That aside, I think the correct approach is to enumerate all the fields as you show in your post. The detail is valuable, and it prepares us to provide a more precise audit trail if we do implement any data subsetting.
I discussed column subsetting and data de-identification. We agreed that neither is critical at this time. I will open a separate issue.
@pbchase @tlstoffs this PR fixes this issue.
Closed by PR #9
Reduce the number of messages written to the REDCap log. At the same time, add detail to the messages that remain; for example, an elapsed time would be useful. See https://github.com/pbchase/export_large_projects/commit/b3c6035ce7fdd52a6379563cd48689b0eb5e06df
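As a rough sketch of the idea, the snippet below builds one summary message per export instead of several, and includes the elapsed time. It is written in Python purely for illustration and is not the module's implementation; `build_export_log`, its parameters, and the `elapsed_seconds` key are hypothetical names, and the message layout simply mimics the existing REDCap export entries.

```python
import time


def build_export_log(user: str, fields: list, started: float, finished: float) -> str:
    """Compose a single log message summarising one export.

    `started` and `finished` are timestamps in seconds, for example
    values returned by time.monotonic().
    """
    elapsed = finished - started
    field_list = ", ".join(fields)
    return (
        f"user: {user}, export_format: CSV, "
        f"elapsed_seconds: {elapsed:.1f}, "
        f'fields: "{field_list}"'
    )


# Usage: time the export, then write exactly one message when it finishes.
started = time.monotonic()
# ... the export itself would run here ...
finished = time.monotonic()
message = build_export_log("example_user", ["record_id", "height"], started, finished)
```

Writing the message once, after the export completes, keeps the log short while still recording who exported, what was exported, and how long it took.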
@tlstoffs, what did you want reflected in the REDCap log? You had an opinion on what needed to be in the log for audit purposes.