ctsit / export_large_projects

A REDCap module for exporting very large projects

Adjust log output written into the REDCap log #7

Closed: pbchase closed this issue 6 years ago

pbchase commented 6 years ago

Reduce the number of messages written into the REDCap log. At the same time, add more detail to the messages that remain; for example, an elapsed time would be useful. See https://github.com/pbchase/export_large_projects/commit/b3c6035ce7fdd52a6379563cd48689b0eb5e06df

@tlstoffs, what did you want reflected in the REDCap log? You had an opinion on what needed to be in the log for audit purposes.

tlstoffs commented 6 years ago

The logging, if we’re talking about the same one that’s in REDCap itself, should contain which user exported the data, the time the export was completed, and the names of all variables that were exported.

pbchase commented 6 years ago

@tlstoffs, would it be enough to say "all variables were exported", or do we need to enumerate the variable names? Of course, we would more likely say "All records and fields were exported".

tlstoffs commented 6 years ago

Well, where it gets tricky is when someone exports de-identified data: the log needs to reflect that somehow, either by saying the ‘data was de-identified’ or by listing all fields exported. However, you wouldn’t be able to prove the export contained no PHI unless all the variables were listed, because someone may not have flagged every PHI field as PHI in the project.

Currently, the names of all exported variables are listed on the logging page:

report_id: ALL, export_format: CSV, rawOrLabel: raw, fields: "record_id, phlebotomist_name, blood_time, sst1_filled, sst1_volume, sst1_quest_call_time, sst2_filled, sst2_volume, edta1_filled, edta1_volume_2, edta1_quest_call_time, edta2_filled, edta2_volume, processor_name, process_time, freezer_box, vial_1_serum, vial_2_serum, vial_3_serum, vial_4_serum, vial_5_serum, vial_6_serum, vial_7_serum, vial_8_serum, vial_1_edta, vial_2_edta, vial_3_edta, vial_4_edta, vial_5_edta, vial_6_edta, vial_7_edta, vial_8_edta, blood_notes, blood_sample_processing_complete, meter_instructions, walk_mins, walk_secs, walk_meters, walk_cane, meter_walk_complete, vitals_avg_systol, vitals_avg_diastol, vitals_weight, vitals_complete, waist_cm1, waist_cm2, waist_q3, waist_cm3, height, waist_q5, waist_circumference_height_complete, cowa_date, cowa_total, cowa_correction, cowa_complete, grip_date, grip_q1, grip_q1a, grip_q2, grip_q2a, grip_q3, grip_q4, grip_rh_excluded, grip_rh_trial1, grip_rh_trial1_refused, grip_rh_trial2, grip_rh_trial2_refused, grip_rh_trial3, grip_rh_trial3_refused, grip_lh_excluded, grip_lh_trial1, grip_lh_trial1_refused, grip_lh_trial2, grip_lh_trial2_refused, grip_lh_trial3, grip_lh_trial3_refused, grip_strength_complete, knee_q1, knee_q2, knee_q3, knee_q4, knee_q5, knee_q5a, knee_q6, knee_q6a, knee_q7, knee_q8a, knee_q8b, knee_q8c, knee_q8d, knee_q8e, knee_q8f, knee_range, knee_trial1_torque_awy, knee_trial1_torque_twd, knee_trial1_power_awy, knee_trial1_power_twd, knee_trial1_cv_awy, knee_trial1_cv_twd, knee_trial2_torque_awy, knee_trial2_torque_twd, knee_trial2_power_awy, knee_trial2_power_twd, knee_trial2_cv_awy, knee_trial2_cv_twd, knee_trial3_torque_awy, knee_trial3_torque_twd, knee_trial3_power_awy, knee_trial3_power_twd, knee_trial3_cv_awy, knee_trial3_cv_twd, knee_q9, knee_q10, knee_q11, knee_extensionflexion_complete, ma_date, ma_q1, ma_q2, ma_q3, ma_q4, ma_provided, ma_returned, ma_expected, ma_taken, ma_adherence, medication_adherence_complete, ae_type, ae_other, adverse_event_report_complete, sae_type, serious_adverse_event_report_complete, rand36_date, rand36_1, rand36_2, rand36_3, rand36_4, rand36_5, rand36_6, rand36_7, rand36_8, rand36_9, rand36_10, rand36_11, rand36_12, rand36_13, rand36_14, rand36_15, rand36_16, rand36_17, rand36_18, rand36_19, rand36_20, rand36_21, rand36_22, rand36_23, rand36_24, rand36_25, rand36_26, rand36_27, rand36_28, rand36_29, rand36_30, rand36_31, rand36_32, rand36_33, rand36_34, rand36_35, rand36_36, sf36_complete, sppb_balance, sppb_gait, sppb_chair, sppb_total, sppb_complete"

pbchase commented 6 years ago

@tlstoffs, you make an excellent point about the identifiable data. Were you aware that the export large data module provides no accommodation for subsetting or de-identifying the data? It exports everything in one large CSV. Is this too much of a limitation for the module to be useful?

That aside, I think the correct approach is to enumerate all the fields as you show in your post. The detail is valuable, and it prepares us to provide a more precise audit trail if we do implement any data subsetting.
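As an illustration only, here is a minimal Python sketch of one consolidated log entry per export that records the user, the completion time, the elapsed time, and every exported field. It does not show how the module actually writes to the REDCap log. The helper `build_export_log_entry`, the `user`/`completed`/`elapsed_seconds` keys, and the username `jdoe` are hypothetical. The remaining keys copy the style of the logging-page entry quoted above.

```python
import time
from datetime import datetime


def build_export_log_entry(username, fields, completed_at, elapsed_seconds,
                           export_format="CSV", raw_or_label="raw"):
    """Return one log message describing a completed export.

    Hypothetical helper: the user/completed/elapsed_seconds keys are
    assumptions; the rest mimics the existing REDCap logging-page entry.
    """
    # Enumerate every exported field, so the audit trail shows exactly
    # which variables left the project, including unflagged PHI fields.
    field_list = ", ".join(fields)
    return (
        f"user: {username}, "
        f"completed: {completed_at.isoformat(sep=' ', timespec='seconds')}, "
        f"elapsed_seconds: {elapsed_seconds:.1f}, "
        f"report_id: ALL, export_format: {export_format}, "
        f"rawOrLabel: {raw_or_label}, fields: \"{field_list}\""
    )


if __name__ == "__main__":
    # Use a monotonic clock for elapsed time so wall-clock changes
    # during a long export do not distort the measurement.
    start = time.monotonic()
    exported = ["record_id", "phlebotomist_name", "blood_time"]
    # ... the actual CSV export would run here ...
    elapsed = time.monotonic() - start
    print(build_export_log_entry("jdoe", exported, datetime.now(), elapsed))
```

The sketch lists every field name instead of writing "All records and fields were exported". The field names are what let a reviewer confirm that an export contained no PHI, even when PHI fields were never flagged in the project.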

pbchase commented 6 years ago

I discussed column subsetting and de-identification, and we agreed that they are not critical at this time. I will open a separate issue for them.

tbembersimeao commented 6 years ago

@pbchase @tlstoffs this PR fixes this issue.

pbchase commented 6 years ago

Closed by PR #9