getodk / briefcase

ODK Briefcase is a Java application for fetching and pushing forms and their contents. It helps make billions of data points from ODK portable. Contribute and make the world a better place! ✨💼✨
https://docs.getodk.org/briefcase-intro

[Export] Aggregate audit information into one output CSV file #637

Closed ggalmazor closed 5 years ago

ggalmazor commented 6 years ago

Problem description

As explained in #621, when exporting a form that attaches audit information, the audit CSV output files get saved in the media output folder, like any other binary attachments. Since all submissions use the fixed audit.csv filename:

Example Audit data

| Event | Node | Start | End |
| --- | --- | --- | --- |
| form start | | Tue Sep 11 13:06:26 GMT+200 2018 | N/A |
| question | /data/some_field | Tue Sep 11 13:06:26 GMT+200 2018 | Tue Sep 11 13:06:28 GMT+200 2018 |
| jump | | Tue Sep 11 13:06:28 GMT+200 2018 | Tue Sep 11 13:06:30 GMT+200 2018 |
| end screen | | Tue Sep 11 13:06:30 GMT+200 2018 | Tue Sep 11 13:06:31 GMT+200 2018 |
| form save | | Tue Sep 11 13:06:31 GMT+200 2018 | N/A |
| form exit | | Tue Sep 11 13:06:31 GMT+200 2018 | N/A |
| form finalize | | Tue Sep 11 13:06:31 GMT+200 2018 | N/A |

Steps to reproduce the problem

Expected behavior

Instead of having an audit CSV output file per exported submission saved in the media output directory, we want all audit data of the same form to be merged into a single audit CSV output file.

To do so, Briefcase should add a new column to the audit information holding the instance ID (submission UID) of the submission each row belongs to. The resulting audit data would look like this:

| Instance ID | Event | Node | Start | End |
| --- | --- | --- | --- | --- |
| uuid:53a83013-99ff-434d-a096-68fe1464c249 | form start | | Tue Sep 11 13:06:26 GMT+200 2018 | N/A |
| uuid:53a83013-99ff-434d-a096-68fe1464c249 | question | /data/some_field | Tue Sep 11 13:06:26 GMT+200 2018 | Tue Sep 11 13:06:28 GMT+200 2018 |
| uuid:53a83013-99ff-434d-a096-68fe1464c249 | jump | | Tue Sep 11 13:06:28 GMT+200 2018 | Tue Sep 11 13:06:30 GMT+200 2018 |
| uuid:53a83013-99ff-434d-a096-68fe1464c249 | end screen | | Tue Sep 11 13:06:30 GMT+200 2018 | Tue Sep 11 13:06:31 GMT+200 2018 |
| uuid:53a83013-99ff-434d-a096-68fe1464c249 | form save | | Tue Sep 11 13:06:31 GMT+200 2018 | N/A |
| uuid:53a83013-99ff-434d-a096-68fe1464c249 | form exit | | Tue Sep 11 13:06:31 GMT+200 2018 | N/A |
| uuid:53a83013-99ff-434d-a096-68fe1464c249 | form finalize | | Tue Sep 11 13:06:31 GMT+200 2018 | N/A |

This way, all audit data of submissions from the same form could be cleanly merged to produce a single output file.

The proposed filename would follow this template: {FORM NAME} - audit.csv.
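
To make the proposal concrete, here is a minimal sketch of the merge step. It is not Briefcase code: the `AuditMerger` class and its method names are hypothetical, and it assumes each per-submission audit file is a comma-separated file whose first line is a header, that no field contains an embedded line break, and that the instance ID needs no CSV escaping.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Briefcase.
final class AuditMerger {

  // Prefixes every data row of one submission's audit file with its instance ID.
  // Assumption: the first line is the header; it is emitted (with the extra
  // column name) only when includeHeader is true, and dropped otherwise.
  public static List<String> withInstanceId(String instanceId, List<String> auditLines, boolean includeHeader) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < auditLines.size(); i++) {
      String line = auditLines.get(i);
      if (line.isEmpty())
        continue; // skip blank lines such as a trailing newline
      if (i == 0) {
        if (includeHeader)
          out.add("instance ID," + line);
      } else {
        out.add(instanceId + "," + line);
      }
    }
    return out;
  }

  // Appends one submission's audit rows to "{FORM NAME} - audit.csv" in outputDir.
  // The header is written only when the merged file does not exist yet.
  // Assumption: formName is already safe to use as part of a file name.
  public static Path appendTo(Path outputDir, String formName, String instanceId, Path auditFile) throws IOException {
    Path target = outputDir.resolve(formName + " - audit.csv");
    boolean isNew = Files.notExists(target);
    List<String> rows = withInstanceId(instanceId, Files.readAllLines(auditFile, StandardCharsets.UTF_8), isNew);
    Files.write(target, rows, StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    return target;
  }
}
```

Calling `appendTo` once per exported submission would build up a single file per form. A real implementation would probably want a proper CSV parser instead of line-based prefixing, since quoted fields may contain commas or line breaks.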

Other information

Audit information is mapped with a binary binding. Briefcase exports fields with this binding by writing the corresponding file name to the output CSV and, as a side effect, copying the media file from the storage directory to the media output directory.

~A strategy to implement the proposed behavior would be to change Model.getDataType() to consider not only the field's bound data type, but also the field's name and ancestry (meta.audit) to return a new DataType.AUDIT that could be mapped into a new CsvFieldMapper that would write the same values to the output CSV files and append the audit file's contents to the new proposed output file.~ This strategy is not a good idea because it would require adding a new DataType to JavaRosa (JR). The idea of a new mapper is still valid, though.

This mapper could take care of:

ggalmazor commented 6 years ago

@lognaturel, I've ruled out the original strategy I had in mind because it would involve changes in JR (adding a new DataType value for audit fields). Am I right in thinking this isn't worth changing JR for? Could this new AUDIT data type be used for anything else?