Thanks, @peterjc! I completely agree with this proposal. For additional context, the exact lines impacted are here and here.
These should be pretty minor changes to make. I'll add them in the next release, and I think it is worth cutting a minor release relatively quickly to support this.
Quoting table.py, both methods to_json and to_hdf5 use the following:
Using a live date means an otherwise reproducible analysis will fail a simple diff because of the time stamp.
Quoting https://biom-format.org/documentation/format_versions/biom-1.0.html
date : <datetime> Date the table was built (ISO 8601 format)
Quoting https://biom-format.org/documentation/format_versions/biom-2.0.html and https://biom-format.org/documentation/format_versions/biom-2.1.html
creation-date : <datetime> Date the table was built (ISO 8601 format)
In both cases this is clearly a required field, so I think the best solution is to allow the date to be passed as an optional argument, defaulting to the current behaviour of using the time "now". The user could then explicitly supply (for example) the last modified date of their input data and metadata. This would also make it possible to use diff for continuous integration testing.
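A minimal sketch of the proposed behaviour, assuming a hypothetical helper table_to_json (not biom's actual API) that only writes a few header fields: the date becomes an optional argument that falls back to now, and a second hypothetical helper, latest_mtime, derives a stable date from the input files' last modified times.

```python
import json
import os
from datetime import datetime
from typing import Iterable, Optional


def latest_mtime(paths: Iterable[str]) -> datetime:
    """Hypothetical helper: newest modification time among the input files."""
    return datetime.fromtimestamp(max(os.path.getmtime(p) for p in paths))


def table_to_json(table_id: str, generated_by: str,
                  creation_date: Optional[datetime] = None) -> str:
    """Hypothetical, heavily simplified BIOM 1.0-style header writer.

    If creation_date is None, keep the current behaviour of using now;
    otherwise use the caller's date so repeated runs give identical output.
    """
    if creation_date is None:
        creation_date = datetime.now()
    header = {
        "id": table_id,
        "generated_by": generated_by,
        # The BIOM specs ask for an ISO 8601 datetime here.
        "date": creation_date.isoformat(),
    }
    return json.dumps(header, sort_keys=True)
```

With a fixed date, for example table_to_json("example", "my-pipeline", latest_mtime(["otu_table.tsv", "metadata.tsv"])) (file names hypothetical), two runs on unchanged inputs should produce byte-identical output, which is what a diff-based CI check needs.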
In comparison, although the BAM format for sequencing data uses the GZIP header, most implementations deliberately do not fill in the MTIME field, ensuring full reproducibility.
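For comparison, Python's standard gzip module exposes the same choice for the GZIP header's MTIME field. This sketch uses plain gzip, not the BGZF variant that BAM uses. Passing mtime=0 (supported by gzip.compress since Python 3.8) leaves the field zeroed, so compressing the same bytes twice gives identical output.

```python
import gzip
import struct


def reproducible_gzip(data: bytes) -> bytes:
    """Compress with the header MTIME field set to zero."""
    return gzip.compress(data, mtime=0)


def header_mtime(blob: bytes) -> int:
    """Read MTIME: bytes 4-7 of the GZIP header, little-endian unsigned."""
    return struct.unpack("<I", blob[4:8])[0]
```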