chanzuckerberg / czid-workflows

Portable WDL workflows for CZ ID production pipelines
https://czid.org/
MIT License

Sort bowtie2 outputs for reproducibility #365

Status: Closed (rzlim08 closed this 5 months ago)

rzlim08 commented 5 months ago

The outputs were tested as follows:

  1. Run the pipeline twice with the same input
  2. Run python3 scripts/compare-outputs.py <output.json> <output2.json> to compare the MD5 hashes of the outputs
  3. Manually diff the differing files to determine the cause of each discrepancy
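The comparison in step 2 can be sketched roughly as follows. This is not the contents of scripts/compare-outputs.py, which isn't shown here. It is a minimal sketch assuming each outputs JSON maps output names (such as czid_short_read_mngs.host_filter.fastp_html) to local file paths, optionally nested under an "outputs" key. The helper names md5sum, load_outputs and differing_outputs are hypothetical.

```python
import hashlib
import json
import sys
from pathlib import Path


def md5sum(path: str) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def load_outputs(json_path: str) -> dict:
    """Load an outputs JSON (assumed layout: output name -> file path)."""
    with open(json_path) as fh:
        data = json.load(fh)
    # Assumption: some WDL runners nest the results under an "outputs" key.
    return data.get("outputs", data)


def differing_outputs(json_a: str, json_b: str) -> list:
    """Return the names of outputs whose files have different MD5 hashes."""
    a, b = load_outputs(json_a), load_outputs(json_b)
    diffs = []
    for name in sorted(set(a) & set(b)):
        pa, pb = a[name], b[name]
        # Only compare single-file outputs; skip arrays, numbers and nulls.
        if not (isinstance(pa, str) and isinstance(pb, str)):
            continue
        if not (Path(pa).is_file() and Path(pb).is_file()):
            continue
        if md5sum(pa) != md5sum(pb):
            diffs.append(name)
    return diffs


if __name__ == "__main__":
    for name in differing_outputs(sys.argv[1], sys.argv[2]):
        print(name)
```

Invoked with the two outputs JSONs as arguments, it prints one output name per line for each file whose hash differs between runs; those are the files worth diffing by hand in step 3.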

The only files that still differ on every run are:

  - czid_short_read_mngs.host_filter.fastp_html
  - czid_short_read_mngs.postprocess.assembly_out_assembly_spades_output_log

Both contain timestamps, which account for the discrepancy.
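One way to check that timestamps are the only difference, instead of diffing by hand, is to mask them before hashing. A minimal sketch follows. The TIMESTAMP_RE pattern and the helper names are hypothetical, and the actual timestamp formats written to the fastp HTML report and the SPAdes log may differ from the pattern used here.

```python
import hashlib
import re

# Hypothetical pattern: ISO-8601-like date-times such as "2023-05-01 12:34:56"
# or "2023-05-01T12:34:56". Adjust to the formats the real tools emit.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")


def masked_md5(text: str) -> str:
    """MD5 of the text after replacing every timestamp with a fixed placeholder."""
    masked = TIMESTAMP_RE.sub("<TIMESTAMP>", text)
    return hashlib.md5(masked.encode("utf-8")).hexdigest()


def differs_only_by_timestamps(text_a: str, text_b: str) -> bool:
    """True if the texts differ but become identical once timestamps are masked."""
    return text_a != text_b and masked_md5(text_a) == masked_md5(text_b)
```

If this returns True for both remaining files, the timestamps fully explain the hash mismatch; a False result for texts that differ means some other content changed between runs.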

katrinakalantar commented 5 months ago

This looks good to me too! Thanks for making these changes!