This PR attempts to make almost all outputs of the short-read-mngs pipeline reproducible when run on the same input.
The nondeterminism came from output ordering, so the changes here add sorting:
- Sort the bowtie2_ercc.sam file
- Sort the bowtie2 output from the reads-to-contig mapping
- Sort the list of NT/NR sources in the output JSON
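To illustrate the kind of sorting involved, here is a minimal sketch. It is not the code in this PR; the function names sort_sam_lines and dump_sources are hypothetical. It assumes SAM header lines (starting with "@") should stay first and in their original order. The alignment records, whose order can vary between runs, are sorted as whole lines. A list of source names is serialized from a sorted copy so the JSON text is identical across runs.

```python
import json
from typing import Iterable, List


def sort_sam_lines(lines: Iterable[str]) -> List[str]:
    """Hypothetical helper: keep SAM header lines first, sort alignment records.

    Header lines (starting with '@') keep their original order because that
    order is meaningful (e.g. @HD comes first). Alignment records are sorted
    as whole lines, so the result no longer depends on the order in which the
    aligner happened to emit them.
    """
    header: List[str] = []
    records: List[str] = []
    for line in lines:
        (header if line.startswith("@") else records).append(line)
    return header + sorted(records)


def dump_sources(sources: Iterable[str]) -> str:
    """Hypothetical helper: serialize source names (e.g. "NT", "NR") in a fixed order."""
    return json.dumps(sorted(sources))


if __name__ == "__main__":
    sam = ["@HD\tVN:1.6", "read2\t0\tERCC-00002\t1", "read1\t0\tERCC-00002\t5"]
    print("\n".join(sort_sam_lines(sam)))
    print(dump_sources({"NT", "NR"}))
```

Whole-line sorting is simple and deterministic, but it does not produce a coordinate-sorted SAM. If downstream steps need coordinate order, a tool such as samtools sort would be the more appropriate choice.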
The outputs were tested as follows:
1. Run the pipeline twice on the same input.
2. Run python3 scripts/compare-outputs.py <output.json> <output2.json> to compare the MD5 hashes of the two runs' outputs.
3. Manually diff the mismatched files to determine the cause of the discrepancies.
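The comparison in step 2 can be sketched as follows. This is an illustration, not the actual contents of scripts/compare-outputs.py. It assumes each outputs JSON maps output names (such as czid_short_read_mngs.host_filter.fastp_html) to a file path or a list of file paths. Values that are not existing files are ignored, and the helper names md5_of, output_hashes and differing_outputs are hypothetical.

```python
import hashlib
import json
import os
import sys
from typing import Dict, List


def md5_of(path: str) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def output_hashes(outputs_json: str) -> Dict[str, List[str]]:
    """Map each output name to the MD5 digests of the files it points to.

    Assumes values are a path or a list of paths; anything that is not an
    existing file (numbers, plain strings, nested objects) contributes no hash.
    """
    with open(outputs_json) as f:
        outputs = json.load(f)
    hashes: Dict[str, List[str]] = {}
    for name, value in outputs.items():
        candidates = value if isinstance(value, list) else [value]
        files = [p for p in candidates if isinstance(p, str) and os.path.isfile(p)]
        hashes[name] = [md5_of(p) for p in files]
    return hashes


def differing_outputs(json_a: str, json_b: str) -> List[str]:
    """Return the sorted names of outputs whose file hashes differ between runs."""
    a, b = output_hashes(json_a), output_hashes(json_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


if __name__ == "__main__" and len(sys.argv) == 3:
    for name in differing_outputs(sys.argv[1], sys.argv[2]):
        print(name)
```

The names this prints are the candidates for the manual diff in step 3.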
The only files that still differ on every run are:
- czid_short_read_mngs.host_filter.fastp_html
- czid_short_read_mngs.postprocess.assembly_out_assembly_spades_output_log
Both contain timestamps, which account for the discrepancy.