azwaans / timtam-diamond-princess

MIT License

Changing the sequence to a confirmed case has invalidated FASTA #1

Closed aezarebski closed 1 year ago

aezarebski commented 1 year ago

Description

I tried to replicate the analysis, but the diamond.fasta file is no longer consistent with the XML: some of the sequences have new identifiers. The identifiers changed because each one contains the time stamp of its sequence, and removing one of the sequences changed how the remaining sequences are distributed across the day. It's good to see that the change has propagated through the files, but it means the XML needs to be tweaked before it will run.

Files involved

To reproduce

  1. Copy diamond.fasta to data/ and run sha256sum diamond.fasta to check that its hash is the expected one
  2. Run ant setuplib
  3. Run ant mcmc to get the error message
  4. Compare the output of cat data/diamond.fasta | grep ">" with the output of cat data/readme.org | grep "^DP" to see that the labels differ
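
Steps 1 and 4 can also be scripted. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes that a label is the first whitespace-delimited token of a FASTA header line (after the `>`) and of a readme.org line starting with `DP`. Unlike `grep ">"`, it only looks at lines that start with `>`.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex digest that `sha256sum` prints in its first column."""
    return hashlib.sha256(data).hexdigest()


def first_tokens(text: str, prefix: str, skip: int = 0) -> set:
    """Collect the first whitespace-delimited token of each line starting with
    `prefix`, after dropping `skip` leading characters (1 to drop FASTA's '>')."""
    tokens = set()
    for line in text.splitlines():
        if line.startswith(prefix):
            parts = line[skip:].split()
            if parts:  # ignore lines that hold nothing but the prefix
                tokens.add(parts[0])
    return tokens


def label_mismatch(fasta_text: str, readme_text: str):
    """Return (labels only in the FASTA, labels only in the readme), sorted.

    Assumption: labels are first tokens, as described in the lead-in.
    """
    in_fasta = first_tokens(fasta_text, ">", skip=1)
    in_readme = first_tokens(readme_text, "DP")
    return sorted(in_fasta - in_readme), sorted(in_readme - in_fasta)
```

Reading data/diamond.fasta and data/readme.org as text and passing both to `label_mismatch` should return two empty lists once the files agree; `sha256_hex` applied to the bytes of diamond.fasta gives the value to compare against the expected hash.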

Tasks

Here are some loose suggestions; there are probably cleaner solutions.

  1. Remove the time stamps from the identifiers in diamond.fasta and store this mapping somewhere else, so that the labels do not include analysis results.
  2. Update readme.org.
  3. Update the XML to use the new identifiers (this should just be a find-and-replace on the labels).
  4. Double-check that the analysis works again.
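
The find-and-replace in task 3 could be done in a single pass, so that a new label is never itself rewritten by a later substitution. This is only a sketch under assumptions: `relabel` is a hypothetical helper, the old-to-new mapping has to be supplied by hand, and it treats the XML as plain text, so it would also rewrite any other place where an old label happens to appear.

```python
import re


def relabel(xml_text: str, mapping: dict) -> str:
    """Replace every old label in xml_text with its new label in one pass."""
    if not mapping:
        return xml_text
    # Try longer labels first so that a label which is a prefix of another
    # label cannot match in its place.
    olds = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(old) for old in olds))
    # Each match is looked up once; replaced text is never scanned again.
    return pattern.sub(lambda m: mapping[m.group(0)], xml_text)
```

Writing the result to a new file and diffing it against the original XML is a cheap way to confirm that only labels changed.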

aezarebski commented 1 year ago

I'm closing this because I think it is out of scope, provided we have verified that the whole pipeline works and that we now have the correct hash for diamond.fasta.