Arcadia-Science / 2023-amblyomma-americanum-txome-assembly

MIT License

Screen for contamination in final transcriptome #9

Closed: taylorreiter closed 11 months ago

taylorreiter commented 12 months ago

~#5 should be merged first before this is reviewed.~ done.

Adds rules to screen for contamination:

  1. Runs sourmash gather to see what is in the txome that shouldn't be there. Uses a large k-mer size (k=51) so that we can be very confident in the matches.
  2. Downloads the matching genomes.
  3. BLASTs the transcripts against those genomes.
  4. Filters the contaminants out. Also applies a specific filter for the endosymbiont, which I will have to figure out how to generalize later.
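
For reviewers, step 4 can be pictured roughly as follows. This is only a minimal sketch, not the code in this PR: it assumes the BLAST results are in the default tabular layout (outfmt 6: qseqid, sseqid, pident, length, ...), and the function names and the thresholds (`min_identity`, `min_aln_len`) are hypothetical placeholders chosen for illustration.

```python
from typing import Dict, Iterable, Set


def contaminant_ids(blast_lines: Iterable[str],
                    min_identity: float = 95.0,   # hypothetical threshold
                    min_aln_len: int = 100) -> Set[str]:  # hypothetical threshold
    """Return transcript IDs with at least one BLAST hit passing both thresholds.

    Assumes default BLAST tabular columns (outfmt 6):
    qseqid, sseqid, pident, length, mismatch, gapopen, ...
    """
    flagged: Set[str] = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query_id = fields[0]
        pident = float(fields[2])
        aln_len = int(fields[3])
        if pident >= min_identity and aln_len >= min_aln_len:
            flagged.add(query_id)
    return flagged


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {sequence ID: sequence}, ID = first word of header."""
    chunks: Dict[str, list] = {}
    name = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None and line:
            chunks[name].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


def remove_contaminants(records: Dict[str, str],
                        flagged: Set[str]) -> Dict[str, str]:
    """Keep only the transcripts that were not flagged as contaminants."""
    return {seq_id: seq for seq_id, seq in records.items()
            if seq_id not in flagged}
```

In use, the BLAST table would be passed to `contaminant_ids`, the assembled transcripts to `parse_fasta`, and the two results to `remove_contaminants`. The endosymbiont-specific filter mentioned in step 4 is not covered by this sketch.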

taylorreiter commented 11 months ago

awesome, i'll keep this in mind for long-run dev! thank you!