ababaian / serratus

Ultra-deep search for novel viruses
http://serratus.io
GNU General Public License v3.0

all RdRP+ macro/micro contigs #252

Open · rchikhi opened this issue 3 years ago

rchikhi commented 3 years ago

This issue describes the procedure used to search all of our contigs against RdRP and presents the results (Slack thread: https://hackseq-rna.slack.com/archives/C012H9SDQCA/p1615948152031200).

Input: FASTA files of contigs from two assembly sets. Micro: all SRA .pro DIAMOND hits, assembled with rnaviralspades. Macro: everything in s3://lovelywater/assembly/contigs/, i.e. all CoV + dicistro + quenya + satellite + a random 1k subset, assembled with either coronaSPAdes or rnaviralspades.

Output: FASTA of all contigs that hit RdRP with HMM, palmscan, or both, i.e. the RdRP+ contigs:
s3://serratus-rayan/pro-assembly/rdrpplus.micro.fa
s3://serratus-rayan/pro-assembly/rdrpplus.macro.fa
Total size: 8.2 GB.
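A minimal sketch of the "HMM and/or palmscan" selection, assuming each tool's hits have already been reduced to a set of contig IDs, and that a contig ID is the first whitespace-delimited token of its FASTA header (both assumptions, not stated in the issue). The RdRP+ set is then the union of the two ID sets, and the contig FASTA is filtered against it. The function names `read_fasta`, `contig_id` and `rdrp_plus` are hypothetical; this is not the script that produced the files above.

```python
import io
from typing import Iterator, Set, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header = None
    chunks = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())  # join multi-line sequences
    if header is not None:
        yield header, "".join(chunks)


def contig_id(header: str) -> str:
    # Assumption: the contig ID is the first whitespace-delimited header token.
    parts = header.split()
    return parts[0] if parts else ""


def rdrp_plus(contigs: TextIO, hmm_hits: Set[str],
              palmscan_hits: Set[str], out: TextIO) -> int:
    """Write contigs hit by HMM and/or palmscan to `out`; return the count."""
    keep = hmm_hits | palmscan_hits  # "and/or" is a set union
    written = 0
    for header, seq in read_fasta(contigs):
        if contig_id(header) in keep:
            out.write(f">{header}\n{seq}\n")
            written += 1
    return written


if __name__ == "__main__":
    # Toy input: c1 is an HMM hit, c3 a palmscan hit, c2 hits neither.
    fasta = io.StringIO(">c1 len=10\nACGT\nACGT\n>c2\nGGGG\n>c3\nTTTT\n")
    out = io.StringIO()
    n = rdrp_plus(fasta, hmm_hits={"c1"}, palmscan_hits={"c3"}, out=out)
    print(n, out.getvalue(), sep="\n")
```

Using a union rather than an intersection keeps contigs that only one method detects, which matches "HMM and/or palmscan" as written above.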

hmmsearch was run using an exhaustive collection of RdRP HMMs: https://gitlab.pasteur.fr/rchikhi_pasteur/serratus-rdrp-analysis/-/blob/master/hmm_macro_micro/RdRP_all.v2.hmm
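The HMM side could be reduced to a set of hit IDs along these lines. This sketch assumes hmmsearch was run with `--tblout`, HMMER's whitespace-delimited per-sequence table in which column 1 is the target name and column 5 the full-sequence E-value, and `#` lines are comments. The issue does not state which output format or E-value cutoff was used, so `max_evalue` is a hypothetical parameter. If the contigs were translated to protein before the search, target names would be ORF or frame IDs that must still be mapped back to contig IDs; that step is not shown.

```python
import io
from typing import Optional, Set, TextIO


def hmm_hit_targets(tblout: TextIO, max_evalue: Optional[float] = None) -> Set[str]:
    """Collect target names from an hmmsearch --tblout per-sequence table.

    `max_evalue` is a hypothetical cutoff on the full-sequence E-value;
    None keeps every reported target.
    """
    hits = set()
    for line in tblout:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if max_evalue is None or evalue <= max_evalue:
            hits.add(target)
    return hits


if __name__ == "__main__":
    table = io.StringIO(
        "# target name  accession  query name  accession  E-value ...\n"
        "c1 - RdRp_1 - 1e-30 100.0 0.1\n"
        "c2 - RdRp_2 - 5.0 1.0 0.1\n"
    )
    print(sorted(hmm_hit_targets(table, max_evalue=1e-5)))
```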

Alignments were made using this script: https://gitlab.pasteur.fr/rchikhi_pasteur/serratus-rdrp-analysis/-/blob/master/hmm_macro_micro/align_hmm_to_contigs.sh

rchikhi commented 3 years ago

Some stats: