Open rchikhi opened 3 years ago
Some stats:
number of macro RdRP+ contigs: 6,822,262; total size: 5,828,043,099 bp; longest macro RdRP+ contig: 1,086,412 bp; N50: 1,870 bp
number of micro RdRP+ contigs: 4,631,850; total size: 2,158,973,751 bp; longest micro RdRP+ contig: 16,630 bp; N50: 622 bp
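
For anyone wanting to recompute figures like these, here is a minimal sketch of how count, total size, longest contig and N50 can be derived from a FASTA file. It is not the script used for the numbers above; the function names `fasta_lengths` and `contig_stats` are made up for illustration, and N50 is taken in its usual sense (the length of the contig at which the running sum of lengths, sorted longest first, reaches half of the total).

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def contig_stats(lengths):
    """Return count, total size, longest contig and N50 for a list of lengths."""
    if not lengths:
        return {"count": 0, "total_bp": 0, "longest_bp": 0, "n50_bp": 0}
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        # N50: first contig at which the cumulative length reaches half the total
        if 2 * running >= total:
            n50 = length
            break
    return {
        "count": len(ordered),
        "total_bp": total,
        "longest_bp": ordered[0],
        "n50_bp": n50,
    }
```

Usage would look like `with open("contigs.fa") as fh: print(contig_stats(fasta_lengths(fh)))`, where `contigs.fa` is a placeholder for a local copy of one of the FASTA files.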
This issue describes the procedure used to search all of our contigs against RdRP and presents the results. (Slack thread: https://hackseq-rna.slack.com/archives/C012H9SDQCA/p1615948152031200)
Input: FASTA files of contigs, assembled using either micro (all SRA `.pro` DIAMOND hits assembled with `rnaviralspades`) or macro (all from `s3://lovelywater/assembly/contigs/`, i.e. all CoV + dicistro + quenya + satellite + 1k random subset assembled using either `coronaSPAdes` or `rnaviralspades`).

Output: FASTA of all the contigs that hit RdRP with HMM and/or palmscan, i.e. the RdRP+ contigs:
s3://serratus-rayan/pro-assembly/rdrpplus.micro.fa
s3://serratus-rayan/pro-assembly/rdrpplus.macro.fa
total size: 8.2 GB

`hmmsearch` was run using an exhaustive collection of RdRP HMMs: https://gitlab.pasteur.fr/rchikhi_pasteur/serratus-rdrp-analysis/-/blob/master/hmm_macro_micro/RdRP_all.v2.hmm

Alignments were made using this script: https://gitlab.pasteur.fr/rchikhi_pasteur/serratus-rdrp-analysis/-/blob/master/hmm_macro_micro/align_hmm_to_contigs.sh
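
The output step described above (keeping contigs that hit RdRP with HMM and/or palmscan) amounts to taking the union of two sets of contig IDs and filtering the FASTA by it. Below is a rough sketch of that idea only, not the linked script. It assumes the hit IDs from each tool have already been collected and mapped back to contig names, which is not shown here; `rdrp_plus_ids` and `filter_fasta` are hypothetical helper names.

```python
def rdrp_plus_ids(hmm_hits, palmscan_hits):
    """Union of contig IDs reported by either method (the "and/or" in the text)."""
    return set(hmm_hits) | set(palmscan_hits)


def filter_fasta(lines, keep_ids):
    """Yield the FASTA lines of records whose ID is in keep_ids.

    The record ID is assumed to be the first whitespace-separated token
    of the header line, without the leading '>'.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in keep_ids
        if keep:
            yield line
```

A contig reported by both methods appears only once in the output, since membership is tested against a set.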