ababaian / serratus

Ultra-deep search for novel viruses
http://serratus.io
GNU General Public License v3.0

Quenya assemblies #238

Open rchikhi opened 3 years ago

rchikhi commented 3 years ago

All AA-guided assemblies: s3://serratus-public/assemblies/quenya/gene_clusters/

Those are all the coronaspades assemblies matching the input RdRp file (quenya.protref.aa).

Could be assembled: 452 out of 497 (list: s3://serratus-public/assemblies/quenya/rdrps_analysis/list_assembled_quenya.txt)

All coronaspades output files: s3://serratus-public/assemblies/quenya/other/

rchikhi commented 3 years ago

all RdRPs present in the above gene_clusters files: s3://serratus-public/assemblies/quenya/rdrps/

Extracted using this script. In a nutshell: take all tblastn hits of quenya contigs (gene_clusters) to quenya.protref.aa that have length above 550 100 (somewhat arbitrary, but most RdRps seem to be above 550), regardless of identity, then group them with bedtools to extract unique regions from contigs (because the same contig region may match several hits from quenya.protref.aa).
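For readers without access to the script, here is a minimal Python sketch of the two steps described above: filter the hits by length, then merge overlapping hit regions per contig. It is not the actual script. It assumes the tblastn output is in the default 12-column tabular format (`-outfmt 6`), with the protein as query and the contig as subject. The function name `merge_hit_regions` is hypothetical, and `min_length` is left as a required parameter because the threshold is the poster's choice. The merge step only approximates what bedtools does: it ignores strand and merges intervals that overlap.

```python
from collections import defaultdict


def merge_hit_regions(rows, min_length):
    """Filter tabular tblastn hits by alignment length and merge the
    overlapping regions they cover on each contig.

    rows: iterable of tab-separated lines in the default 12-column
          outfmt 6 layout (an assumption, not confirmed by the thread).
    Returns {contig_id: [(start, end), ...]} with 1-based inclusive coords.
    """
    by_contig = defaultdict(list)
    for line in rows:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Column 4 is the alignment length; identity is deliberately ignored.
        if int(fields[3]) <= min_length:
            continue
        contig = fields[1]  # subject id = contig in a tblastn search
        sstart, send = int(fields[8]), int(fields[9])
        # Reverse-strand hits have sstart > send; normalise the order.
        by_contig[contig].append((min(sstart, send), max(sstart, send)))

    merged = {}
    for contig, intervals in by_contig.items():
        intervals.sort()
        regions = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = regions[-1]
            if start <= last_end:
                # Overlaps the previous region: extend it.
                regions[-1] = (last_start, max(last_end, end))
            else:
                regions.append((start, end))
        merged[contig] = regions
    return merged
```

Each merged region can then be cut out of its contig once, no matter how many reference proteins matched it.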

rchikhi commented 3 years ago

All the above RdRps in a single FASTA file: s3://serratus-public/assemblies/quenya/rdrps_analysis/all_rdrps.fa

Diamond blastx of all_rdrps.fa against nr: s3://serratus-public/assemblies/quenya/rdrps_analysis/all_rdrps.fa.diamond_vs_nr.fmt6-custom

cmdline: \time diamond blastx --db ~/diamonddb/nr --query all_rdrps.fa -p 48 --outfmt 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen qseq sseq
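Since the fmt6-custom file has 16 columns rather than the usual 12, here is a small Python sketch for reading it. The column names are taken from the command line above. The helper names (`parse_hit`, `best_hits`) are hypothetical, and keeping the highest-bitscore hit per query is only an illustration, not something done in this thread.

```python
# Column order as given to --outfmt 6 in the diamond command line.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
    "qlen", "slen", "qseq", "sseq",
]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend",
              "sstart", "send", "qlen", "slen"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_hit(line):
    """Turn one tab-separated line into a dict with typed values."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
    hit = {}
    for name, value in zip(COLUMNS, values):
        if name in INT_FIELDS:
            hit[name] = int(value)
        elif name in FLOAT_FIELDS:
            hit[name] = float(value)
        else:
            hit[name] = value
    return hit


def best_hits(lines):
    """Keep the highest-bitscore hit for each query sequence."""
    best = {}
    for line in lines:
        if not line.strip():
            continue
        hit = parse_hit(line)
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best
```

With `qlen` and `slen` in each record, query or subject coverage can be computed per hit without going back to the FASTA files.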

rchikhi commented 3 years ago

A different analysis direction:

all gene_clusters.fa files are concatenated here: s3://serratus-public/assemblies/quenya/rdrps_analysis/all_gene_clusters.fa

Diamond blastx of this file against rdrp0_Q_D.fa: s3://serratus-public/assemblies/quenya/rdrps_analysis/all_gene_clusters.fa.diamond_vs_rdrp0_q_d.fmt6

rchikhi commented 3 years ago

Pathracer-seq-fs --max-fs 0 applied to RdRP_[X].hmm (with X=1,2,3,4,q) versus all_gene_clusters.fa:

s3://serratus-public/assemblies/quenya/rdrps_analysis/pathracer_seq_fs/

rchikhi commented 3 years ago

Pathracer applied to RdRP_q.hmm and all assembly graphs:

s3://serratus-public/assemblies/quenya/rdrps_analysis/pathracer/