Open rchikhi opened 3 years ago
All RdRPs present in the above gene_clusters files: s3://serratus-public/assemblies/quenya/rdrps/
extracted using this script. In a nutshell: all tblastn hits of quenya contigs (gene_clusters) to quenya.protref.aa that have length above 550 100 (somewhat arbitrary, but most RdRps seem to be above 550) regardless of identity, then grouped by bedtools to extract unique regions from contigs (because the same contig region may match to several hits from quenya.protref.aa).
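A minimal Python sketch of the filter-and-merge idea described above, not the actual extraction script. It assumes each hit has already been reduced to a (contig, start, end, alignment length) tuple with 1-based inclusive coordinates on the contig, and that "length" means alignment length. The function name `merge_hit_regions` and the `min_len` parameter are hypothetical; the cutoff is passed in rather than hard-coded. The merge only approximates what bedtools does (overlapping intervals on the same contig are collapsed into one region).

```python
from collections import defaultdict


def merge_hit_regions(hits, min_len):
    """Keep hits longer than min_len and merge overlapping regions per contig.

    hits: iterable of (contig, start, end, aln_len) tuples (assumed layout).
    Returns {contig: [(start, end), ...]} with sorted, non-overlapping regions.
    """
    by_contig = defaultdict(list)
    for contig, start, end, aln_len in hits:
        if aln_len <= min_len:
            continue  # identity is deliberately ignored, only length is filtered
        # Reverse-strand hits may be reported with start > end; normalise them.
        lo, hi = sorted((start, end))
        by_contig[contig].append((lo, hi))

    merged = {}
    for contig, spans in by_contig.items():
        spans.sort()
        regions = [list(spans[0])]
        for lo, hi in spans[1:]:
            if lo <= regions[-1][1]:
                # Overlaps the previous region: extend it instead of adding a new one.
                regions[-1][1] = max(regions[-1][1], hi)
            else:
                regions.append([lo, hi])
        merged[contig] = [tuple(r) for r in regions]
    return merged


if __name__ == "__main__":
    # Illustrative hits only, not real data.
    example = [
        ("contig_1", 10, 400, 130),
        ("contig_1", 350, 900, 183),   # overlaps the first hit
        ("contig_1", 2000, 1500, 166), # reverse-strand style coordinates
        ("contig_2", 5, 200, 65),      # too short, dropped
    ]
    print(merge_hit_regions(example, min_len=100))
```

With the illustrative input, the two overlapping hits on `contig_1` collapse into one region, which is the reason for the grouping step: several reference proteins matching the same contig region should yield a single extracted RdRP.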
All the above RdRps in a single FASTA file: s3://serratus-public/assemblies/quenya/rdrps_analysis/all_rdrps.fa
Diamond of all_rdrps.fa against nr: s3://serratus-public/assemblies/quenya/rdrps_analysis/all_rdrps.fa.diamond_vs_nr.fmt6-custom
cmdline: \time diamond blastx --db ~/diamonddb/nr --query all_rdrps.fa -p 48 --outfmt 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen qseq sseq
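Since the custom fmt6 output has no header line, here is a small Python sketch for reading it with the column names taken from the `--outfmt 6` list in the command above. The function name `best_hits` is hypothetical, and keeping only the highest-bitscore hit per query is an assumption about what one might want from the file, not something stated in this issue.

```python
import csv

# Column order copied from the --outfmt 6 field list in the diamond command.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
    "qlen", "slen", "qseq", "sseq",
]


def best_hits(handle):
    """Return {qseqid: row} keeping the highest-bitscore row for each query.

    handle: an open text file (or any iterable of tab-separated lines).
    Each row is a dict keyed by COLUMNS; all values are left as strings.
    """
    best = {}
    reader = csv.DictReader(
        handle, fieldnames=COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        query = row["qseqid"]
        score = float(row["bitscore"])
        if query not in best or score > float(best[query]["bitscore"]):
            best[query] = row
    return best
```

Usage would look like `with open("all_rdrps.fa.diamond_vs_nr.fmt6-custom") as fh: hits = best_hits(fh)`, after downloading the file locally; `hits[q]["pident"]` and `hits[q]["sseqid"]` then give the identity and nr accession of the top hit for query `q`.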
A different analysis direction:
all gene_clusters.fa files are concatenated here: s3://serratus-public/assemblies/quenya/rdrps_analysis/all_gene_clusters.fa
Diamond blastx of this file against rdrp0_Q_D.fa: s3://serratus-public/assemblies/quenya/rdrps_analysis/all_gene_clusters.fa.diamond_vs_rdrp0_q_d.fmt6
Pathracer-seq-fs --max-fs 0 applied to RdRP_[X].hmm (with X=1,2,3,4,q) versus all_gene_clusters.fa:
s3://serratus-public/assemblies/quenya/rdrps_analysis/pathracer_seq_fs/
Pathracer applied to RdRP_q.hmm and all assembly graphs:
s3://serratus-public/assemblies/quenya/rdrps_analysis/pathracer/
All AA-guided assemblies:
s3://serratus-public/assemblies/quenya/gene_clusters/
Those are all the coronaspades assemblies matching the input RdRp file (quenya.protref.aa).
Could be assembled: 452 out of 497 (list: s3://serratus-public/assemblies/quenya/rdrps_analysis/list_assembled_quenya.txt).
All coronaspades output files:
s3://serratus-public/assemblies/quenya/other/