ababaian / serratus

Ultra-deep search for novel viruses
http://serratus.io
GNU General Public License v3.0

PFAM alignments for Toro outgroup #215

Closed: rcedgar closed 3 years ago

rcedgar commented 3 years ago

For tree-building and Fig. 1, we need PFAM alignments (1st priority RdRp for preprint) for these full-length Toros:

NC_007447 NC_022787 NC_034976 NC_046956

rchikhi commented 3 years ago

on it

rchikhi commented 3 years ago

They're in s3://serratus-public/seq/cov5/annotations/

NC_007447.fa.toro5_cg.fa.darth.pfam.alignments.fasta
NC_022787.fa.toro5_cg.fa.darth.pfam.alignments.fasta
NC_034976.fa.toro5_cg.fa.darth.pfam.alignments.fasta
NC_046956.fa.toro5_cg.fa.darth.pfam.alignments.fasta

I did the GenBank records as well:

KM403390.fa.toro5_cg.fa.darth.pfam.alignments.fasta
LC088094.fa.toro5_cg.fa.darth.pfam.alignments.fasta
LC088095.fa.toro5_cg.fa.darth.pfam.alignments.fasta
LC483442.fa.toro5_cg.fa.darth.pfam.alignments.fasta
MG957145.fa.toro5_cg.fa.darth.pfam.alignments.fasta
MG957146.fa.toro5_cg.fa.darth.pfam.alignments.fasta
MH603532.fa.toro5_cg.fa.darth.pfam.alignments.fasta
MN073058.fa.toro5_cg.fa.darth.pfam.alignments.fasta

rcedgar commented 3 years ago

Extra credit for the GBs 👍 MG957145 and MG957146 are empty; maybe check these in case that is a symptom of an annotation problem? The RefSeqs look good though, so it looks like I'm set for making the alignment and tree.
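A quick way to spot empty outputs like these two, as a minimal sketch: it assumes the alignment files have been copied to a local directory and that "empty" means no FASTA header lines. The function names and the default suffix are illustrative and are not part of the Serratus pipeline.

```python
from pathlib import Path


def count_fasta_records(path):
    """Return the number of FASTA records (lines starting with '>') in a file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def find_empty_alignments(directory, suffix=".pfam.alignments.fasta"):
    """Return sorted names of alignment files in `directory` with no records.

    `suffix` is an assumption based on the file names listed in this thread.
    """
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.name.endswith(suffix) and count_fasta_records(p) == 0
    )
```

Calling `find_empty_alignments("annotations")` on a local copy of the directory would list any files that need a second look before tree-building.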

rchikhi commented 3 years ago

I re-ran MG957146 and indeed got this error:

00:00 37Mb 0.1% Reading /serratax_ref/doms.fa
00:00 43Mb 100.0% Reading /serratax_ref/doms.fa
00:00 9.0Mb 0.1% Masking (fastamino)
00:00 9.0Mb 100.0% Masking (fastamino)
00:00 21Mb 0.1% Word stats
00:00 21Mb 100.0% Word stats
00:00 21Mb 100.0% Alloc rows
00:00 28Mb 0.1% Build index
00:01 28Mb 48.6% Build index
00:01 28Mb 100.0% Build index
00:01 57Mb CPU has 40 cores, defaulting to 10 threads
00:01 57Mb 0.1% Searching orfs2.fa, 0.0% matched
00:01 143Mb 100.0% Searching orfs2.fa, 10.0% matched
[..]
Running hmmsearch
Running hmmalign
Error: Sequence file raw/orfs.filtered.fa is empty or misformatted

asl commented 3 years ago

@rchikhi MG957146.1 is only 7 kbp

asl commented 3 years ago

The matches I'm seeing are to https://pfam.xfam.org/family/Spike_torovirin and some other Toro sequences

rchikhi commented 3 years ago

Indeed! So it's not quite a complete genome, even though they label it as "Bovine torovirus isolate BToV-HT2-TUR, complete genome."

asl commented 3 years ago

Yeah. And they mark only 4 genes there...
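To catch records like this before annotation, one could filter on sequence length rather than trust the "complete genome" label. A minimal sketch, assuming plain FASTA input; the function names and the `min_length` cutoff are hypothetical, and the right threshold is left to whoever runs it.

```python
def fasta_lengths(path):
    """Return {record_id: sequence_length} for every record in a FASTA file."""
    lengths = {}
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the id.
                fields = line[1:].split()
                current = fields[0] if fields else ""
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths


def flag_short_genomes(path, min_length):
    """Return the records shorter than `min_length` (a caller-chosen cutoff)."""
    return {
        name: n for name, n in fasta_lengths(path).items() if n < min_length
    }
```

For example, `flag_short_genomes("MG957146.fa", min_length=20000)` would report a roughly 7 kbp record as short; the 20000 here is only an illustrative cutoff.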

taltman commented 3 years ago

Reopening to use as test-case for the new Nido-Pfam library that I created.