rcedgar closed this 3 years ago
on it
They're in s3://serratus-public/seq/cov5/annotations/
NC_007447.fa.toro5_cg.fa.darth.pfam.alignments.fasta
NC_022787.fa.toro5_cg.fa.darth.pfam.alignments.fasta
NC_034976.fa.toro5_cg.fa.darth.pfam.alignments.fasta
NC_046956.fa.toro5_cg.fa.darth.pfam.alignments.fasta
I did the GenBank ones as well:
KM403390.fa.toro5_cg.fa.darth.pfam.alignments.fasta
LC088094.fa.toro5_cg.fa.darth.pfam.alignments.fasta
LC088095.fa.toro5_cg.fa.darth.pfam.alignments.fasta
LC483442.fa.toro5_cg.fa.darth.pfam.alignments.fasta
MG957145.fa.toro5_cg.fa.darth.pfam.alignments.fasta
MG957146.fa.toro5_cg.fa.darth.pfam.alignments.fasta
MH603532.fa.toro5_cg.fa.darth.pfam.alignments.fasta
MN073058.fa.toro5_cg.fa.darth.pfam.alignments.fasta
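For anyone scripting the download, here is a rough sketch of how the full S3 URIs could be built from the accession list. The prefix and file suffix are copied from the listing above; the function name `alignment_uri` is just illustrative, and nothing here actually talks to S3.

```python
# Accessions listed in this thread (RefSeq first, then GenBank).
REFSEQ = ["NC_007447", "NC_022787", "NC_034976", "NC_046956"]
GENBANK = [
    "KM403390", "LC088094", "LC088095", "LC483442",
    "MG957145", "MG957146", "MH603532", "MN073058",
]

# Prefix and suffix as they appear in the listing above.
PREFIX = "s3://serratus-public/seq/cov5/annotations/"
SUFFIX = ".fa.toro5_cg.fa.darth.pfam.alignments.fasta"


def alignment_uri(accession: str) -> str:
    """Return the expected S3 URI of the Pfam alignment file for one accession."""
    return f"{PREFIX}{accession}{SUFFIX}"


uris = [alignment_uri(acc) for acc in REFSEQ + GENBANK]
for uri in uris:
    print(uri)
```

Each printed URI can then be handed to whatever S3 client you normally use.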
Extra credit for the GBs 👍 MG957145 and MG957146 are empty, though; maybe check these in case it's a symptom of an annotation problem? The RefSeqs look good, so it looks like I'm set for making the alignment and tree.
I re-ran MG957146 and indeed got this error:
00:00 37Mb 0.1% Reading /serratax_ref/doms.fa
00:00 43Mb 100.0% Reading /serratax_ref/doms.fa
00:00 9.0Mb 0.1% Masking (fastamino)
00:00 9.0Mb 100.0% Masking (fastamino)
00:00 21Mb 0.1% Word stats
00:00 21Mb 100.0% Word stats
00:00 21Mb 100.0% Alloc rows
00:00 28Mb 0.1% Build index
00:01 28Mb 48.6% Build index
00:01 28Mb 100.0% Build index
00:01 57Mb CPU has 40 cores, defaulting to 10 threads
00:01 57Mb 0.1% Searching orfs2.fa, 0.0% matched
00:01 143Mb 100.0% Searching orfs2.fa, 10.0% matched
[..]
Running hmmsearch
Running hmmalign
Error: Sequence file raw/orfs.filtered.fa is empty or misformatted
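One way to catch this earlier might be a small pre-check on the filtered ORF FASTA before the alignment step runs, so the pipeline can report "no ORFs passed the filter" instead of failing inside hmmalign. This is only a sketch: the helper name `is_usable_fasta` is made up here, and I'm assuming the pipeline has the file contents available as text at that point.

```python
def is_usable_fasta(text: str) -> bool:
    """Return True if the text holds at least one FASTA record with sequence data.

    A record counts only when a '>' header line is followed by at least one
    non-blank sequence line before the next header or the end of the text.
    """
    in_record = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            in_record = True  # start of a new record, no residues seen yet
        elif in_record:
            return True  # first residue line after a header is enough
    return False


# Illustrative inputs only, not real pipeline output.
print(is_usable_fasta(""))                  # empty file
print(is_usable_fasta(">orf1\n"))           # header without sequence
print(is_usable_fasta(">orf1\nMKLVT\n"))    # one valid record
```

If the check returns False, the run could skip hmmalign for that accession and log the reason.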
@rchikhi MG957146.1 is only 7 kbp
The matches I'm seeing are to https://pfam.xfam.org/family/Spike_torovirin and some other Toro sequences
Indeed! So not quite a complete genome... although they do label it as "Bovine torovirus isolate BToV-HT2-TUR, complete genome."
Yeah. And they mark only 4 genes there...
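Since the "complete genome" label can't be trusted here, a length sanity check on the input might help flag records like this up front. A minimal sketch follows, assuming a cutoff of 20,000 nt; that cutoff is my own placeholder (torovirus genomes are roughly 25 to 30 kb), not something the pipeline defines, and the sequences in the example are synthetic.

```python
from typing import Dict, List


def fasta_lengths(text: str) -> Dict[str, int]:
    """Map each record ID (first token after '>') to its total sequence length."""
    lengths: Dict[str, int] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def flag_short(lengths: Dict[str, int], min_len: int = 20000) -> List[str]:
    """Return IDs whose length is below min_len (an assumed, adjustable cutoff)."""
    return [seq_id for seq_id, n in lengths.items() if n < min_len]


# Synthetic example: a 7 kb record labelled "complete genome" and a 28 kb one.
example = (
    ">MG957146.1 labelled complete genome\n" + "A" * 7000 + "\n"
    ">hypothetical_full_length\n" + "A" * 28000 + "\n"
)
print(flag_short(fasta_lengths(example)))
```

Anything flagged would still need a manual look, since a short record may be a legitimate partial sequence.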
Reopening to use as test-case for the new Nido-Pfam library that I created.
For tree-building and Fig. 1, we need Pfam alignments (first priority: RdRp, for the preprint) for these full-length Toros:
NC_007447 NC_022787 NC_034976 NC_046956