Closed (taltman closed 3 years ago)
s3://serratus-rayan/master_table_assemblies/SRR11648360.fa is the right assembly file, as it has been filtered for CoV hits (using BGC). This assembly has only one small CoV hit. The CheckV-filtered assembly was empty.
In gene_clusters.fa (unfiltered for CoV) there are many RdRP hits, but since they're not on CoV-filtered contigs, we discard them.
# Macro-domains - 122
# Peptidase_C30-domains - 2
# RdRP_1-domains - 1100
# Viral_helicase1-domains - 64
So, bottom line: this dataset may contain CoV, but certainly not all CoV genes.
See issue #223.
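For anyone following along, the discard step described above (dropping domain hits that are not on CoV-filtered contigs) could be sketched roughly like this. This is only an illustration, not the actual pipeline code: the function name `split_domain_counts`, the `(contig_id, domain)` input shape, and the toy contig names are all hypothetical.

```python
from collections import Counter


def split_domain_counts(hits, cov_contigs):
    """Count Pfam domain hits on and off the CoV-filtered contigs.

    hits        -- iterable of (contig_id, pfam_domain) pairs (assumed shape)
    cov_contigs -- set of contig IDs that passed the CoV filter
    Returns (kept, discarded) as two Counters keyed by domain name.
    """
    kept, discarded = Counter(), Counter()
    for contig_id, domain in hits:
        if contig_id in cov_contigs:
            kept[domain] += 1        # hit sits on a CoV-filtered contig
        else:
            discarded[domain] += 1   # hit is dropped, as described above
    return kept, discarded


# Toy, made-up input; these are NOT the real SRR11648360 hits.
toy_hits = [
    ("contig_1", "RdRP_1"),
    ("contig_2", "RdRP_1"),
    ("contig_2", "Macro"),
    ("contig_3", "Viral_helicase1"),
]
toy_cov_contigs = {"contig_1"}

kept, discarded = split_domain_counts(toy_hits, toy_cov_contigs)
print("kept:", dict(kept))
print("discarded:", dict(discarded))
```

Under these assumptions, a large domain count in the unfiltered file and few or no hits in the filtered assembly are not contradictory: most hits would simply end up in the discarded counter.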
When @asl runs Pfam on SRR11648360.coronaspades.gene_clusters.fa, he gets hundreds of hits. When I run on s3://serratus-rayan/master_table_assemblies/SRR11648360.fa, I do not get any hits.
@asl, can you please post an S3 URI to the file you analyzed, or just upload it to this issue?
@rchikhi, can you please confirm that I'm looking at the right assembly file?