Open ShailNair opened 6 months ago
Hi,
I followed the provided instructions and created the MMseqs2 database using VMR_MSL38_v3 from ICTV. I did not receive any errors at any point in that process. However, when I run the taxonomy assignment command, I get the following error:

    $ mmseqs easy-taxonomy final.vcontigs.fixed.faa virus_tax_db ictv tmp \
    > -e 1e-5 -s 6 --blacklist "" --tax-lineage 1 --threads 30
    MMseqs Version:          13.45111
    ORF filter               0
    ORF filter e-value       100
    ORF filter sensitivity   2
    LCA mode                 3
    Majority threshold       0.5
    Vote mode                1
    LCA ranks
    ................
    Database virus_tax_db needs header information

My mapping file looks like this:

    $ head -n10 nr.virus.accession2taxid.ictv
    102L_A 965
    103L_A 965
    104L_A 965
    107L_A 965
    108L_A 965
    109L_A 965
    110L_A 965
    111L_A 965
    112L_A 965
    113L_A 965

and tax-dump directory:

    $ cd ictv-taxdump
    $ ls -l -a | grep "^-" | awk '{print $9, $5}'
    delnodes.dmp 0
    merged.dmp 0
    names.dmp 743021
    nodes.dmp 942866

Note that the delnodes.dmp and merged.dmp are empty. The content of names.dmp and nodes.dmp:

    $ head -n10 names.dmp
    1 | root | | scientific name |
    2 | Hoswirudivirus MRV1 | | scientific name |
    3 | Shomudavirus limadaptatum | | scientific name |
    4 | Moovirus moo | | scientific name |
    5 | Sclerotimonavirus betaclarireediae | | scientific name |
    6 | Potato virus H | | scientific name |
    7 | Rhopapillomavirus 1 | | scientific name |
    8 | Monomorium pharaonis virus 1 | | scientific name |
    9 | Aquamavirus A | | scientific name |
    10 | Orthorubulavirus hominis | | scientific name |
    $ head -n10 nodes.dmp
    1 | 1 | no rank | | 8 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | |
    2 | 10641 | species | XX | 0 | 1 | 11 | 1 | 0 | 1 | 1 | 0 | |
    3 | 3162 | species | XX | 0 | 1 | 11 | 1 | 0 | 1 | 1 | 0 | |
    4 | 591 | species | XX | 0 | 1 | 11 | 1 | 0 | 1 | 1 | 0 | |
    5 | 13564 | species | XX | 0 | 1 | 11 | 1 | 0 | 1 | 1 | 0 | |
    6 | 2366 | species | XX | 0 | 1 | 11 | 1 | 0 | 1 | 1 | 0 | |
    7 | 11378 | species | XX | 0 | 1 | 11 | 1 | 0 | 1 | 1 | 0 | |
    8 | 12606 | species | XX | 0 | 1 | 11 | 1 | 0 | 1 | 1 | 0 | |
    9 | 7806 | species | XX | 0 | 1 | 11 | 1 | 0 | 1 | 1 | 0 | |
    10 | 7615 | species | XX | 0 | 1 | 11 | 1 | 0 | 1 | 1 | 0 | |

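For anyone wanting to rule out an inconsistent tax dump, a quick check of the two files could look like the sketch below. It assumes the `|`-delimited layout shown in the listings above. The function names and the three-node toy data are hypothetical, and the check is not part of MMseqs2. It may or may not be related to the "needs header information" error.

```python
def read_fields(lines):
    """Split each non-empty taxdump line on '|' and strip the fields."""
    for line in lines:
        if line.strip():
            yield [field.strip() for field in line.split("|")]


def check_taxdump(names_lines, nodes_lines):
    """Report nodes whose parent is absent or that lack a scientific name."""
    named = {f[0] for f in read_fields(names_lines) if f[3] == "scientific name"}
    parent = {f[0]: f[1] for f in read_fields(nodes_lines)}
    return {
        "missing_parent": sorted(t for t, p in parent.items() if p not in parent),
        "missing_name": sorted(t for t in parent if t not in named),
        "root_ok": parent.get("1") == "1",  # assumes the root is taxid 1, as in the listing
    }


# Toy data (made up for illustration): node 3 points at a non-existent parent
# and has no entry in names.dmp.
toy_names = [
    "1 | root | | scientific name |",
    "2 | Toy virus | | scientific name |",
]
toy_nodes = [
    "1 | 1 | no rank | |",
    "2 | 1 | species | |",
    "3 | 99 | species | |",
]
print(check_taxdump(toy_names, toy_nodes))
```

On the real files, the same function can be called with the open file handles, for example `check_taxdump(open("names.dmp"), open("nodes.dmp"))`.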
Also, can the following information be extracted to a TSV/CSV file, listing each protein_id from nr.virus.faa.gz with its corresponding ICTV accession?

| Protein_id | Realm | Subrealm | Kingdom | Subkingdom | Phylum | Subphylum | Class | Subclass | Order | Suborder | Family | Subfamily | Genus | Subgenus | Species |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |

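To illustrate the kind of output meant here, the sketch below builds such a table from a tax dump and a two-column mapping file. It assumes the `|`-delimited names.dmp/nodes.dmp layout and the whitespace-separated `protein_id taxid` mapping format shown earlier, and that rank names in nodes.dmp are the lowercase forms of the column headers. All function names and the toy taxonomy are hypothetical. This is not an existing MMseqs2 feature or a script from the repository.

```python
import csv
import io

RANKS = ["realm", "subrealm", "kingdom", "subkingdom", "phylum", "subphylum",
         "class", "subclass", "order", "suborder", "family", "subfamily",
         "genus", "subgenus", "species"]


def parse_dmp(lines):
    """Split each non-empty taxdump line on '|' and strip the fields."""
    for line in lines:
        if line.strip():
            yield [field.strip() for field in line.split("|")]


def load_taxonomy(names_lines, nodes_lines):
    """Return (taxid -> name, taxid -> parent taxid, taxid -> rank)."""
    names = {f[0]: f[1] for f in parse_dmp(names_lines) if f[3] == "scientific name"}
    parent, rank = {}, {}
    for f in parse_dmp(nodes_lines):
        parent[f[0]] = f[1]
        rank[f[0]] = f[2]
    return names, parent, rank


def lineage(taxid, names, parent, rank):
    """Walk from taxid up to the root, keeping the name at each known rank."""
    result = dict.fromkeys(RANKS, "")
    seen = set()  # stops at the self-parented root and guards against cycles
    while taxid in parent and taxid not in seen:
        seen.add(taxid)
        if rank[taxid] in result:
            result[rank[taxid]] = names.get(taxid, "")
        taxid = parent[taxid]
    return result


def write_table(mapping_lines, names, parent, rank, handle):
    """Write one tab-separated row per protein: ID followed by its ranks."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Protein_id"] + [r.capitalize() for r in RANKS])
    for line in mapping_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        lin = lineage(parts[1], names, parent, rank)
        writer.writerow([parts[0]] + [lin[r] for r in RANKS])


# Toy taxonomy and mapping (made up for illustration only).
toy_names = ["1 | root | | scientific name |",
             "2 | Examplevirus | | scientific name |",
             "3 | Examplevirus alpha | | scientific name |"]
toy_nodes = ["1 | 1 | no rank | |",
             "2 | 1 | genus | |",
             "3 | 2 | species | |"]
names, parent, rank = load_taxonomy(toy_names, toy_nodes)
out = io.StringIO()
write_table(["prot_A 3"], names, parent, rank, out)
print(out.getvalue())
```

With real data, the toy lists would be replaced by open handles on names.dmp, nodes.dmp and the mapping file, and `out` by a file opened for writing.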
Thank you