apcamargo / ictv-mmseqs2-protein-database

Taxonomy data not found #3

Closed btredcup closed 2 years ago

btredcup commented 2 years ago

Hi,

Thanks for the great tool. I am having an issue with taxonkit, which I have downgraded to v0.11.1.

This is the error I am receiving after running taxonkit create-taxdump:

[ERRO] taxonomy data not found, please download and uncompress ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz, and copy "names.dmp", "nodes.dmp", "delnodes.dmp", and "merged.dmp" to /home/users/.taxonkit

Any help would be greatly appreciated. Thanks

apcamargo commented 2 years ago

taxonkit needs a taxdump in your home directory. Just download it from the link in the error message and copy the files to ~/.taxonkit.
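
As a rough sketch of that copy step (not taken from the taxonkit documentation): the function below copies the four files named in the error message from an already extracted taxdump directory into a target directory. The function name install_taxdump and both directory arguments are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of taxonkit): copy the four files named in
# the error message from an extracted taxdump directory into a data directory.
install_taxdump() {
    src_dir=$1
    dest_dir=$2
    # Create the destination if it does not exist yet.
    mkdir -p "$dest_dir" || return 1
    for f in names.dmp nodes.dmp delnodes.dmp merged.dmp; do
        # Stop early if the extracted archive is incomplete.
        if [ ! -f "$src_dir/$f" ]; then
            echo "missing $src_dir/$f" >&2
            return 1
        fi
        cp "$src_dir/$f" "$dest_dir/" || return 1
    done
}
```

For example, after extracting taxdump.tar.gz into a directory called ncbi-taxdump, running install_taxdump ncbi-taxdump "$HOME/.taxonkit" should put the files where the error message expects them.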

btredcup commented 2 years ago

That worked, thank you. Is that the same as the next step in the tutorial?

# Download the NCBI taxdump
aria2c -x 4 "ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz"
mkdir ncbi-taxdump
tar zxfv taxdump.tar.gz -C ncbi-taxdump
rm taxdump.tar.gz

Or do I need to download it again and put it in ncbi-taxdump?

apcamargo commented 2 years ago

I think you can skip it, as long as you point to the right directory in this command:

awk '{print $2}' prot.accession2taxid.FULL \
    | sort -u \
    | taxonkit --data-dir ncbi-taxdump lineage \
    | rg "\tViruses;" \
    | awk '{print $1}' \
    > virus_taxid.list

btredcup commented 2 years ago

Thank you. Do I point it to the ~/.taxonkit directory where I initially put the files from taxdump.tar.gz, or to the ictv-taxdump directory generated after running fix_taxdump.py?

apcamargo commented 2 years ago

ncbi-taxdump is the one you download from NCBI. The one you generate is ictv-taxdump.
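
If the NCBI files are already in ~/.taxonkit, one way to avoid a second download might be to make ncbi-taxdump a symlink to that directory, so the tutorial command with --data-dir ncbi-taxdump can stay unchanged. This is only a sketch: the function name link_taxdump is made up for illustration, and it assumes that nothing named ncbi-taxdump exists yet in the working directory.

```shell
#!/bin/sh
# Hypothetical helper: expose an existing taxdump directory under another
# name via a symlink, instead of downloading the archive again.
link_taxdump() {
    target=$1      # existing directory holding the NCBI .dmp files (absolute path)
    link_name=$2   # name the tutorial command expects, e.g. ncbi-taxdump
    if [ ! -d "$target" ]; then
        echo "not a directory: $target" >&2
        return 1
    fi
    # Refuse to overwrite anything that is already there.
    if [ -e "$link_name" ] || [ -L "$link_name" ]; then
        echo "already exists: $link_name" >&2
        return 1
    fi
    ln -s "$target" "$link_name"
}
```

Usage would look like link_taxdump "$HOME/.taxonkit" ncbi-taxdump. Passing an absolute path as the target keeps the link valid regardless of where it is created.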

btredcup commented 2 years ago

Okay, thank you. All sorted now