Open dhoogest opened 6 months ago
For now, string-based bespoke taxonomy IDs such as "UW###" will not work:
% makeblastdb -dbtype nucl -in seqs.fa -out blast -parse_seqids -taxid_map seqmap.txt
Building a new DB, current time: 03/13/2024 15:20:54
New DB name: /blast
New DB title: output/seqs.fa
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 3000000000B
Error: NCBI C++ Exception:
T0 "/home/coremake/release_build/build/PrepareRelease_Linux64-Centos_JSID_01_880026_130.14.18.128_9008__PrepareRelease_Linux64-Centos_1697736677/c++/compilers/unix/../../src/corelib/ncbistr.cpp", line 862: Error: (CStringException::eConvert) ncbi::NStr::StringToInt8() - Cannot convert string 'UW123' to Int8 (m_Pos = 0)
But I will write an email to NCBI arguing that tax_ids are identifiers, not Int8 values, and see if they will change their dtype rule.
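In the meantime, a small pre-flight check can catch this before makeblastdb aborts. This is only a sketch: it assumes the `-taxid_map` file is whitespace-delimited with the sequence ID in the first column and the tax_id in the second, and the function name `find_non_integer_taxids` is made up for illustration.

```python
def find_non_integer_taxids(lines):
    """Return (line_number, seq_id, tax_id) for rows whose tax_id is not a plain integer.

    Assumes each non-blank row is "<seq_id> <tax_id>" separated by whitespace.
    Rows with a missing second column are reported with an empty tax_id.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        seq_id = fields[0]
        tax_id = fields[1] if len(fields) > 1 else ""
        # Only ASCII digits count as an integer here; "UW123" or "" do not.
        if not (tax_id.isascii() and tax_id.isdigit()):
            bad.append((lineno, seq_id, tax_id))
    return bad


rows = ["seq1 562", "seq2 UW123", "seq3\t1280"]
print(find_non_integer_taxids(rows))
```

Running it over the lines of seqmap.txt would list every row that would trip the StringToInt8 conversion shown above.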
Hi,
Thanks for following up.
Unfortunately, this operation is based on NCBI taxonomy database and how species are ID'ed. It also involves our internal workflow, so we cannot entertain this request.
Your understanding over this will be appreciated.
Regards,
Tao Tao, PhD
NCBI User Services
[https://go.usa.gov/x647S](https://urldefense.com/v3/__https://go.usa.gov/x647S__;!!K-Hz7m0Vt54!h8HZ1wIAeUMqTNcKvfn0qBHmJAcDCR6Hy49azKKshYBVwHruWdlamlWVA3GXB4BbSyU-vKeO5GMsEJWNLhk251no$)
------------------- Original Message -------------------
From: Chris Rosenthal <crosenth@uw.edu>;
Received: Thu Mar 14 2024 13:49:48 GMT-0400 (Eastern Daylight Time)
To: nlm-support@nlm.nih.gov <nlm-support@nlm.nih.gov>; NLM Support <nlm-support@nlm.nih.gov>; Triage Team <nlm-support@nlm.nih.gov>;
Subject: [EXTERNAL] Re: case #CAS-1281624-K2J1F7: makeblastdb requires tax_ids to be Int8 TRACKING:000412000016044
Hi Tao Tao,
Can I make a request for BLAST+ tools to support non-numeric, string taxonomy identifiers? This would allow users to run BLAST tools with custom taxonomy identifiers like 'UW123'. Please consider that taxonomy identifiers are used for identification purposes and are not meant to reflect numerical values, such as age, weight or quantity.
Thanks
Snappy response, if a bummer. I think there are still use cases for this taxtastic functionality, for databases which do not include non-NCBI taxa (such as ya16sdb).
I suspect they want tax_id to be numeric for db performance purposes, instead of adding another unique indexing column to their internal db schema.
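If the integer requirement stays, one possible workaround is to keep the string IDs in our own tables and hand makeblastdb a synthetic integer for each one. A rough sketch, with loud assumptions: `assign_integer_taxids` is a hypothetical helper, not anything in taxtastic, and the default `start` offset is an arbitrary placeholder that would need to sit above any NCBI tax_id in use and inside whatever integer range makeblastdb actually stores.

```python
def assign_integer_taxids(tax_ids, start=2_000_000_000):
    """Map each tax_id string to an integer.

    Numeric IDs keep their own value; non-numeric IDs (e.g. "UW123") get
    consecutive synthetic integers beginning at `start`, in first-seen order.
    `start` is a placeholder; this sketch does not check for collisions
    between real numeric IDs and the synthetic range.
    """
    mapping = {}
    next_id = start
    for tax_id in tax_ids:
        if tax_id in mapping:
            continue  # already assigned
        if tax_id.isascii() and tax_id.isdigit():
            mapping[tax_id] = int(tax_id)
        else:
            mapping[tax_id] = next_id
            next_id += 1
    return mapping


print(assign_integer_taxids(["562", "UW123", "UW124", "UW123"], start=1000))
```

The mapping would have to be saved alongside the database so that BLAST hits can be translated back to the bespoke IDs.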
In the BLAST+ manual, there are notes about support for extending the -taxids / -negative_taxids CLI options to allow for filtering by non-leaf tax nodes. Support for this functionality appears to require a file called taxonomy4blast.sqlite3 alongside the BLAST database binaries. It would be great if taxtastic could be extended to leverage its logic for defining taxonomic lineages while exporting this specific shape, in order to facilitate builds of custom databases with bespoke taxonomies.

/cc @nhoffman @crosenth
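To make the request a bit more concrete, here is a sketch of what such an export could look like using only the standard library. The table name `TaxidInfo` and its `taxid` / `parent` columns are my assumption about the layout of taxonomy4blast.sqlite3 and are not confirmed anywhere in this thread, so the schema of a real file should be inspected first. Both function names are hypothetical.

```python
import sqlite3


def write_taxid_parents(db_path, parents):
    """Write {taxid: parent_taxid} pairs to a SQLite file.

    ASSUMPTION: table TaxidInfo(taxid, parent) mirrors taxonomy4blast.sqlite3;
    verify against a real file before relying on it.
    """
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back on error
            con.execute(
                "CREATE TABLE IF NOT EXISTS TaxidInfo "
                "(taxid INTEGER PRIMARY KEY, parent INTEGER)"
            )
            con.executemany(
                "INSERT OR REPLACE INTO TaxidInfo (taxid, parent) VALUES (?, ?)",
                sorted(parents.items()),
            )
    finally:
        con.close()


def descendants(db_path, root):
    """Return root plus every taxid below it, sorted ascending."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            """
            WITH RECURSIVE sub(taxid) AS (
                SELECT ?
                UNION
                SELECT t.taxid FROM TaxidInfo t JOIN sub ON t.parent = sub.taxid
            )
            SELECT taxid FROM sub ORDER BY taxid
            """,
            (root,),
        ).fetchall()
    finally:
        con.close()
    return [row[0] for row in rows]
```

The `descendants` query is the kind of lookup that filtering by a non-leaf node needs: given an internal node, collect every tax_id underneath it. `UNION` (rather than `UNION ALL`) keeps the recursion from looping when a root node lists itself as its own parent.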