Hello @slambrechts,
Thank you for your query.
If you are using the function db_download to download data from BOLD and NCBI, there is no need to use db_import, as CRABS does this automatically. If, on the other hand, you want to import your own barcodes, you can tell CRABS where to find the accession number or species info in your sequence headers. For example, let's say that your FASTA file is structured as below:
>Homo sapiens; A BUNCH OF METADATA
ACGT
You can tell CRABS that the header contains species info with -s species, and that this info ends at the delimiter with -d ';'. The species or accession info needs to be placed before the delimiter. The metadata needs to be removed for CRABS to work, as CRABS uses the full header to determine taxonomy at a later stage. If your data is structured differently, with the species or accession info not placed before the delimiter, this is unfortunately not currently possible in CRABS. In that case, please let me know and I'll add functionality that lets you specify where CRABS can find this info in your header.
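To make the header convention concrete, here is a minimal Python sketch, assuming a plain-text FASTA file like the >Homo sapiens; A BUNCH OF METADATA example above. The functions clean_header and clean_fasta are hypothetical helpers, not part of CRABS; they only illustrate the rule that the species or accession info sits before the delimiter and that everything after it is metadata to be discarded.

```python
# Hypothetical helpers (NOT CRABS code) illustrating the
# "info before the delimiter, metadata after it" header convention.

def clean_header(header: str, delim: str = ";") -> str:
    """Return the part of a FASTA header before the first delimiter.

    The leading '>' is dropped and surrounding whitespace is stripped,
    so '>Homo sapiens; A BUNCH OF METADATA' becomes 'Homo sapiens'.
    """
    body = header.lstrip(">")
    return body.split(delim, 1)[0].strip()


def clean_fasta(lines: list[str], delim: str = ";") -> list[str]:
    """Strip metadata from every header line; sequence lines are kept as-is."""
    cleaned = []
    for line in lines:
        if line.startswith(">"):
            cleaned.append(">" + clean_header(line, delim))
        else:
            cleaned.append(line)
    return cleaned


example = [">Homo sapiens; A BUNCH OF METADATA", "ACGT"]
print(clean_fasta(example))  # prints ['>Homo sapiens', 'ACGT']
```

Because the split uses maxsplit=1, only the first delimiter matters, so metadata that itself contains more semicolons is still discarded in one piece.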
I hope this answers your question, but please let me know if something is not clear or if I have misinterpreted your query.
Best regards, Gert-Jan
Dear Gert-Jan,
OK, thank you for the info. For now we want to create a reference database using only the standard NCBI and/or EMBL databases. So if I understand correctly, we can go directly from db_download to db_merge to merge the downloaded NCBI and EMBL databases? FYI, we are using primers that target mitochondrial 16S genes, one set targeting Collembola and the other targeting Oligochaeta.
Kind regards, Sam
Dear @slambrechts,
Yes, please move straight from db_download to db_merge.
Thanks, Gert-Jan
Hi,
I read:
I assume --delim '_' is not ideal then, since when accession numbers are unavailable, CRABS generates unique sequence IDs of the form 'CRABS_[num]:species_name', which contain an underscore? In the first scenario, I assume there is no problem, since the headers contain only NCBI accession numbers?
Kind regards, Sam
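The concern about --delim '_' can be illustrated with a small Python sketch. It assumes, as described earlier in the thread, that the info kept from a header is whatever precedes the first delimiter; split_at_delim and delimiter_collides are hypothetical helpers, not CRABS functions, and the header below just fills in the 'CRABS_[num]:species_name' pattern with a placeholder number.

```python
# Hypothetical illustration (NOT CRABS code): what happens when the chosen
# delimiter also occurs inside the sequence ID itself.

def split_at_delim(header: str, delim: str) -> str:
    """Return the header text before the first occurrence of delim."""
    return header.lstrip(">").split(delim, 1)[0]


def delimiter_collides(ids: list[str], delim: str) -> bool:
    """True if the delimiter appears inside any of the given IDs."""
    return any(delim in seq_id for seq_id in ids)


# Placeholder header following the 'CRABS_[num]:species_name' pattern.
header = ">CRABS_1:species_name"

print(split_at_delim(header, "_"))  # prints CRABS (the unique ID is cut short)
print(split_at_delim(header, ";"))  # prints CRABS_1:species_name (no ';' present)
print(delimiter_collides(["CRABS_1:species_name"], "_"))  # prints True
```

Checking candidate delimiters against the IDs this way is a quick sanity test before choosing one: a character that never occurs inside an ID cannot truncate it.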