Closed Aline-Git closed 5 months ago
Hi @Aline-Git, thank you for using the workflow! To use the workflow with minimap2 and a custom database you need the following files:
FASTA file:
>seq1
AAAAAA
>seq2
AAAAAA
The ref2taxid file:
seq1\t564117
seq2\t1454219
where 564117 is the NCBI taxid for Marinobacter antarcticus and 1454219 is the NCBI taxid for Pseudomonas aeruginosa 059A.
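One way to catch a malformed mapping before launching the workflow is to check the two files against each other. The following is a minimal Python sketch, not part of the workflow: the function names are hypothetical, and it assumes that the sequence ID is the first whitespace-delimited token of each FASTA header and that every taxid is a plain integer.

```python
import io


def fasta_ids(handle):
    """Collect sequence IDs (first token after '>') from a FASTA handle."""
    ids = []
    for line in handle:
        if line.startswith(">"):
            header = line[1:].strip()
            ids.append(header.split()[0] if header else "")
    return ids


def read_ref2taxid(handle):
    """Parse 'sequence_id<TAB>taxid' lines into a dict, failing loudly on bad rows."""
    mapping = {}
    for number, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 2 tab-separated fields, got {len(fields)}")
        seq_id, taxid = fields
        if not taxid.isdigit():
            # Assumption: taxids are integers, as NCBI taxids are.
            raise ValueError(f"line {number}: taxid {taxid!r} is not an integer")
        mapping[seq_id] = int(taxid)
    return mapping


def missing_ids(fasta_handle, ref2taxid_handle):
    """Return FASTA IDs that have no entry in the ref2taxid table."""
    mapping = read_ref2taxid(ref2taxid_handle)
    return [seq_id for seq_id in fasta_ids(fasta_handle) if seq_id not in mapping]


if __name__ == "__main__":
    # In-memory copies of the two example files from this comment.
    fasta = io.StringIO(">seq1\nAAAAAA\n>seq2\nAAAAAA\n")
    table = io.StringIO("seq1\t564117\nseq2\t1454219\n")
    print(missing_ids(fasta, table))
```

With real files, the two handles would come from `open(...)` on the FASTA and the ref2taxid TSV; an empty list means every reference sequence has a taxid.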
Alternatively, you can use different taxids, but in that case you also need to provide a custom taxonomy database. You can take a look here to learn more about how to use custom databases. Please let me know if this helps with your problem!
Hi @Aline-Git , were you able to run the workflow with it? If that is the case, please close the issue
Hi @Aline-Git, I hope you were able to run the workflow. I'm going to close the issue as there has been no news, but please feel free to open a new one if you find something else. Thank you for using the workflow!
Thanks for your help. Indeed it works: the pipeline completes once this ref2taxid file is provided.
Yet what I would really like is to keep the taxonomy of the PR2 database, which is different from the NCBI one. I don't know if that will be possible.
I will try a bit by myself and open a new issue if I cannot do it.
Thank you for the workflow :) !
Thank you! If the taxids are different from the NCBI ones, then you need a custom taxonomy database.
Hi there, if I understood this correctly, most of the confusion about mapping reads with minimap2 to a custom (non-NCBI) database and taxonomy stems from the need to have a taxonomy database. Usually, when mapping with minimap2, only two files are needed (in addition to the unknown reads): the .fasta file with the reference sequences and the .tsv file that contains the taxonomy string for each of the same sequence IDs. But in the case of this wf-metagenomics workflow we seem to need a third entity, the 'taxonomy database', which seems to be a bunch of .dmp files in a folder. I feel this is not well explained here. If I have my own database (.fasta + .tsv files), how do I create this taxdump folder, and why is it even required? Thanks!
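For readers with the same question, here is a rough Python sketch of how taxdump-style files could be derived from per-sequence lineage strings. Everything in it is an assumption rather than documented workflow behaviour: the function name `build_taxdump` is hypothetical, the lineage names and rank labels are illustrative, and the output uses the NCBI `nodes.dmp` / `names.dmp` field layout trimmed to the first columns (taxid, parent taxid, rank; taxid, name, unique name, name class). Whether the workflow accepts such trimmed files is not confirmed anywhere in this thread.

```python
def build_taxdump(lineages, ranks):
    """Assign integer taxids to lineage paths and emit taxdump-style lines.

    lineages: dict mapping sequence ID -> list of taxon names, root-most first.
    ranks: rank label for each depth of the lineage (illustrative labels).
    Returns (nodes_lines, names_lines, ref2taxid_lines).
    """
    # Key taxids on the full path so identical names under different
    # parents receive distinct taxids. The root node is taxid 1.
    taxids = {(): 1}
    nodes = ["1\t|\t1\t|\tno rank\t|"]
    names = ["1\t|\troot\t|\t\t|\tscientific name\t|"]
    ref2taxid = []
    for seq_id, lineage in lineages.items():
        path = ()
        for depth, name in enumerate(lineage):
            parent = taxids[path]
            path = path + (name,)
            if path not in taxids:
                taxids[path] = len(taxids) + 1
                rank = ranks[depth] if depth < len(ranks) else "no rank"
                nodes.append(f"{taxids[path]}\t|\t{parent}\t|\t{rank}\t|")
                names.append(f"{taxids[path]}\t|\t{name}\t|\t\t|\tscientific name\t|")
        # Each sequence maps to the taxid of the deepest taxon in its lineage.
        ref2taxid.append(f"{seq_id}\t{taxids[path]}")
    return nodes, names, ref2taxid


if __name__ == "__main__":
    # Illustrative two-level lineages; a real table would be parsed from the .tsv.
    example = {
        "seq1": ["Eukaryota", "Alveolata"],
        "seq2": ["Eukaryota", "Rhizaria"],
    }
    nodes, names, ref2taxid = build_taxdump(example, ["domain", "supergroup"])
    print("\n".join(nodes))
    print("\n".join(names))
    print("\n".join(ref2taxid))
```

The three lists would then be written to `nodes.dmp`, `names.dmp` and the ref2taxid TSV respectively. The point of the extra files is that the integer taxids in ref2taxid only become meaningful through a tree (parent links) and a name table, which is what lets classifications be rolled up to higher ranks.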
Operating System
Other Linux (please specify below)
Other Linux
ubuntu 18.04
Workflow Version
v2.9.3-g6636bc9
Workflow Execution
Command line (Cluster)
Other workflow execution
On a virtual machine via command line
EPI2ME Version
No response
CLI command run
nextflow run epi2me-labs/wf-metagenomics \
  --fastq /data/VITAE/WP3_20240205/input_folder \
  --sample_sheet /data/VITAE/WP3_20240205/sample_sheet.csv \
  --classifier minimap2 \
  --reference /data/reference/database/PR2_db/pr2_version_5_0_0_SSU_dada2.mmi \
  --ref2taxid /data/reference/database/PR2_db/ref2taxid_PR2_rapide.tsv
Workflow Execution - CLI Execution Profile
standard (default)
What happened?
Hello, Thanks for this workflow, I tested it with the default parameters and it worked fine.
I would like to use the workflow with the PR2 database (18S). At first I wanted to use the kraken2 option, but building the custom database for kraken2 seems a bit complicated because a nodes.dmp file has to be provided.
The database is a fasta file with this format :
I tried the minimap2 option. At first I did not provide a ref2taxid file, but that raised an error. I then created an artificial ref2taxid file with this format:
taxonomy1\ttaxonomy1
taxonomy2\ttaxonomy2
etc.
But now I have this error
Relevant log output
Application activity log entry
Were you able to successfully run the latest version of the workflow with the demo data?
yes
Other demo data information
No response