MortenEneberg closed this issue 1 year ago.
Dear Morten,
Thanks for your kind words! We are really happy that you find FlexTaxD useful, and especially that you are helping to implement support for further database resources.
Regarding your question: I believe the function requires the file to be named "accession2taxid.gz", because of this check:
```python
if not annotation_file.endswith("accession2taxid.gz"):
```
However, only the ending of the name is checked, so I suggest renaming your file to complete_w_eupath_univec.accession2taxid.gz, and it should work fine. I added this check because the read function is naive and would crash or produce incorrect results unless the correct file is supplied.
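A minimal sketch of the quoted check, assuming it is applied to the plain file name string. The helpers `is_accession2taxid` and `renamed_for_flextaxd` are hypothetical and not part of FlexTaxD. The sketch shows why only the ending matters: the suggested name passes and the original name does not. Note that renaming only satisfies the name check; the file contents still need the layout the reader expects.

```python
from pathlib import Path

REQUIRED_SUFFIX = "accession2taxid.gz"


def is_accession2taxid(annotation_file: str) -> bool:
    """Hypothetical helper mirroring the endswith() check quoted above."""
    return annotation_file.endswith(REQUIRED_SUFFIX)


def renamed_for_flextaxd(path: str) -> Path:
    """Return a path whose name ends in 'accession2taxid.gz' (hypothetical helper).

    Only the new name is computed; nothing is renamed on disk.
    """
    p = Path(path)
    if is_accession2taxid(p.name):
        return p
    # Drop a trailing '.gz' and then '.txt' so the new name stays readable
    stem = p.name
    for ext in (".gz", ".txt"):
        if stem.endswith(ext):
            stem = stem[: -len(ext)]
    return p.with_name(f"{stem}.{REQUIRED_SUFFIX}")


if __name__ == "__main__":
    original = "reduced_seqid2taxid_duplicate_no_univec.txt.gz"
    print(is_accession2taxid(original))  # False
    print(is_accession2taxid("complete_w_eupath_univec.accession2taxid.gz"))  # True
    print(renamed_for_flextaxd(original).name)
```

To actually rename the file on disk, `Path(original).rename(renamed_for_flextaxd(original))` would do it.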
The accession2taxid reader expects the annotations to match the header of each sequence in the FASTA files. If this is not the case for the EuPathDB data, and you instead have annotations mapping file name to taxid, you can use the regular --genomeid2taxid option (without --tt NCBI). The input file must then contain one line per genome in the form filename\ttaxid\n.
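A minimal sketch for checking, before building, whether the FASTA headers actually match an annotation table. It assumes the sequence id is the first whitespace-delimited token of each header, and that the table follows the NCBI accession2taxid column layout (accession, accession.version, taxid, gi, with a header row). Both are assumptions, not FlexTaxD's documented matching rules, and all helper names are hypothetical. The ids it reports would presumably be the ones that end up in the .flextaxdNotAdded file.

```python
import gzip
from typing import Dict, Iterable, List


def open_text(path: str):
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return the first whitespace-delimited token of every FASTA header line."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])
    return ids


def load_annotation(lines: Iterable[str], key_col: int = 1, taxid_col: int = 2,
                    skip_header: bool = True) -> Dict[str, str]:
    """Map sequence id -> taxid from a tab-separated table.

    Defaults assume the NCBI accession2taxid layout, keyed on accession.version.
    """
    mapping = {}
    for i, line in enumerate(lines):
        if skip_header and i == 0:
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) > max(key_col, taxid_col):
            mapping[cols[key_col]] = cols[taxid_col]
    return mapping


def unannotated(fasta_lines: Iterable[str], annotation: Dict[str, str]) -> List[str]:
    """Return FASTA ids that have no entry in the annotation table."""
    return [sid for sid in fasta_ids(fasta_lines) if sid not in annotation]


if __name__ == "__main__":
    # Made-up toy data, not real EuPathDB or NCBI records
    table = ["accession\taccession.version\ttaxid\tgi\n", "seqA\tseqA.1\t12345\t0\n"]
    fasta = [">seqA.1 some description\n", "ACGT\n", ">seqB\n", "ACGT\n"]
    print(unannotated(fasta, load_annotation(table)))  # ['seqB']
```

For a plain two-column seqid2taxid file, `load_annotation(lines, key_col=0, taxid_col=1, skip_header=False)` would read it instead; real files can be passed as `open_text(path)`.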
See the specification of the genome2taxid format.
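A minimal sketch of producing the filename\ttaxid\n input for --genomeid2taxid from a mapping you already have. It assumes the keys are the genome FASTA file names exactly as FlexTaxD should match them. Whether paths or extensions must be included should be checked against the genome2taxid specification. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Mapping, Union


def format_genomeid2taxid(mapping: Mapping[str, Union[str, int]]) -> str:
    """Build one 'filename<TAB>taxid' line per genome, sorted by file name."""
    lines = []
    for filename, taxid in sorted(mapping.items()):
        # A tab or newline inside a name would corrupt the two-column layout
        if "\t" in filename or "\n" in filename:
            raise ValueError(f"file name contains a tab or newline: {filename!r}")
        lines.append(f"{filename}\t{taxid}\n")
    return "".join(lines)


def write_genomeid2taxid(mapping: Mapping[str, Union[str, int]], out_path: str) -> None:
    """Write the formatted table to out_path (hypothetical helper)."""
    Path(out_path).write_text(format_genomeid2taxid(mapping))
```

Sorting is not required by the format as described here. It just makes the output deterministic and easier to diff.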
Kind regards, David
Dear flextaxd team,
Thanks for an awesome tool! I use it to combine the NCBI and GTDB taxonomies for kraken2 pathogen classification. I have recently been trying to incorporate EuPathDB (http://ccb.jhu.edu/data/eupathDB/) to get clean eukaryotic genomes into the database. I first tried simply adding the genomes to my NCBI genomes path, hoping that all of them would be annotated in the database. However, there seem to have been some changes to the FASTA headers that cause problems: not all sequences are recognized, and the unrecognized ones are written to the .flextaxdNotAdded file. To work around this, I modified the seqid2taxid file (downloaded with EuPathDB) to look like the NCBI accession2taxid files (modified file attached: reduced_seqid2taxid_duplicate_no_univec.txt.gz) and concatenated it with the other NCBI accession2taxid files, but flextaxd somehow recognizes that this is not an original file and throws an error message:
Code to build the NCBI database:
Do you have any suggestions on how to solve this?
Kind regards, Morten