mthang opened this issue 1 month ago
Great tool! I wonder whether Taxor can be used to index the NCBI nt sequences. Based on the documentation, taxor build uses the FTP link in the input file (third column) to download the reference sequences from NCBI. What I am trying to find out is whether local NCBI nt sequences can be indexed by Taxor without the download step. This would be very useful for taxa that only have nucleotide sequences (not every taxon has whole-genome reference data).
Hi @mthang,
Sorry for not responding earlier; I was on vacation.
Generally, you can use any kind of reference sequence data as long as you can provide a decent taxonomy. All you need to do is download the FASTA files, store them in a dedicated directory, and provide the correct file name in the metadata file. (Taxor does not download the files via FTP itself; it uses the file name from the FTP path to identify the correct file in the given directory.) Besides that, the metadata file must have the correct format as described here, which means you also need to provide taxonomic information for each sequence.
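Following the approach described in the answer (keep the local FASTA files in one directory and list each file together with its taxonomy in the metadata file), here is a minimal sketch for a local nt subset. It splits a multi-FASTA file into one file per sequence and writes one tab-separated metadata row per file. Several things are assumptions and not Taxor's documented format: the function names `read_fasta` and `split_and_describe` are hypothetical, the five-column layout (accession, taxid, path, organism name, lineage) is only illustrative, and the `local/` prefix in column 3 is a placeholder path whose last component is the file name. Check the column order and how Taxor derives the file name from column 3 (for example, whether it appends a suffix) against the metadata format described in Taxor's documentation. The `taxonomy` dictionary must be built by you, e.g. from NCBI's accession2taxid files plus lineages from the taxonomy dump.

```python
import csv
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain-text (uncompressed) multi-FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_and_describe(fasta_path, taxonomy, out_dir, metadata_path):
    """Write one FASTA file per sequence into out_dir and one metadata row per file.

    taxonomy maps accession -> (taxid, organism_name, lineage_string).
    HYPOTHETICAL column layout: adapt it to the format Taxor actually expects.
    Returns the number of sequences written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    with open(metadata_path, "w", newline="") as meta:
        writer = csv.writer(meta, delimiter="\t", lineterminator="\n")
        for header, seq in read_fasta(fasta_path):
            accession = header.split()[0]
            if accession not in taxonomy:
                continue  # skip sequences without taxonomic information
            taxid, organism, lineage = taxonomy[accession]
            file_name = f"{accession}.fasta"
            with open(out_dir / file_name, "w") as fh:
                fh.write(f">{header}\n")
                # Wrap the sequence at 80 characters per line.
                for i in range(0, len(seq), 80):
                    fh.write(seq[i:i + 80] + "\n")
            # Column 3 holds a placeholder path whose last component is the file
            # name, mirroring how Taxor is said to read the file name from the FTP path.
            writer.writerow([accession, taxid, f"local/{file_name}", organism, lineage])
            written += 1
    return written
```

For the full nt database this produces a very large number of files; grouping all sequences of one taxid into a single FASTA file (one metadata row per taxon) is an alternative worth considering, but verify that it matches how Taxor assigns reads to reference files.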