Open Yeahji9721 opened 2 months ago
The nt/viral version depends on when it is downloaded, not on the Kraken version. Kraken will just pull the most recent one.
Did the kraken2-build --download-taxonomy script work correctly? Based on the commands, it should be able to build the taxonomy file. What files do you have in your taxonomy/ folder?
Yes, I think it worked correctly, but the job halted partway through. I have attached the job log below.
Downloading nucleotide gb accession to taxon map... done.
Downloading nucleotide wgs accession to taxon map... done.
Downloaded accession to taxon map(s)
Downloading taxonomy tree data... done.
Uncompressing taxonomy data... done.
Untarring taxonomy tree data... done.
Downloading nt database from server... done.
Uncompressing nt database...done.
Parsing nt FASTA file...done.
Masking low-complexity regions of downloaded library... done.
Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Creating sequence ID to taxonomy ID map (step 1)...
Found 112295042/112651381 targets, searched through 1026753790 accession IDs, search complete.
lookup_accession_numbers: 356339/112651381 accession numbers remain unmapped, see unmapped.txt in DB directory
Sequence ID to taxonomy ID map complete. [11m58.305s]
Estimating required capacity (step 2)...
Estimated hash table requirement: 953400694488 bytes
Capacity estimation complete. [5h22m38.584s]
Building database files (step 3)...
Taxonomy parsed and converted.
Failed attempt to allocate 953400694488bytes;
you may not have enough free memory to build this database.
Perhaps increasing the k-mer length, or reducing memory usage from
other programs could help you build this database?
build_db: unable to allocate hash table memory
xargs: cat: terminated by signal 13
So I think it failed because of a lack of RAM. How much RAM should I request to download and build the database?
Ah, it stopped because you were unable to build the database. The taxonomy is not the issue here.
Do you need the full nt database? What kind of samples are you trying to run? The database looks to be about 953 GB (the estimated hash table requirement in your log).
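To put the number from the log in the units a cluster scheduler usually asks for, here is a minimal sketch that converts the estimated hash table requirement to GiB and compares it with a node's memory. The helper names `bytes_to_gib` and `fits_in_memory` and the 512 GiB node are hypothetical, chosen only for illustration. The byte count is the one printed in the build log.

```shell
# Convert a byte count to GiB (2^30 bytes), rounded to one decimal place.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}

# Succeed (exit status 0) only if REQUIRED bytes do not exceed AVAILABLE bytes.
fits_in_memory() {
    awk -v need="$1" -v have="$2" 'BEGIN { exit !(need + 0 <= have + 0) }'
}

# "Estimated hash table requirement" taken from the log above.
required=953400694488
echo "Hash table needs about $(bytes_to_gib "$required") GiB"

# Hypothetical node size (512 GiB), for illustration only.
node_bytes=$((512 * 1024 * 1024 * 1024))
if fits_in_memory "$required" "$node_bytes"; then
    echo "The hash table would fit on this node"
else
    echo "The hash table would not fit on this node"
fi
```

The build also needs some memory beyond the hash table itself, so the converted figure is best read as a lower bound on what to request.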
I don't need human or other higher animals. I mainly need insects, viruses, fungi, bacteria, etc. But the database needs to be as up to date as possible.
Is there any way I can build a database from nt that contains only what I need?
@Yeahji9721 you should probably curate the nt data once it is downloaded. By that I mean removing the higher eukaryote sequences and then building the database with only the data you need.
Thank you for the comment. Could you specify what I should delete from the library/NT directory? Also, do you mean that after downloading nt and viral, I should delete the higher eukaryotes and then run the build command?
@Yeahji9721 Hi! That is exactly what I had in mind. Once everything is downloaded, there should be one (or several) files that record which genome accession ID corresponds to which species. I have not downloaded the nt database myself, so I don't know exactly what it downloads. You can write to me at cesar.ayala@ibt.unam.mx for a more straightforward conversation.
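The curation idea described here could be sketched roughly as follows. This is not a tested recipe for nt. It assumes the NCBI taxonomy dump format for `nodes.dmp` (fields separated by tab, pipe, tab, with the taxid first and the parent taxid second), and it assumes you can produce a plain list of sequence IDs to drop from whatever accession-to-taxid map the download leaves behind. The function names `taxids_under` and `drop_fasta_records` are hypothetical, and how sequence headers look inside the library directory would need checking on a real download.

```shell
# Print every taxid in a nodes.dmp file whose lineage passes through ROOT,
# including ROOT itself (for example, a taxid covering a group to exclude).
taxids_under() {
    nodes_file=$1
    root=$2
    awk -F '\t[|]\t' -v root="$root" '
        { parent[$1] = $2 }
        END {
            for (t in parent) {
                cur = t
                # Walk up the tree; the top node lists itself as its parent.
                while (1) {
                    if (cur == root) { print t; break }
                    if (!(cur in parent) || parent[cur] == cur) break
                    cur = parent[cur]
                }
            }
        }
    ' "$nodes_file" | sort -n
}

# Copy a FASTA file to stdout, skipping records whose ID (the first word of
# the header, without ">") appears in IDS_FILE, one ID per line.
drop_fasta_records() {
    ids_file=$1
    fasta_file=$2
    awk '
        FILENAME == ARGV[1] { drop[$1] = 1; next }
        /^>/ { id = substr($1, 2); keep = !(id in drop) }
        keep { print }
    ' "$ids_file" "$fasta_file"
}
```

The intended flow would be: list the taxids under the group you want to exclude, use the accession-to-taxid map to turn those taxids into sequence IDs, filter the library FASTA with that ID list, and only then run the build step. For a file the size of nt, each of these passes is likely to be slow, so it is worth trying the steps on a small subset first.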
I've made a customised database using these commands:
kraken2-build --download-taxonomy --threads ${NSLOTS} --db kraken2_db2024
kraken2-build --download-library nt --threads ${NSLOTS} --db kraken2_db2024
kraken2-build --download-library viral --threads ${NSLOTS} --db kraken2_db2024
kraken2-build --build --threads ${NSLOTS} --db kraken2_db2024
But there was no taxo.k2d in the database directory afterwards, so how can I solve this problem?
Also, if I use kraken2.1.2 instead of 2.1.3, does that affect which version of nt or viral gets downloaded for the database? Our university cluster currently provides 2.1.2. (I can use a newer version by prefixing commands with apptainer exec kraken2_latest.sif, but I would prefer to use the installed one for stability.)
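A quick way to confirm whether a build actually finished is to check for the three files a built Kraken 2 database consists of: hash.k2d, opts.k2d, and taxo.k2d. The sketch below uses a hypothetical helper name, `missing_k2d_files`, and takes the database directory as its argument. A file that exists but is empty is also reported as missing.

```shell
# Print the name of each expected .k2d file that is absent or empty in DB_DIR.
# Prints nothing when all three are present.
missing_k2d_files() {
    db_dir=$1
    for f in hash.k2d opts.k2d taxo.k2d; do
        [ -s "$db_dir/$f" ] || echo "$f"
    done
}

# Example call, using the database name from the commands above:
#   missing_k2d_files kraken2_db2024
```

If any of the three is listed, the build step did not complete, and the end of the build log is the place to look for the reason.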