DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Kraken2 hanging during database generation #481

Open NBaileyNCL opened 3 years ago

NBaileyNCL commented 3 years ago

Hi,

I'm having issues generating a standard nr kraken2 database + a few local fasta files.

I have generated a database on this system before without issues, so presumably the problem results from some change to the system, but I'm unsure what that could be.

Here is my script for database generation:

#!/bin/bash
#SBATCH --mem=500G
#SBATCH --mail-type=FAIL,TIME_LIMIT
#SBATCH -c 11
#SBATCH -o logs/Macaque_map_Kraken_assembled_Parab_transcripts_15-7-21.out
#SBATCH -A rhrlgtege
#SBATCH -p bigmem
#SBATCH -t 05-00:00

KRAKEN_DIR=/mnt/nfs/home/b7040535/Software/KRAKEN_2
KRAKEN_DB=/nobackup/b7040535/Macaque_Transcriptome_analysis/Kranken_nt_database_2-7-21
in=/nobackup/b7040535/Macaque_Transcriptome_analysis/Parab_assembly/All_assemblies/Parab.fasta
outdir=/nobackup/b7040535/Macaque_Transcriptome_analysis/Parab_assembly/All_assemblies/kraken
TRICH_FASTA_DIR=/nobackup/b7040535/Macaque_Transcriptome_analysis/SequenceData/Kraken_data_21-7-20

# Construct Kraken nucleotide database from NCBI nr and trichomonad RNASeq contigs
$KRAKEN_DIR/kraken2-build --download-taxonomy -db $KRAKEN_DB --threads 11
$KRAKEN_DIR/kraken2-build --download-library nt --db $KRAKEN_DB --threads 11
find $TRICH_FASTA_DIR -name '*.fna' -print0 | xargs -0 -I{} -n1 $KRAKEN_DIR/kraken2-build --add-to-library {} --db $KRAKEN_DB
$KRAKEN_DIR/kraken2-build --build --db $KRAKEN_DB --threads 11
$KRAKEN_DIR/kraken2-build --clean -db $KRAKEN_DB --threads 11
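
The log below shows the job was ultimately cancelled by the SLURM walltime (`-t 05-00:00`) while still in the `--build` step, whose estimated hash table is roughly 288 GB. The simplest change would be a longer walltime for that step. If a smaller database is acceptable, another option is kraken2-build's `--max-db-size` flag, which caps the hash table at the cost of sensitivity (minimizers are downsampled to fit). A minimal sketch reusing the variables above; the 100 GB cap is illustrative only, not the poster's actual fix:

```bash
# Hedged alternative for the final build step: cap the hash table (value in
# bytes, illustrative) so step 3 finishes within the walltime.
$KRAKEN_DIR/kraken2-build --build --db $KRAKEN_DB --threads 11 \
    --max-db-size 100000000000
```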

This is the output from kraken2:

[b7040535@login02 ~]$ cat logs/Macaque_map_Kraken_assembled_Parab_transcripts_9-7-21.out
Downloading nucleotide gb accession to taxon map... done.
Downloading nucleotide wgs accession to taxon map... done.
Downloaded accession to taxon map(s)
Downloading taxonomy tree data... done.
Uncompressing taxonomy data... done.
Untarring taxonomy tree data... done.
Downloading nt database from server... done.
Uncompressing nt database...done.
Parsing nt FASTA file...done.
Masking low-complexity regions of downloaded library... done.
Masking low-complexity regions of new file... done.
Added "/nobackup/b7040535/Macaque_Transcriptome_analysis/SequenceData/Kraken_data_21-7-20/TriBatrachorum_krakenFormat.fna" to library (/nobackup/b7040535/Macaque_Transcriptome_analysis/Kranken_nt_database_2-7-21)
Masking low-complexity regions of new file... done.
Added "/nobackup/b7040535/Macaque_Transcriptome_analysis/SequenceData/Kraken_data_21-7-20/Tetgallinarum_krakenFormat.fna" to library (/nobackup/b7040535/Macaque_Transcriptome_analysis/Kranken_nt_database_2-7-21)
Masking low-complexity regions of new file... done.
Added "/nobackup/b7040535/Macaque_Transcriptome_analysis/SequenceData/Kraken_data_21-7-20/Phominis_krakenInput.fna" to library (/nobackup/b7040535/Macaque_Transcriptome_analysis/Kranken_nt_database_2-7-21)
Creating sequence ID to taxonomy ID map (step 1)...
Found 78393779/78505519 targets, searched through 812094487 accession IDs, search complete.
lookup_accession_numbers: 111740/78505519 accession numbers remain unmapped, see unmapped.txt in DB directory
Sequence ID to taxonomy ID map complete. [8m19.190s]
Estimating required capacity (step 2)...
Estimated hash table requirement: 288467397484 bytes
Capacity estimation complete. [1h14m50.082s]
Building database files (step 3)...
Taxonomy parsed and converted.
CHT created with 22 bits reserved for taxid.
slurmstepd: error: JOB 19971533 ON mn01 CANCELLED AT 2021-07-14T11:23:48 DUE TO TIME LIMIT
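
As a reading of the log (not something stated in it directly): the "Estimated hash table requirement" line is in bytes, roughly 269 GiB, which is under the 500 GB requested from SLURM, so the cancellation above is a walltime problem rather than a memory one. A quick conversion with GNU coreutils:

```bash
# Convert the estimate from the log into a human-readable size (~269G).
numfmt --to=iec 288467397484
```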

The script seems to hang for a long time after the "CHT created with 22 bits reserved for taxid." step; I checked several times and it was stuck at that stage for ~5 days.

The resulting database is not complete, as it lacks the taxo.k2d file.

Is the script progressing, and do I just need to give it more time? Or is there an issue that is causing it to hang? I wanted to use htop to check the activity of the node running the process, but our system doesn't allow us to log in to specific nodes.
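
When compute-node logins are blocked, SLURM's own accounting tools can stand in for htop. A minimal sketch, assuming `sstat`/`sacct` are enabled on the cluster; the job ID is taken from the slurmstepd line in the log and the path reuses the variable from the script above:

```bash
#!/bin/bash
# Job ID from the log above; substitute the currently running job's ID.
JOBID=19971533
KRAKEN_DB=/nobackup/b7040535/Macaque_Transcriptome_analysis/Kranken_nt_database_2-7-21

# CPU time and peak memory of the running batch step, without logging in to the node.
sstat --jobs="${JOBID}.batch" --format=JobID,AveCPU,MaxRSS,MaxDiskWrite

# Elapsed time and state, also available after the job ends.
sacct -j "$JOBID" --format=JobID,Elapsed,MaxRSS,State

# If the build is still making progress, files in the database directory should
# keep growing or changing; compare sizes/timestamps between checks.
ls -lh --time-style=long-iso "$KRAKEN_DB"
```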

Any help would be much appreciated

NBaileyNCL commented 3 years ago

bumping for attention

NBaileyNCL commented 3 years ago

bumping for attention