Hi,
I'm having issues generating a standard nt kraken2 database plus a few local FASTA files.
I have generated a database on this system before without issues, so presumably the problem results from some change to the system, but I'm unsure what that could be.
Here is my script for database generation:
#!/bin/bash
#SBATCH --mem=500G
#SBATCH --mail-type=FAIL,TIME_LIMIT
#SBATCH -c 11
#SBATCH -o logs/Macaque_map_Kraken_assembled_Parab_transcripts_15-7-21.out
#SBATCH -A rhrlgtege
#SBATCH -p bigmem
#SBATCH -t 05-00:00

KRAKEN_DIR=/mnt/nfs/home/b7040535/Software/KRAKEN_2
KRAKEN_DB=/nobackup/b7040535/Macaque_Transcriptome_analysis/Kranken_nt_database_2-7-21
in=/nobackup/b7040535/Macaque_Transcriptome_analysis/Parab_assembly/All_assemblies/Parab.fasta
outdir=/nobackup/b7040535/Macaque_Transcriptome_analysis/Parab_assembly/All_assemblies/kraken
TRICH_FASTA_DIR=/nobackup/b7040535/Macaque_Transcriptome_analysis/SequenceData/Kraken_data_21-7-20

# Construct Kraken nucleotide database from NCBI nt and trichomonad RNASeq contigs
$KRAKEN_DIR/kraken2-build --download-taxonomy --db $KRAKEN_DB --threads 11
$KRAKEN_DIR/kraken2-build --download-library nt --db $KRAKEN_DB --threads 11
find $TRICH_FASTA_DIR -name '*.fna' -print0 | xargs -0 -I{} -n1 $KRAKEN_DIR/kraken2-build --add-to-library {} --db $KRAKEN_DB
$KRAKEN_DIR/kraken2-build --build --db $KRAKEN_DB --threads 11
$KRAKEN_DIR/kraken2-build --clean --db $KRAKEN_DB --threads 11
This is the output from kraken2:
[b7040535@login02 ~]$ cat logs/Macaque_map_Kraken_assembled_Parab_transcripts_9-7-21.out
Downloading nucleotide gb accession to taxon map... done.
Downloading nucleotide wgs accession to taxon map... done.
Downloaded accession to taxon map(s)
Downloading taxonomy tree data... done.
Uncompressing taxonomy data... done.
Untarring taxonomy tree data... done.
Downloading nt database from server... done.
Uncompressing nt database...done.
Parsing nt FASTA file...done.
Masking low-complexity regions of downloaded library... done.
Masking low-complexity regions of new file... done.
Added "/nobackup/b7040535/Macaque_Transcriptome_analysis/SequenceData/Kraken_data_21-7-20/TriBatrachorum_krakenFormat.fna" to library (/nobackup/b7040535/Macaque_Transcriptome_analysis/Kranken_nt_database_2-7-21)
Masking low-complexity regions of new file... done.
Added "/nobackup/b7040535/Macaque_Transcriptome_analysis/SequenceData/Kraken_data_21-7-20/Tetgallinarum_krakenFormat.fna" to library (/nobackup/b7040535/Macaque_Transcriptome_analysis/Kranken_nt_database_2-7-21)
Masking low-complexity regions of new file... done.
Added "/nobackup/b7040535/Macaque_Transcriptome_analysis/SequenceData/Kraken_data_21-7-20/Phominis_krakenInput.fna" to library (/nobackup/b7040535/Macaque_Transcriptome_analysis/Kranken_nt_database_2-7-21)
Creating sequence ID to taxonomy ID map (step 1)...
Found 78393779/78505519 targets, searched through 812094487 accession IDs, search complete.
lookup_accession_numbers: 111740/78505519 accession numbers remain unmapped, see unmapped.txt in DB directory
Sequence ID to taxonomy ID map complete. [8m19.190s]
Estimating required capacity (step 2)...
Estimated hash table requirement: 288467397484 bytes
Capacity estimation complete. [1h14m50.082s]
Building database files (step 3)...
Taxonomy parsed and converted.
CHT created with 22 bits reserved for taxid.
slurmstepd: error: JOB 19971533 ON mn01 CANCELLED AT 2021-07-14T11:23:48 DUE TO TIME LIMIT
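For reference, here is a quick sanity check of the "Estimated hash table requirement" figure from the log against the memory I requested. This is only a rough sketch: the helper name bytes_to_gib is made up for illustration, and I'm assuming Slurm interprets --mem=500G in 1024-based units. It says nothing about how much memory the build actually used.

```shell
# Figures copied from the build log and the --mem=500G request in the script.
hash_bytes=288467397484   # "Estimated hash table requirement" from the log
requested_gib=500         # assumption: Slurm's G suffix is 1024-based

# Hypothetical helper: convert bytes to whole GiB (integer division, rounds down).
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

hash_gib=$(bytes_to_gib "$hash_bytes")
echo "hash table: ~${hash_gib} GiB of ${requested_gib} GiB requested"
```

On these numbers the estimated hash table (about 268 GiB) is well below the request, so the memory request does not look like the obvious limit, although I have not measured real usage.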
The script seems to hang after the "CHT created with 22 bits reserved for taxid." step: each time I checked over ~5 days, the log was still at that line.
The resulting database is not complete, as it lacks the taxo.k2d file.
Is the script still progressing, so that I just need to give it more time, or is something causing it to hang? I wanted to use htop to check activity on the node running the process, but our system doesn't allow us to log in to specific nodes.
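One thing I could try next time, since I can't log in to the node, is to ask Slurm itself for the job's CPU and memory usage. A minimal sketch, assuming job accounting is enabled on our cluster and that sstat and sacct are available on the login node; the function name job_usage is just something I made up:

```shell
# Hypothetical helper: print resource usage for a Slurm job from the login node.
job_usage() {
    job_id="$1"
    # Reject anything that is not a plain numeric job ID.
    case "$job_id" in
        ''|*[!0-9]*)
            echo "usage: job_usage <numeric job id>" >&2
            return 2
            ;;
    esac
    # Bail out if the Slurm client tools are not installed here.
    if ! command -v sstat >/dev/null 2>&1; then
        echo "sstat not found on this machine" >&2
        return 127
    fi
    # Live statistics for the batch step (only works while the job is running).
    sstat --jobs="${job_id}.batch" --format=JobID,AveCPU,MaxRSS,MaxVMSize
    # Accounting summary; also available after the job has ended.
    sacct --jobs="$job_id" --format=JobID,State,Elapsed,TotalCPU,MaxRSS
}
```

Calling job_usage with the job ID a few hours apart while the build is running should show whether TotalCPU keeps growing; if it does, the process is presumably still working rather than stuck.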
Any help would be much appreciated.