Greetings, I'm trying to build the default genomic databases used to run kraken2 on an HPC cluster. The database location is set to my scratch directory, which I believe has 15 TB of space available. Regardless, below is the command I submitted (using qsub).
The command starts running, and I can see files being created inside 'kraken_default_db'. However, the output log has an error that I'm not sure how to deal with ('cat: write error: Broken pipe', towards the end of the log). Below is the log file (krakenuniq_setup.log). Any suggestions are welcome.
Begin PBS Prologue Sat Sep 18 18:58:33 EDT 2021
Job ID: 2890599.sched-torque.pace.gatech.edu
User ID: abertagnolli3
Job name: conda
Queue: inferno
End PBS Prologue Sat Sep 18 18:58:33 EDT 2021
---------------------------------------
Downloading nucleotide gb accession to taxon map... done.
Downloading nucleotide wgs accession to taxon map... done.
Downloaded accession to taxon map(s)
Downloading taxonomy tree data... done.
Uncompressing taxonomy data... done.
Untarring taxonomy tree data... done.
Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Downloading plasmid files from FTP... done.
Masking low-complexity regions of downloaded library... done.
Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
All files processed, cleaning up extra sequence files... done, library complete.
Downloading UniVec_Core data from server... done.
Adding taxonomy ID of 28384 to all sequences... done.
Masking low-complexity regions of downloaded library... done.
Creating sequence ID to taxonomy ID map (step 1)...
Found 37324/37325 targets, searched through 797589570 accession IDs, search complete.
lookup_accession_numbers: 1/37325 accession numbers remain unmapped, see unmapped.txt in DB directory
Sequence ID to taxonomy ID map complete. [3h53m14.210s]
Estimating required capacity (step 2)...
Estimated hash table requirement: 56759007816 bytes
Capacity estimation complete. [1h28m43.527s]
Building database files (step 3)...
Taxonomy parsed and converted.
cat: write error: Broken pipe
/storage/home/hcoda1/7/abertagnolli3/d-bios-fstewart7/rich_project_pb1/miniconda3/libexec/build_kraken2_db.sh: line 143: 445883 Done list_sequence_files
445884 Exit 123 | xargs -0 cat
445885 Killed | build_db -k $KRAKEN2_KMER_LEN -l $KRAKEN2_MINIMIZER_LEN -S $KRAKEN2_SEED_TEMPLATE $KRAKEN2XFLAG -H hash.k2d.tmp -t taxo.k2d.tmp -o opts.k2d.tmp -n taxonomy/ -m $seqid2taxid_map_file -c $required_capacity -p $KRAKEN2_THREAD_CT $max_db_flag -B $KRAKEN2_BLOCK_SIZE -b $KRAKEN2_SUBBLOCK_SIZE -r $KRAKEN2_MIN_TAXID_BITS $fast_build_flag
---------------------------------------
Begin PBS Epilogue Sun Sep 19 09:38:22 EDT 2021
Job ID: 2890599.sched-torque.pace.gatech.edu
User ID: abertagnolli3
Job name: conda
Resources: nodes=1:ppn=20,mem=20gb,walltime=90:00:00,neednodes=1:ppn=20
Rsrc Used: cput=08:22:31,vmem=22970428kb,walltime=14:39:49,mem=2138132kb,energy_used=0
Queue: inferno
Nodes:
atl1-1-02-014-36-r.pace.gatech.edu
End PBS Epilogue Sun Sep 19 09:38:22 EDT 2021
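I think the problem is that the job doesn't have enough memory. The line "Estimated hash table requirement: 56759007816 bytes" means the build needs about 56.8 GB (roughly 53 GiB) for the hash table alone, but the job only requested mem=20gb. That would also explain why the log shows build_db as "Killed": the "cat: write error: Broken pipe" message is just cat losing the process it was piping into. Try changing mem=20gb to mem=60gb.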
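To make the arithmetic in the reply concrete, here is a small Python sketch that converts the estimate from the log into a memory request. The helper name suggested_mem_gb and the 10% headroom are my own assumptions, not anything kraken2 or PBS prescribes; the hash table is not the only thing build_db holds in memory, so treat the result as a lower bound to round up from.

```python
import math


def suggested_mem_gb(hash_bytes: int, headroom: float = 0.10) -> int:
    """Round the estimated hash table size, plus headroom, up to whole GB.

    The 10% default headroom is an illustrative assumption.
    """
    return math.ceil(hash_bytes * (1 + headroom) / 1e9)


# Values copied from the log above.
estimated_bytes = 56759007816  # "Estimated hash table requirement"
requested_gb = 20              # "mem=20gb" in the PBS epilogue

print(f"hash table: {estimated_bytes / 1e9:.1f} GB "
      f"({estimated_bytes / 2**30:.1f} GiB)")
print(f"requested:  {requested_gb} GB")
print(f"suggested:  mem={suggested_mem_gb(estimated_bytes)}gb or more")
```

Whether the scheduler reads "gb" as 10^9 or 2^30 bytes, a request of 60gb or above covers the estimate, which is in line with the suggestion above.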