We're trying to optimize alignment of short oligos (~21 bp) with STAR 2.7.10a by tuning --sjdbOverhang and --genomeSAindexNbases. We are building the index from the NCBI human reference genome (GRCh38.p14) using the following files:
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_genomic.gtf.gz
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_genomic.fna.gz
We tried three sets of values for --sjdbOverhang and --genomeSAindexNbases, all on an r6a.48xlarge instance. Any --genomeSAindexNbases value larger than 16 fails. Below are the logs for the two failing runs (--genomeSAindexNbases 29 and 17), followed by the successful run with 16. Is this running out of memory, or a potential bug?
Does it make sense to increase both of these parameters to reduce alignment runtime?
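A rough way to see why very large values run out of memory: assuming the SA pre-index stores one entry for every possible A/C/G/T prefix of length 1 to N (our assumption about STAR's layout, not something stated in this post), the table grows by roughly 4x with each increment of --genomeSAindexNbases. The hypothetical helper below only counts those entries; it makes no claim about bytes per entry.

```python
# Hypothetical helper (not part of STAR): counts SA pre-index entries,
# ASSUMING one entry per possible A/C/G/T prefix of every length 1..N.
def prefix_entries(n_bases: int) -> int:
    """Number of distinct prefixes of length 1..n_bases over a 4-letter alphabet."""
    return sum(4 ** length for length in range(1, n_bases + 1))


if __name__ == "__main__":
    default = prefix_entries(14)  # STAR's default --genomeSAindexNbases
    for n in (14, 16, 17, 29):  # values from the runs in this post
        entries = prefix_entries(n)
        ratio = entries / default
        print(f"N={n}: {entries:,} entries, ~{ratio:,.0f}x the N=14 table")
```

Under that assumption, N=16 is about 16x the default table and N=29 is about a billion times larger, which would be consistent with the std::bad_alloc at 29. It does not by itself explain the "next index is smaller than previous" error at 17.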
Feb 09 19:33:15 ..... started STAR run
Feb 09 19:33:15 ... starting to generate Genome files
Feb 09 19:33:56 ..... processing annotations GTF
!!!!! WARNING: --genomeSAindexNbases 29 is too large for the genome size=3298430636, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 14
Feb 09 19:34:34 ... starting to sort Suffix Array. This may take a long time...
Feb 09 19:35:31 ... sorting Suffix Array chunks and saving them to disk...
Feb 09 19:54:02 ... loading chunks from disk, packing SA...
Feb 09 19:55:25 ... finished generating suffix array
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Feb 09 16:10:32 ..... started STAR run
Feb 09 16:10:32 ... starting to generate Genome files
Feb 09 16:11:14 ..... processing annotations GTF
!!!!! WARNING: --genomeSAindexNbases 17 is too large for the genome size=3298430636, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 14
Feb 09 16:11:52 ... starting to sort Suffix Array. This may take a long time...
Feb 09 16:12:48 ... sorting Suffix Array chunks and saving them to disk...
Feb 09 16:30:12 ... loading chunks from disk, packing SA...
Feb 09 16:31:33 ... finished generating suffix array
Feb 09 16:31:33 ... generating Suffix Array index
BUG: next index is smaller than previous, EXITING
Feb 09 16:37:41 ...... FATAL ERROR, exiting
Feb 09 16:38:00 ..... started STAR run
Feb 09 16:38:00 ... starting to generate Genome files
Feb 09 16:38:41 ..... processing annotations GTF
!!!!! WARNING: --genomeSAindexNbases 16 is too large for the genome size=3298430636, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 14
Feb 09 16:39:19 ... starting to sort Suffix Array. This may take a long time...
Feb 09 16:40:16 ... sorting Suffix Array chunks and saving them to disk...
Feb 09 16:58:43 ... loading chunks from disk, packing SA...
Feb 09 17:00:05 ... finished generating suffix array
Feb 09 17:00:05 ... generating Suffix Array index
Feb 09 17:22:38 ... completed Suffix Array index
Feb 09 17:22:39 ..... inserting junctions into the genome indices
Feb 09 17:25:54 ... writing Genome to disk ...
Feb 09 17:25:55 ... writing Suffix Array to disk ...
Feb 09 17:26:01 ... writing SAindex to disk
Feb 09 17:26:08 ..... finished successfully
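The warning in each log recommends --genomeSAindexNbases 14. The STAR manual gives min(14, log2(GenomeLength)/2 - 1) as the value to scale this parameter down to for small genomes. The sketch below is a hypothetical helper (not part of STAR) that applies that formula to the genome size printed in the logs; rounding down to an integer is our assumption.

```python
import math


def recommended_sa_index_nbases(genome_length: int) -> int:
    """min(14, log2(GenomeLength)/2 - 1), rounded down (rounding is an assumption)."""
    return min(14, math.floor(math.log2(genome_length) / 2 - 1))


if __name__ == "__main__":
    genome_size = 3298430636  # "genome size" printed in the STAR warnings above
    raw = math.log2(genome_size) / 2 - 1
    print(f"unrounded: {raw:.2f}, recommended: {recommended_sa_index_nbases(genome_size)}")
```

For this genome the unrounded value is about 14.8, so the cap of 14 applies, matching the recommendation in the warning.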