BUStools / bustools

Tools for working with BUS files
https://bustools.github.io/
BSD 2-Clause "Simplified" License

'std::bad_alloc' on EC2 #79

Open GabyBG opened 2 years ago

GabyBG commented 2 years ago

Hello,

I find kb_python super useful and efficient. I recently moved to working on EC2 instances, and to check that everything was set up correctly I re-ran a sample there with kb, using the same version and index. I was previously working on a DRAGEN server, where my kb jobs would sometimes fail and sometimes finish on the same sample. Because of that I was hoping that moving to EC2 would be better, but now I cannot run kb on my EC2 instance at all. The installation seems to be fine.

This is my command (note that I am running the lamanno workflow):

IDX=/home/ubuntu/kb_lamanno_index_CAR/index.idx
T2G=/home/ubuntu/kb_lamanno_index_CAR/t2g.txt
SPL=/home/ubuntu/kb_lamanno_index_CAR/spliced_t2c.txt
UNSPL=/home/ubuntu/kb_lamanno_index_CAR/unspliced_t2c.txt

# $sample holds the output directory name for this sample (set elsewhere)
kb count --loom -i "$IDX" -g "$T2G" -x 10xv2 -o "$sample" \
  -c1 "$SPL" -c2 "$UNSPL" --workflow lamanno --filter bustools \
  sample_01_L001_R1.fastq.gz sample_01_L001_R2.fastq.gz \
  sample_01_L002_R1.fastq.gz sample_01_L002_R2.fastq.gz

It runs for about 5 min and then I get:

[2022-06-28 22:07:53,285]    INFO [count_lamanno] Using index /home/ubuntu/kb_lamanno_index_CAR/index.idx to generate BUS file to SM135_batch1_01_S1 from
[2022-06-28 22:07:53,285]    INFO [count_lamanno]         /home/ubuntu/BioInfo-SM-135/batch1/SM135_batch1_01_S1_L001_R1_001.fastq.gz
[2022-06-28 22:07:53,285]    INFO [count_lamanno]         /home/ubuntu/BioInfo-SM-135/batch1/SM135_batch1_01_S1_L001_R2_001.fastq.gz
[2022-06-28 22:07:53,285]    INFO [count_lamanno]         /home/ubuntu/BioInfo-SM-135/batch1/SM135_batch1_01_S1_L002_R1_001.fastq.gz
[2022-06-28 22:07:53,285]    INFO [count_lamanno]         /home/ubuntu/BioInfo-SM-135/batch1/SM135_batch1_01_S1_L002_R2_001.fastq.gz
[2022-06-28 22:10:41,559]   ERROR [count_lamanno] 
[bus] Note: Strand option was not specified; setting it to --fr-stranded for specified technology
[index] k-mer length: 31
[index] number of targets: 1,455,417
[index] number of k-mers: 1,578,640,741
[index] number of equivalence classes: 128
terminate called after throwing an instance of 'std::bad_alloc'
what():  std::bad_alloc
[2022-06-28 22:10:41,560]   ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/kb_python/main.py", line 1301, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/kb_python/main.py", line 482, in parse_count
    count_velocity(
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/kb_python/count.py", line 1915, in count_velocity
    bus_result = kallisto_bus(
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/kb_python/validate.py", line 116, in inner
    results = func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/kb_python/count.py", line 190, in kallisto_bus
    run_executable(command)
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/kb_python/dry/__init__.py", line 25, in inner
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/kb_python/utils.py", line 203, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))

I have tried increasing the memory up to 100, and my instance (r4.8xlarge, 244 GiB RAM) should have no trouble with that. I looked at other issues such as #49, but that one seems to be more platform-specific. With Cell Ranger, 6,798 cells were detected in this sample after filtering.

Maybe I am missing something obvious. Thank you so much for your support.

Yenaled commented 2 years ago

Something seems wrong with the index. Try re-creating the index (and make sure the FASTA file from which you're creating the index is not corrupted or anything).
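One way to follow the advice about the FASTA file is to read it end to end before rebuilding the index. Below is a minimal sketch in Python, assuming a plain or gzipped FASTA. `check_fasta` and `summarise_fasta_lines` are hypothetical helpers, not part of kb_python or kallisto, and they only catch a few kinds of damage (truncated gzip data, sequence before the first header, empty records, duplicate IDs), not every possible corruption.

```python
import gzip


def summarise_fasta_lines(lines):
    """Count records, empty records and duplicate IDs in FASTA text lines."""
    records = empty = duplicates = 0
    seen_ids = set()
    current_len = None  # None until the first '>' header has been seen
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_len == 0:
                empty += 1  # the previous record had no sequence lines
            fields = line[1:].split()
            rec_id = fields[0] if fields else ""
            if rec_id in seen_ids:
                duplicates += 1
            seen_ids.add(rec_id)
            records += 1
            current_len = 0
        elif current_len is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            current_len += len(line)
    if current_len == 0:
        empty += 1  # the last record had no sequence lines
    return {"records": records, "empty": empty, "duplicate_ids": duplicates}


def check_fasta(path):
    """Stream a plain or gzipped FASTA end to end.

    Reading the whole stream makes gzip verify it: a truncated .gz raises
    EOFError, and damaged data typically raises gzip.BadGzipFile or zlib.error.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return summarise_fasta_lines(fh)
```

Running `check_fasta(path)` on the FASTA file(s) used to build the index and getting an exception, or a non-zero `empty` or `duplicate_ids` count, would be worth investigating before rebuilding.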

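Your output states that there are 1,455,417 targets (i.e., transcripts) but only 128 equivalence classes. This should not be possible: each target gets its own singleton equivalence class when the index is built, so there should be at least as many equivalence classes as targets.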
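The sanity check described in the reply can be automated against kallisto's stderr. This is a minimal sketch assuming the `[index] number of ...` lines have exactly the format shown in the log above, and assuming (as in the kallisto versions that print these lines) that every target seeds a singleton equivalence class. `parse_index_stats` and `index_looks_sane` are hypothetical helper names, not part of kb_python or kallisto.

```python
import re

# Matches kallisto index summary lines such as
# "[index] number of targets: 1,455,417"
_STAT = re.compile(
    r"^\[index\] number of (targets|equivalence classes|k-mers): ([\d,]+)\s*$"
)


def parse_index_stats(log_text):
    """Pull the [index] counts out of kallisto's stderr as integers."""
    stats = {}
    for line in log_text.splitlines():
        m = _STAT.match(line.strip())
        if m:
            stats[m.group(1)] = int(m.group(2).replace(",", ""))
    return stats


def index_looks_sane(stats):
    """Assumed invariant: one singleton equivalence class per target,
    so the equivalence-class count should be >= the target count."""
    return stats["equivalence classes"] >= stats["targets"]


if __name__ == "__main__":
    log = (
        "[index] k-mer length: 31\n"
        "[index] number of targets: 1,455,417\n"
        "[index] number of k-mers: 1,578,640,741\n"
        "[index] number of equivalence classes: 128\n"
    )
    stats = parse_index_stats(log)
    print(stats["targets"], stats["equivalence classes"], index_looks_sane(stats))
    # prints: 1455417 128 False
```

Fed the numbers from this issue, the check fails, which matches the diagnosis that the index itself is broken rather than the instance running out of memory.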