genome / pindel

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.
GNU General Public License v3.0

Pindel memory usage #24

Open walaj opened 8 years ago

walaj commented 8 years ago

Hello,

I am trying to call indels on a paired tumor/normal BAM file and am running into very high memory usage (currently 46 GB). I am wondering whether this is a memory leak, or whether there is some way I could tune the parameters to avoid it. I am running as follows:

python MakePinDelConfig.py -B <tum.bam> -N <norm.bam> -T TumorSample -S NormalSample -I 500

pindel -f <ref.fasta> -i bam_configuration_file.txt -o raw_pindel_format -c ALL --window_size 0.1 -E 0.99 -k true -l true -N true

Many thanks for any help / suggestions! Jeremiah
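For reference, the file passed to pindel -i lists one BAM per line. A minimal sketch of producing such a file directly, assuming the tab-separated layout of BAM path, insert size and sample label, and assuming that -I 500 above is the insert size. The function and file names are hypothetical, and MakePinDelConfig.py may do more than this:

```python
def format_pindel_config(samples):
    """Return the text of a Pindel BAM configuration file.

    samples: iterable of (bam_path, insert_size, sample_label) tuples.
    Assumes one tab-separated line per BAM file.
    """
    return "".join(f"{bam}\t{insert_size}\t{label}\n"
                   for bam, insert_size, label in samples)


if __name__ == "__main__":
    # Hypothetical file names; substitute the real tumor/normal BAMs.
    print(format_pindel_config([
        ("tumor.bam", 500, "TumorSample"),
        ("normal.bam", 500, "NormalSample"),
    ]), end="")
```

Redirecting the script's output (for example python write_config.py > bam_configuration_file.txt) gives a file that can be checked by eye before starting a long whole-genome run.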

liangkaiye commented 8 years ago

Whole genome or target sequencing? What is your pindel version?

(Email reply to Jeremiah Wala's comment of Oct 19, 2015, 14:38.)

walaj commented 8 years ago

Whole-genome. Not sure of the version number, but I cloned and built from this GitHub repo about 1 week ago. I tried various window sizes as well, and always hit the same issue (about 46 GB usage).

liangkaiye commented 8 years ago

Is it specific to a genome region?

(Email reply to Jeremiah Wala's comment of Oct 19, 2015, 14:47.)

walaj commented 8 years ago

I want to do an unbiased whole-genome run. I believe the -c ALL flag specifies this?

liangkaiye commented 8 years ago

It processes everything even if you omit -c ALL. You could process each chromosome separately. My question to you: is it always the same genomic region?

(Email reply to Jeremiah Wala's comment of Oct 19, 2015, 14:54.)

walaj commented 8 years ago

OK, sure, I can try splitting by chromosome. Do you have any suggestions on how I should scatter / merge in order to run on a per-chromosome basis?

I can't tell where in the genome all the memory is going. It is reporting 13M+ reads as split, though, so I imagine that has something to do with it.

Thanks for the quick reply / help!

liangkaiye commented 8 years ago

You could just use -c 1 to get the chr1 result. You might want to use -J to exclude centromere regions.

(Email reply to Jeremiah Wala's comment of Oct 19, 2015, 15:02.)
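The scatter step suggested above (one pindel run per chromosome with -c) can be sketched as a small command builder. This is a sketch under assumptions: the function name and output prefixes are hypothetical, chromosome names must match the reference FASTA headers ("1" versus "chr1"), and it uses the long option --exclude because the follow-up below reports that -J was not recognized in one build, while --exclude worked in 0.2.5b6. Merging the per-chromosome outputs is not covered here.

```python
import shlex


def pindel_commands(ref, config, chromosomes,
                    prefix="raw_pindel_format", exclude_bed=None):
    """Build one pindel command line (as an argv list) per chromosome.

    Each run gets its own output prefix, so per-chromosome results
    do not overwrite each other.
    """
    cmds = []
    for chrom in chromosomes:
        cmd = ["pindel", "-f", ref, "-i", config,
               "-o", f"{prefix}_{chrom}", "-c", chrom]
        if exclude_bed is not None:
            # BED file of regions to skip, e.g. centromeres.
            cmd += ["--exclude", exclude_bed]
        cmds.append(cmd)
    return cmds


if __name__ == "__main__":
    # Hypothetical inputs; print the commands instead of running them,
    # so they can be submitted as separate cluster jobs.
    chroms = [str(c) for c in range(1, 23)] + ["X", "Y"]
    for cmd in pindel_commands("ref.fasta", "bam_configuration_file.txt",
                               chroms, exclude_bed="centromeres.bed"):
        print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch scheduler-agnostic: each line is an independent job whose peak memory covers a single chromosome.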

walaj commented 8 years ago

OK, sounds good. It doesn't recognize -J, but otherwise it is running. Many thanks

kbentele commented 8 years ago

Hi Kai, I have a similar problem. I am using Pindel (version 0.2.5b6, 20150915) jointly on about 100 yeast samples. Since there are two regions with excessive coverage on chr12, I tried to exclude them with --exclude excluded_regions.bed. This seems to work, because at startup Pindel prints this list of regions:

       chr12   1       451416
       chr12   468939  489903
       chr12   490456  1078177

but when chr12 is processed, the log shows, for example:

  adding BD from RP.
  adding chr12 458056     458718  +       662             chr12 463798    464205  -       407      to BD events. 5742 Support: 19
  adding chr12 457871     458719  +       848             chr12 463798    464205  -       407      to BD events. 5927 Support: 20

and the memory consumption increases so much that I have to kill the process. Processing any other chromosome needs much less memory. When I downsample these regions before running Pindel, I do not have the memory problems. What am I doing wrong? Should I enlarge the excluded regions? Or is there a problem with the --exclude option?

Thanks for your help, Kajetan
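To make the puzzle concrete, the logged breakpoint coordinates can be checked against the gaps between the printed regions. A minimal sketch, assuming the three printed intervals are the regions Pindel will scan (1-based, inclusive) and that the gaps between them are the excluded ranges; the helper names are hypothetical:

```python
def gaps(regions):
    """Return the uncovered stretches between consecutive scan regions.

    regions: sorted list of (chrom, start, end), 1-based inclusive.
    """
    out = []
    for (c1, _, end1), (c2, start2, _) in zip(regions, regions[1:]):
        if c1 == c2 and start2 > end1 + 1:
            out.append((c1, end1 + 1, start2 - 1))
    return out


def in_gap(chrom, pos, gap_list):
    """True if pos on chrom lies inside any of the gaps."""
    return any(c == chrom and s <= pos <= e for c, s, e in gap_list)


regions = [("chr12", 1, 451416),
           ("chr12", 468939, 489903),
           ("chr12", 490456, 1078177)]
excluded = gaps(regions)
# → [('chr12', 451417, 468938), ('chr12', 489904, 490455)]

# Breakpoint windows taken from the first BD line in the log above.
for pos in (458056, 458718, 463798, 464205):
    print(pos, in_gap("chr12", pos, excluded))  # each line ends in True
```

Under that assumption, both breakpoint windows of the logged BD events sit inside the first excluded stretch, which is exactly the region the exclusion was meant to skip.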

liangkaiye commented 8 years ago

Pindel silently extends each region by some margin, perhaps 10 kb if I remember correctly. You could enlarge the excluded regions.

(Email reply to kajetanb's comment of Oct 19, 2015, 16:07.)

kbentele commented 8 years ago

Thanks, I will increase the regions and report back.

kbentele commented 8 years ago

Hi, increasing the regions by 10 kb on each side helped. Thanks, Kajetan
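Padding the exclusion BED by 10 kb on each side is easy to script. A minimal sketch, assuming a tab-separated BED file with at least three columns; the 10 kb default reflects the maintainer's recollection ("perhaps 10k") and the fix reported here, not a documented constant, and the function name is hypothetical:

```python
def pad_bed_lines(lines, pad=10_000, chrom_sizes=None):
    """Widen each BED interval by `pad` bases on both sides.

    BED coordinates are 0-based, half-open: start is clamped at 0 and,
    if chrom_sizes is given, end is clamped at the chromosome length.
    Comment, track, browser and blank lines are passed through unchanged.
    """
    out = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            out.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        start = max(0, start - pad)
        end = end + pad
        if chrom_sizes and chrom in chrom_sizes:
            end = min(end, chrom_sizes[chrom])
        # Keep any extra BED columns (name, score, ...) as they were.
        out.append("\t".join([chrom, str(start), str(end)] + fields[3:]) + "\n")
    return out
```

For example, pad_bed_lines(["chr12\t100000\t120000\n"]) returns ["chr12\t90000\t130000\n"]; reading excluded_regions.bed line by line and writing the result to a new file gives the widened exclusion list to pass to --exclude.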