diriano / ploidyNGS

Explore ploidy levels from NGS data alone
GNU General Public License v3.0

Job Killed. Running time and memory consumption #6

Closed jblamyatifremer closed 6 years ago

jblamyatifremer commented 6 years ago

Dear ploidyNGS creator,

Thank you for your work and for maintaining this GitHub repository. Your test dataset ran fine on my installation.

I launched these commands:

[userlocal@NTLT101 ploidyNGS]$ cd ~/ploidyNGS
[userlocal@NTLT101 ploidyNGS]$ source .venv/bin/activate
(.venv) [userlocal@NTLT101 ploidyNGS]$ ./ploidyNGS.py -o /PATH_OUTPUT/diploidTest -b /PATH_OUTPUT/all_sort.bam -d 50
###############################################################
This is ploidyNGS version v3.1.2
Current date and time: Mon Oct 16 18:16:10 2017
###############################################################
No index available for pileup. Creating an index...
Number of mapped reads from BAM: 14590766
Killed

I suppose that my computer (not very powerful) ran out of memory. How can I prevent such problems?

1- It would be interesting to have some rough idea of the memory consumption and/or computation time for a given computer architecture.

2- Another option would be to print a warning before launching the computation and/or to offer an option to process the dataset in chunks.

I am thinking of writing a bash script based on samtools 1.4.x. The planned steps are:

1- Split the BAM by contig.

2- If some contigs are too large (compared to your test dataset), split them further into smaller BAMs.

3- Launch ./ploidyNGS.py in parallel from bash. I do not know if this is possible with your special environment "(.venv)"... Do you know whether it is possible or not?
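Here is a rough sketch of what I have in mind for step 1 only (not the further chunking of step 2), assuming samtools >= 1.4 and a coordinate-sorted BAM; /PATH_OUTPUT/per_contig is just a placeholder output directory:

# Sketch of step 1: one BAM per contig (placeholder paths)
BAM=/PATH_OUTPUT/all_sort.bam
OUTDIR=/PATH_OUTPUT/per_contig
mkdir -p "$OUTDIR"

# Index the input BAM if no .bai exists yet
samtools index "$BAM"

# Contig names come from the index statistics; the trailing '*' line (unmapped reads) is skipped
samtools idxstats "$BAM" | cut -f1 | grep -v '^\*$' | while read -r CONTIG; do
    samtools view -b "$BAM" "$CONTIG" > "$OUTDIR/$CONTIG.bam"
    samtools index "$OUTDIR/$CONTIG.bam"
done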

Cheers,

JB

diriano commented 6 years ago

@jblamyatifremer, currently ploidyNGS loads the full BAM file into memory, so if you have a large BAM (large genome and/or many, many reads) and a machine without much RAM, this will be a problem. We mention that in the README.

Running contig by contig would help, but not in parallel, as you might run out of memory again; I recommend you process the contigs one after the other.
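A minimal sketch of such a sequential run, assuming the per-contig BAMs produced by the splitting step above and reusing the same -o/-b/-d options as your original command (the per-contig output prefix is just illustrative); activating .venv once before the loop takes care of the virtual environment:

# Run ploidyNGS on one contig at a time (sequentially, not in parallel)
cd ~/ploidyNGS
source .venv/bin/activate
for BAM in /PATH_OUTPUT/per_contig/*.bam; do
    NAME=$(basename "$BAM" .bam)
    ./ploidyNGS.py -o /PATH_OUTPUT/diploidTest_"$NAME" -b "$BAM" -d 50
done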

Good luck