Closed jblamyatifremer closed 6 years ago
@jblamyatifremer, currently ploidyNGS loads the full BAM file into memory, so if you have a large BAM (large genome and/or very many reads) and not much RAM, this will be a problem. We mention that in the README.
Running contig by contig would help, but not in parallel, as you might run out of memory again; I recommend you run them one after the other.
Good luck
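The contig-by-contig approach could be sketched roughly like this. This only builds the command lines (the file names, contig names, and the helper function are placeholders, not part of ploidyNGS); running each batch sequentially, e.g. with `subprocess.run`, keeps only one small per-contig BAM in play at a time. It assumes the input BAM is coordinate-sorted and indexed.

```python
# Sketch: process one contig at a time, one after the other, as suggested above.
# Names/paths are illustrative placeholders.

def per_contig_commands(bam, contigs, depth=50):
    """Yield, per contig, the samtools/ploidyNGS commands to run in order."""
    for c in contigs:
        yield [
            # Extract the reads mapped to this contig into a small BAM
            ["samtools", "view", "-b", bam, c, "-o", f"{c}.bam"],
            # Index it so ploidyNGS does not have to
            ["samtools", "index", f"{c}.bam"],
            # Run ploidyNGS on the per-contig BAM only
            ["./ploidyNGS.py", "-o", f"out_{c}", "-b", f"{c}.bam", "-d", str(depth)],
        ]

for batch in per_contig_commands("all_sort.bam", ["contig1", "contig2"]):
    for cmd in batch:
        print(" ".join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment to actually execute
```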
Dear ploidyNGS creator,
Thank you for your work and for maintaining this GitHub repository. Your test dataset ran OK on my installation.
I launched this command:
[userlocal@NTLT101 ploidyNGS]$ cd ~/ploidyNGS
[userlocal@NTLT101 ploidyNGS]$ source .venv/bin/activate
(.venv) [userlocal@NTLT101 ploidyNGS]$ ./ploidyNGS.py -o /PATH_OUTPUT/diploidTest -b /PATH_OUTPUT/all_sort.bam -d 50
###############################################################
This is ploidyNGS version v3.1.2
Current date and time: Mon Oct 16 18:16:10 2017
###############################################################
No index available for pileup. Creating an index...
Number of mapped reads from BAM: 14590766
Killed
I suppose that my computer (not very powerful) ran out of memory. How can I prevent such problems?
1- It would be interesting to have a rough idea of the memory consumption and/or computation time for a given computer architecture.
2- Another option would be to print some warnings before launching the computation and/or to have an option to process the dataset in chunks.
I am thinking of writing a bash script based on samtools 1.4.x. The planned steps are:
1- Split the BAM by contig.
2- If some contigs are too large (compared to your test dataset), split them into smaller BAMs.
3- Launch ./ploidyNGS.py in parallel from bash.
I do not know if this is possible given your special environment "(.venv)"... Do you know if it is possible or not?
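Steps 1-2 of this plan could be sketched like this: turn `samtools idxstats` output (tab-separated columns: contig name, length, #mapped, #unmapped, with a final `*` line for unmapped reads) into region strings, cutting contigs longer than a chunk size into pieces, so each region can be extracted into a small BAM with `samtools view -b in.bam REGION`. The chunk size and function name here are made up for illustration.

```python
# Sketch: build samtools region strings from `samtools idxstats` output,
# splitting long contigs into chunks so each extracted BAM stays small.

def regions_from_idxstats(idxstats_text, chunk=1_000_000):
    """Yield region strings (e.g. 'chr1:1-1000000') covering each mapped contig."""
    for line in idxstats_text.strip().splitlines():
        name, length, mapped, _ = line.split("\t")
        if name == "*" or int(mapped) == 0:  # skip the unmapped bucket and empty contigs
            continue
        length = int(length)
        for start in range(1, length + 1, chunk):
            end = min(start + chunk - 1, length)
            yield f"{name}:{start}-{end}"

# Example with fake idxstats output (tab-separated):
fake = "chr1\t2500000\t900000\t0\nchr2\t800000\t300000\t0\n*\t0\t0\t123"
print(list(regions_from_idxstats(fake)))
# → ['chr1:1-1000000', 'chr1:1000001-2000000', 'chr1:2000001-2500000', 'chr2:1-800000']
```

Note the maintainer's caveat above still applies: even with small per-region BAMs, running many ploidyNGS instances in parallel multiplies the memory use, so running them one after the other is safer.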
Cheers,
JB