ahasfura / parallel_preseq

parallelizing the preseq calculation for DNA sequencing.

loading BAM files into memory #2

Open lhogstrom opened 9 years ago

lhogstrom commented 9 years ago

@mukarramtahir

Was looking through the BAM reader section of your code. I saw that you're using samtools view to load all of the lines of the file at once: [image]

This is fine when the files are small, but won't work on sequencing files that are hundreds of gigs. Do you want me to adapt this? I know you had mentioned that you might adapt this section to work in parallel anyway...

L
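For reference, a minimal sketch of the streaming alternative, assuming the reader keeps shelling out to samtools view as described above. The helper names `stream_command_lines` and `iter_bam_records` are hypothetical and not taken from the repository. The idea is to iterate over the subprocess output line by line rather than collecting it all first:

```python
import subprocess


def stream_command_lines(cmd):
    """Yield stdout lines of `cmd` one at a time instead of buffering them all."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Reached only when the output was consumed to the end.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)


def iter_bam_records(bam_path):
    """Hypothetical helper: stream SAM-formatted records from a BAM file."""
    return stream_command_lines(["samtools", "view", bam_path])
```

Memory use then depends on what the caller keeps, not on the size of the file, since only one line is held at a time.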

mukarramtahir commented 9 years ago

Yes, this is just a placeholder. I'm going to work on this today and will use the index files to read the lines, so that I don't run out of memory on big files.
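One way the index-based approach could look, as a rough sketch and not the code that ended up in the repository: split each contig into fixed-size windows and ask samtools view for one window at a time, which requires a coordinate-sorted BAM with a .bai index next to it. The function names and the default window size are assumptions made for illustration; the contig lengths could come from samtools idxstats, which reports the reference name and length in its first two columns.

```python
def region_windows(contig_lengths, window=1_000_000):
    """Yield samtools-style region strings (1-based, inclusive) covering each contig."""
    for contig, length in contig_lengths:
        for start in range(1, length + 1, window):
            end = min(start + window - 1, length)
            yield f"{contig}:{start}-{end}"


def view_command(bam_path, region):
    """Build a `samtools view` call restricted to one region (needs an index)."""
    return ["samtools", "view", bam_path, region]


# Example: a 2500 bp contig split into 1000 bp windows.
for region in region_windows([("chr1", 2500)], window=1000):
    print(view_command("example.bam", region))
```

Each window is independent of the others, so the windows can also be handed to separate workers. One caveat: a read that overlaps a window boundary is returned for both windows, so the caller would need to deduplicate, for example by keeping only reads whose start position falls inside the window.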