dnanexus-archive / viral-ngs

demux jobs for hiseq lanes should get bigger instances #49

Closed by dpark01 7 years ago

dpark01 commented 8 years ago

Looking at job-BzVzv5j0jy12PPyv782QXZV3 it appears that hiseq lanes get dispatched on a mem1_ssd1_x4. In my own experience, I think we'd want a couple hundred GB of local instance storage at a minimum as well as at least 40-50GB RAM. I might suggest a mem3_ssd1_x8 or possibly a mem1_ssd2_x16 (if local storage is the limiting factor). Feel free to play with the associated flowcell of data as a test case.

dpark01 commented 8 years ago

It might also be helpful in these scenarios to expose an option in the demux wrapper that lets you specify the instance type of the illumina_demux jobs it spawns off (right now, the user can only change the instance type of the demux outer wrapper, which isn't really helpful).
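A rough sketch of what that passthrough could look like, assuming the wrapper launches the child jobs with a DNAnexus-style `systemRequirements` mapping (entry point name, or `"*"` for all entry points, mapped to an `instanceType`). The function name, the parameter name, and the default constant are hypothetical and not taken from the viral-ngs code:

```python
from typing import Optional

# Hypothetical default: the instance type HiSeq lanes were observed to land on.
DEFAULT_DEMUX_INSTANCE = "mem1_ssd1_x4"


def demux_system_requirements(instance_type: Optional[str] = None) -> dict:
    """Build the systemRequirements mapping for the spawned demux jobs.

    `instance_type` is the user-facing option exposed on the outer wrapper;
    when it is omitted, the wrapper falls back to its built-in default.
    """
    chosen = instance_type or DEFAULT_DEMUX_INSTANCE
    # "*" applies the instance type to every entry point of the child job.
    return {"*": {"instanceType": chosen}}


# A user asking for a larger instance for the inner jobs:
print(demux_system_requirements("mem3_ssd1_x8"))
```

The outer wrapper would pass the resulting mapping along when it launches each illumina_demux job, so the user's choice reaches the inner jobs rather than only the wrapper itself.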

mlin commented 7 years ago

Revised all the instance types during the dockerization. I see the demux jobs using a fair amount of disk, but haven't found examples where they use a lot of memory... is that only under certain circumstances? (Human depletion, in contrast, uses both a lot of disk and spiky/high memory.)

dpark01 commented 7 years ago

Great. I think I saw high RAM use in a previous scenario where we had the Picard --threads parameter set too high... it's linearly proportional to the thread count. If you've seen hiseq lanes fit in a lower RAM footprint, that's great. High disk usage makes sense because of all the untarring and temp files.
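To make the linear relationship concrete, here is a minimal sketch of capping the thread count so that the estimated footprint stays inside the instance's RAM. It assumes memory is roughly `base + per_thread * threads`; the function name and the per-thread and base figures in the usage lines are made-up placeholders, not measurements from Picard:

```python
def max_threads(ram_budget_gb: float, per_thread_gb: float,
                base_gb: float, cores: int) -> int:
    """Largest thread count whose estimated memory fits the RAM budget.

    Assumes memory grows linearly: base_gb + per_thread_gb * threads.
    Never returns fewer than 1 thread or more than the available cores.
    """
    if per_thread_gb <= 0:
        raise ValueError("per_thread_gb must be positive")
    usable_gb = ram_budget_gb - base_gb
    by_memory = int(usable_gb // per_thread_gb)
    return max(1, min(cores, by_memory))


# Placeholder numbers for illustration only.
print(max_threads(ram_budget_gb=7.5, per_thread_gb=2.0, base_gb=1.5, cores=4))   # 3
print(max_threads(ram_budget_gb=60.0, per_thread_gb=2.0, base_gb=2.0, cores=8))  # 8
```

In the first call memory is the limit, so the job would run with fewer threads than cores; in the second the core count is the limit.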

dpark01 commented 7 years ago

Huh, looking at it, you're right: demux memory usage is actually quite minimal when the thread count is controlled. I wonder if runtime might benefit from a higher core count, or from LZ4 uploads instead of gzip.

mlin commented 7 years ago

Fixed a previous error in the instance type selection in f7fd31eb