aryarm / as_analysis

A complete Snakemake pipeline for detecting allele-specific expression in RNA-seq data
MIT License

memory #35

Closed. aryarm closed this issue 6 years ago.

aryarm commented 6 years ago

Our pipeline can sometimes use a lot of memory, especially when running the GTEx samples. This often gets jobs killed because we exhaust all of the memory available on the cluster. Last I checked, this was happening during the second STAR mapping pass. It also happened when I was running WASP's get_as_counts.

We can fix get_as_counts by implementing a lighter, HDF5-based version of it (see my fork of WASP), but I'm not sure what to do about STAR. My guess is that it loads all of the VCF data at once. More generally, you should look into how much memory each rule uses and whether there is anything you can do about it, beyond simply reducing the number of jobs Snakemake runs at any given time.
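One way to get per-rule memory numbers, sketched here under some assumptions: add a `benchmark:` directive to each rule so that Snakemake writes a TSV for every job. Those files include a `max_rss` column (peak resident memory, reported in MB). The `benchmarks/<rule>/<sample>.tsv` layout and all function names below are hypothetical; the script only reports the largest `max_rss` seen for each rule, so the hungriest rules (for example the second STAR pass) float to the top.

```python
import csv
import io
from pathlib import Path
from typing import Dict, List


def parse_benchmark(text: str) -> List[float]:
    """Return the numeric max_rss values (MB) from one benchmark TSV."""
    values = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        try:
            values.append(float(row.get("max_rss")))
        except (TypeError, ValueError):
            # Missing column, short row, or a non-numeric placeholder like "NA"
            continue
    return values


def peak_rss_by_rule(tables: Dict[str, List[str]]) -> Dict[str, float]:
    """Map each rule to its largest max_rss, sorted from hungriest down."""
    peaks = {}
    for rule, texts in tables.items():
        values = [v for text in texts for v in parse_benchmark(text)]
        if values:
            peaks[rule] = max(values)
    return dict(sorted(peaks.items(), key=lambda kv: kv[1], reverse=True))


def load_tables(bench_dir: str) -> Dict[str, List[str]]:
    """Read TSVs from a hypothetical <bench_dir>/<rule>/<sample>.tsv layout."""
    tables: Dict[str, List[str]] = {}
    for path in sorted(Path(bench_dir).glob("*/*.tsv")):
        tables.setdefault(path.parent.name, []).append(path.read_text())
    return tables


if __name__ == "__main__":
    # Print one line per rule: rule name and its peak memory in MB
    for rule, mb in peak_rss_by_rule(load_tables("benchmarks")).items():
        print(f"{rule}\t{mb:.0f} MB")
```

Taking the maximum rather than the mean is deliberate: a job gets killed by its worst-case sample, not its typical one.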

aryarm commented 6 years ago

This article in the Snakemake docs about resource allocation may be useful.
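The idea behind Snakemake's resource allocation is to declare how much memory each rule needs (e.g. `resources: mem_mb=32000` in the rule) and give Snakemake a global budget (e.g. `snakemake --resources mem_mb=64000`), so concurrency is limited by memory rather than by job count alone. The toy planner below only illustrates that effect: it is a simplified greedy sketch, not Snakemake's actual scheduler, and `plan_waves`, the job names, and the sizes are all made up.

```python
from typing import List, Tuple

Job = Tuple[str, int]  # (job name, declared mem_mb)


def plan_waves(jobs: List[Job], mem_budget_mb: int, max_jobs: int) -> List[List[str]]:
    """Group jobs into waves that never exceed the memory budget or job cap.

    Simplification: every job in a wave is assumed to finish before the next
    wave starts, whereas a real scheduler starts new jobs as soon as memory
    is freed.
    """
    pending = list(jobs)
    waves = []
    while pending:
        free_mem, wave, remaining = mem_budget_mb, [], []
        for name, mem in pending:
            # In this sketch, a job larger than the whole budget can never run
            if mem > mem_budget_mb:
                raise ValueError(f"{name} needs {mem} MB, more than the {mem_budget_mb} MB budget")
            if len(wave) < max_jobs and mem <= free_mem:
                wave.append(name)
                free_mem -= mem
            else:
                remaining.append((name, mem))
        waves.append(wave)
        pending = remaining
    return waves
```

With a 40000 MB budget, for example, two hypothetical 32000 MB STAR jobs are never placed in the same wave, while small 4000 MB counting jobs fill the leftover memory alongside one of them.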

aryarm commented 6 years ago

It seems like things are less on fire now, especially since I submitted the get_as_counts pull request to WASP. I'm going to close the issue for now.