alexg9010 closed this issue 6 years ago
Can we store the score matrices in HDF or some other on-disk format? Maybe that's something to do for genomation as well.
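As a rough illustration of the on-disk idea, here is a minimal sketch that uses plain per-sample RDS files from base R instead of HDF. It is not how the pipeline or genomation works: the sample names, the `outdir` directory and the `read_score_matrix` helper are all hypothetical, and the matrices are random stand-ins.

```r
# Hypothetical stand-in: one score matrix per sample (names are made up).
set.seed(1)
lsml <- list(
  sample_A = matrix(rnorm(2000), nrow = 20),
  sample_B = matrix(rnorm(2000), nrow = 20)
)

outdir <- file.path(tempdir(), "score_matrices")
dir.create(outdir, showWarnings = FALSE)

# Write each matrix to its own file and keep only the file paths in memory.
paths <- vapply(names(lsml), function(nm) {
  path <- file.path(outdir, paste0(nm, ".rds"))
  saveRDS(lsml[[nm]], path)
  path
}, character(1))

# Hypothetical helper: load a single sample only when it is needed.
read_score_matrix <- function(sample_name) readRDS(paths[[sample_name]])

m <- read_score_matrix("sample_B")
dim(m)
```

With this layout the report step would only hold the vector of paths, and a matrix is read back only when a plot actually needs it. An HDF-based store would follow the same pattern with a different file format.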
On Sat, Apr 7, 2018 at 9:45 PM, Alexander Gosdschan <notifications@github.com> wrote:
The step knit report requires far too much memory, because the RDS created at Extract_Signal_Annotation (https://github.com/BIMSBbioinfo/pigx_chipseq/blob/master/scripts/Extract_Signal_Annotation.R#L63) keeps the 11 scorematrixlist objects for every genomic annotation, with a scorematrix for every sample, in the lsml list, even though the profile signal is already summarized in the profiles tibble. The same lsml list is passed to Summarize_Data_For_Report, which leads to the large memory footprint.
In my example analysis I have 16 samples, and lstats$Extract_Signal_Annotation$lsml alone is 28G without actually being used.
The score matrices are not needed. I included them because they are always nice to have for downstream analysis, and there was a plan to have a multi heat matrix in the report, but this does not scale with the number of experiments. It seems to be faster to calculate them when needed than to keep them prepared, especially for a large number of samples. I think you can comfortably set the sml object to NULL, and everything should go much faster.
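A minimal sketch of that suggestion, assuming the saved object is a nested list shaped like the lstats$Extract_Signal_Annotation$lsml path quoted in the issue. The list contents here are small random stand-ins and the output file is a temporary file, not the pipeline's real RDS.

```r
# Hypothetical stand-in for the list saved by Extract_Signal_Annotation;
# element names follow the issue text, the contents are made up.
set.seed(1)
lstats <- list(
  Extract_Signal_Annotation = list(
    lsml     = replicate(3, matrix(rnorm(1e5), nrow = 1000), simplify = FALSE),
    profiles = data.frame(position = 1:100, value = rnorm(100))
  )
)

size_before <- as.numeric(object.size(lstats))

# Assigning NULL removes the element from the list, dropping the score matrices.
lstats$Extract_Signal_Annotation$lsml <- NULL

size_after <- as.numeric(object.size(lstats))

# Only the summarized profiles are written to disk.
outfile <- tempfile(fileext = ".rds")
saveRDS(lstats, outfile)

cat("before:", round(size_before / 1024^2, 2), "MB\n")
cat("after: ", round(size_after / 1024^2, 2), "MB\n")
```

Dropping the element before saveRDS means every later step that reads the file, including the report, never loads the matrices at all.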
The step knit report requires far too much memory, because the RDS created at Extract_Signal_Annotation (https://github.com/BIMSBbioinfo/pigx_chipseq/blob/master/scripts/Extract_Signal_Annotation.R#L63) keeps the 11 scorematrixlist objects for every genomic annotation, with a scorematrix for every sample, in the lsml list, even though the profile signal is already summarized in the profiles tibble. The same lsml list is passed to Summarize_Data_For_Report, which leads to the large memory footprint.
In my example analysis I have 16 samples, and lstats$Extract_Signal_Annotation$lsml alone is 28G without actually being used.
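For anyone wanting to check which element dominates the footprint in their own run, a sketch along these lines may help. It assumes the same nested list layout as above and uses a tiny hypothetical stand-in rather than real pipeline output.

```r
# Hypothetical miniature of the nested list; names follow the issue text.
lstats <- list(
  Extract_Signal_Annotation = list(
    lsml     = replicate(4, matrix(0, nrow = 500, ncol = 200), simplify = FALSE),
    profiles = data.frame(position = 1:200, value = 0)
  )
)

# Size in bytes of each element, sorted so the largest comes first.
sizes <- sapply(
  lstats$Extract_Signal_Annotation,
  function(x) as.numeric(object.size(x))
)
sizes <- sort(sizes, decreasing = TRUE)

# Report in megabytes.
print(round(sizes / 1024^2, 2))
```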