gagneurlab / FRASER

FRASER - Find RAre Splicing Events in RNA-seq
MIT License

FRASER maxing out memory on sge cluster #29

Open Jessen-Erik opened 2 years ago

Jessen-Erik commented 2 years ago

Hello!

We are attempting to use FRASER on a cohort of 400 samples. We've had trouble getting FRASER to complete when submitting the job to our SGE queue, even when providing 1.5 TB of memory. It seems the parallelization of the PSI calculation (fds <- calculatePSIValues(fds, BPPARAM=BPPARAM)) is causing the job to exceed our h_vmem allocation. We've attempted to force FRASER to run in serial (BPPARAM=SerialParam()), but we hit the same problem of maxing out the memory.

Is it possible that FRASER is ignoring the serial setting?

Is there a quick fix for this? Or does a solution similar to the link below have to be implemented: https://github.com/gagneurlab/OUTRIDER/issues/11

Thank you for the tool, we've really enjoyed running it on some previous cohorts.

ischeller commented 2 years ago

Hi @Jessen-Erik, thanks for trying out FRASER! Regarding your problem, can you check the dimensions of the fds object that you use as input for this step? As we typically run this step before filtering, I suspect that your fds object is quite large and that this is causing the problem rather than the parallelization itself. If this is indeed the case, you could try applying the minExpressionInOneSample filter before the PSI calculation step (we provide the option to do this as part of the countRNAData function), as this typically reduces the number of junctions inside the fds object a lot.
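
A minimal sketch of that suggestion, assuming a FRASER version in which countRNAData() accepts the filter and minExpressionInOneSample arguments (please check ?countRNAData for the installed version). The example data set from createTestFraserSettings() is only a stand-in for a real cohort, and the threshold of 1 read is an illustration rather than a recommendation from this thread at this point:

```r
library(FRASER)

# Stand-in data: example settings shipped with FRASER (replace with your own fds)
fds <- createTestFraserSettings()

# Count reads and filter junctions by their expression while counting.
# Assumption: `filter` and `minExpressionInOneSample` exist in your FRASER version.
fds <- countRNAData(fds, filter=TRUE, minExpressionInOneSample=1)

# Check how many junctions remain before the memory-heavy step
print(dim(fds))

# PSI calculation, run serially as in the original report
fds <- calculatePSIValues(fds, BPPARAM=BiocParallel::SerialParam())
```

Comparing dim(fds) with and without the filter shows how much the junction count drops before calculatePSIValues is called.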

Jessen-Erik commented 2 years ago

I checked the dimensions and size of the file:

dim(fds)
[1] 3605173 406

object.size(fds)
65659480 bytes

What is the default minExpressionInOneSample? Just 1 read?

c-mertes commented 2 years ago

Sorry that we did not follow up on this sooner. Thanks for sharing the dimensions. 3 million junctions is big, but should not require 1.5 TB of memory. Regarding minExpressionInOneSample: 1 read is enough for this purpose, as the filter is only used to remove random alignments.

Since there was no further response, I assume that the minExpressionInOneSample filter step helped here.
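
For readers who want to sanity-check the numbers in this thread, here is a rough back-of-the-envelope estimate in base R. It only computes the size of one dense in-memory matrix with the reported dimensions; how many such matrices FRASER holds in memory at once is not stated in this thread, so the helper names and the 1.5 TiB budget comparison are illustrative assumptions:

```r
# Dimensions reported in the thread: junctions x samples
n_junctions <- 3605173
n_samples   <- 406

# Bytes needed for ONE dense in-memory matrix of that shape (hypothetical helper)
bytes_per_matrix <- function(rows, cols, bytes_per_value) {
    rows * cols * bytes_per_value
}

int_bytes    <- bytes_per_matrix(n_junctions, n_samples, 4)  # R integer: 4 bytes
double_bytes <- bytes_per_matrix(n_junctions, n_samples, 8)  # R double: 8 bytes

to_gib <- function(bytes) bytes / 2^30

cat(sprintf("integer matrix: %.1f GiB\n", to_gib(int_bytes)))
cat(sprintf("double matrix:  %.1f GiB\n", to_gib(double_bytes)))

# How many dense double matrices of this shape fit into a 1.5 TiB budget
matrices_in_budget <- floor(1.5 * 2^40 / double_bytes)
cat("double matrices fitting in 1.5 TiB:", matrices_in_budget, "\n")
```

A single double matrix of this shape comes to roughly 11 GiB, far more than the 65659480 bytes reported by object.size(fds), so that figure probably does not include the full count data.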