Nesvilab / FragPipe

A cross-platform proteomics data analysis suite
http://fragpipe.nesvilab.org

Memory Issues Running Fragpipe on SGE Cluster? #1786

Closed rolivella closed 1 month ago

rolivella commented 1 month ago

Hello,

I'm encountering a possible memory-related issue when running FragPipe on an SGE cluster (the dockerized Linux version of FragPipe). Interestingly, when I run the exact same Docker container on a low-performance PC, the process completes successfully without any issues. However, when I run the same container and analysis on our cluster, I seem to hit a memory limit.

I've already increased the Docker container's memory limit to 24GB, which sometimes allows the run to finish, but other times it doesn't. It appears the success of the run depends on the amount of available memory on the cluster node at the time. I can allocate more memory to the container—up to 54GB if necessary—but that's not the real concern, as the same analysis completes in just a few minutes on the PC.
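For reference, the container memory limit is the standard Docker one (--memory); the invocation is roughly along these lines, with the image name and mounted paths as placeholders:

    docker run --rm \
        --memory=24g --memory-swap=24g \
        -v /data/raw:/data/raw -v /data/out:/data/out \
        <fragpipe-image> <fragpipe command>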

The error I encounter from MSFragger is as follows:

Number of peptides with more than 5000 modification patterns: 0
Selected fragment index width 2.50 Da.
319622932 fragments to be searched in 1 slices (4.76 GB total)
Operating on slice 1 of 1:
        Fragment index slice generated in 1.57 s
        001. 2022MQ999_DDA_HELA_test.raw Process 'MSFragger' finished, exit code: 137
Process returned non-zero exit code, stopping.
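For what it's worth, exit code 137 is 128 + 9, i.e. the process was killed with SIGKILL, which on Linux usually points to the kernel OOM killer or the cgroup memory limit. On the compute node that should show up in the kernel log, e.g.:

    dmesg -T | grep -i -E 'killed process|out of memory'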

Could it be that when the JVM runs, it consumes all the available memory on the server, or that there’s some issue related to the JVM's memory management? It seems like Fragpipe might be over-consuming memory, potentially leading to the server running out of memory.

Has anyone else encountered a similar issue, and if so, what steps can be taken to resolve it?

Thanks in advance for any advice or suggestions!

rolivella commented 1 month ago

Here is also the beginning of the MSFragger log:

MSFragger version MSFragger-4.1
Batmass-IO version 1.33.4
timsdata library version timsdata-2-21-0-4
(c) University of Michigan
RawFileReader reading tool. Copyright (c) 2016 by Thermo Fisher Scientific, Inc. All rights reserved.
timdTOF .d reading tool. Copyright (c) 2022 by Bruker Daltonics GmbH & Co. KG. All rights reserved.
System OS: Linux, Architecture: amd64
Java Info: 11.0.22, OpenJDK 64-Bit Server VM, Ubuntu
JVM started with 161 GB memory

JVM started with 161 GB memory <----- is this ok?

fcyu commented 1 month ago

It seems that FragPipe detected at least 161 GB of memory, while the Docker container has less? Then, when MSFragger asked for more memory, Java tried to allocate it but failed because there was no "physical" memory left.

Java memory management (GC) decides when to free unused memory depending on whether there is enough memory left (the difference between -Xmx???GB and the memory already used; it doesn't know the amount of "physical" memory). If the number in -Xmx???GB is bigger than the actual "physical" memory, the JVM will try to allocate more memory than the machine (or container) has, and then crash.

Interestingly, when I run the exact same Docker container on a low-performance PC, the process completes successfully without any issues. However, when I run the same container and analysis on our cluster, I seem to hit a memory limit.

The above might also explain this: if the JVM knows that there is not enough memory left, it will free memory to "make some room".
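A quick way to see this from inside the container: free reports the host's memory (presumably where the 161 GB comes from), while the cgroup file holds the limit Docker actually enforces. For example (the path below is for cgroup v1; on cgroup v2 it is /sys/fs/cgroup/memory.max):

    free -g                                           # shows the host's total RAM, e.g. ~161 GB
    cat /sys/fs/cgroup/memory/memory.limit_in_bytes   # the limit Docker enforces, e.g. 24 GB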

You can manually tell FragPipe and the JVM the amount of memory that your container has by using the --ram flag.
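In a headless run it would look roughly like this (the workflow, manifest, and output paths are placeholders):

    fragpipe --headless \
        --workflow /path/to/your.workflow \
        --manifest /path/to/your.manifest \
        --workdir /path/to/output \
        --ram 24 \
        --threads 8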

Best,

Fengchao

rolivella commented 1 month ago

Thanks @fcyu! Is it possible to set this parameter in the workflow file?

rolivella commented 1 month ago

Anyway, it did the trick! Thank you very much!

fcyu commented 1 month ago

Yes, it is something called workflow.ram. I am not in front of a computer right now, but you can find the exact name at the bottom of FragPipe.workflow.
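If that is indeed the right name, the line at the bottom of the saved .workflow file should look something like this (value in GB, matching the container limit):

    workflow.ram=16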

Best,

Fengchao

rolivella commented 1 month ago

@fcyu I tested with workflow.ram=16 but it's not working; FragPipe simply skips it. Are you sure this is the right parameter? Thanks!

fcyu commented 1 month ago

Thanks for testing. There was a bug, and it has been fixed.

Let me know if you want to use the pre-release version.

Best,

Fengchao

rolivella commented 1 month ago

Thanks, Fengchao. I've implemented a workaround for now, so it's not urgent. I'll wait for the next official version.