AJnsm / Stator

STATOR: A Nextflow pipeline to infer cell types, subtypes, and states from gene expression data.
GNU General Public License v3.0

Problem with RAM in execution #2

Open bioinformaticaInta opened 1 month ago

bioinformaticaInta commented 1 month ago

Hi,

I am trying to run the software but I'm encountering an issue with the amount of RAM it uses. I'm not using SGE because I want to run the software directly on a server with 80 cores and 512 GB of RAM. Could you please guide me on how to configure the parameters in the params.json file?

Thank you very much!

AJnsm commented 1 month ago

Sure! Could you share your current params.json? How many cells and genes are in your data set? And could you perhaps share the trace output from the reports folder?

bioinformaticaInta commented 1 month ago

Hi, thanks for the response! We have 13400 cells with 32000 genes (from the Mus musculus genome). I tried several configurations in the params.json file. I understood from the help tutorial that the memory and cpus parameters can only be used with the sge executor, but I didn't find another way to indicate these limits. The following is the last configuration file that I tried:

{
    "dataType" : "agnostic",
    "rawDataPath" : "/vz/conda-docker/scRNA-Seq/GFP_dop.csv",
    "userGenes" : "/vz/conda-docker/scRNA-Seq/userGenes.list",
    "nGenes" : 32285,
    "nCells" : 14200,
    "PCalpha" : 0.05,
    "asympBool" : 0,
    "bsResamps" : 1000,
    "estimationMode" : "MFI",
    "nRandomHOIs" : 1000,
    "sigHOIthreshold" : 0.05,
    "plotPairwiseUpsets": 0,
    "minStateDeviation" : 3,
    "stateDevAlpha" : 0.05,
    "dendCutoff" : -1,
    "executor" : "",
    "maxQueueSize" : 1,
    "cores_makeData": 60,
    "cores_PC" : 60,
    "cores_MCMC" : 60,
    "cores_1pt" : 60,
    "cores_2pt" : 60,
    "cores_HOIs_MB" : 60,
    "cores_HOIs_6n7" : 60,
    "cores_HOIs_plots": 60,
    "mem_makeData" : "450G",
    "mem_PC" : "450G",
    "mem_MCMC" : "450G",
    "mem_1pt" : "450G",
    "mem_2pt" : "450G",
    "mem_HOIs_MB" : "450G",
    "mem_HOIs_6n7" : "450G",
    "mem_HOIs_plots" : "450G",
}

Finally, this is the trace output from the reports folder. The first step of the process (makeData) finished correctly, but during the second step the server crashed and I had to restart it:

    task_id:   1
    hash:      e2/402f32
    native_id: 15730
    name:      makeData
    status:    COMPLETED
    exit:      0
    submit:    2024-09-19 14:53:36.832
    duration:  8m 46s
    realtime:  8m 39s
    %cpu:      248.1%
    peak_rss:  12.9 GB
    peak_vmem: 29.9 GB
    rchar:     907.7 MB
    wchar:     2.6 GB

Please tell me if this information is enough to diagnose the problem, and thank you again for the response!

Sergio

AJnsm commented 1 month ago

Dear Sergio,

Ah, I am pretty sure that the server crashed because of the "nGenes" : 32285, line in your params.json. It would be great to run the Stator state inference on all genes, but this is computationally unfeasible (consider how many triplets there are among 30k genes). In our analyses, we found that including the first few hundred most highly variable genes was sufficient for robust interaction and state inference, so I suggest setting "nGenes" : 500, or so. You could also set it to 50 just to quickly see if this indeed solves the problem.
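To make the scale of that remark concrete, the number of unordered gene triplets is the binomial coefficient "n choose 3". The short sketch below only illustrates the combinatorial growth; it is not a description of how Stator enumerates interactions internally, and the helper name n_triplets is made up for this example. For 32285 genes the count is roughly 5.6 trillion, against about 20.7 million for 500 genes.

```python
from math import comb

def n_triplets(n_genes: int) -> int:
    """Number of unordered triplets that can be formed from n_genes genes."""
    return comb(n_genes, 3)

# Gene counts mentioned in this thread: the quick test (50), the suggested
# setting (500), and the value from the crashing run (32285).
for n in (50, 500, 32285):
    print(f"{n:>6} genes -> {n_triplets(n):,} triplets")
```

Going from 500 to 32285 genes multiplies the number of candidate triplets by a factor of more than 250,000, which is why a few hundred highly variable genes is the practical range.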

If you still run into problems, please let me know!

Abel

ApoloBiotech commented 1 month ago

Dear Abel, I had misunderstood the "nGenes" parameter. I will try running it with that change, and I will let you know if the new run has any other problem.

Thank you for the responses again!!!!!

Sergio

bioinformaticaInta commented 3 weeks ago

Dear Abel, we modified the params.json file and got a different error. This time the error is associated with step four (calcHOIsWithinMB.py), because the script doesn't accept a 'null' value for the 'nCores' parameter.

This is the output in the command line:

[8e/ff62e7] process > makeData [100%] 1 of 1 ✔
[55/7b7d37] process > estimatePCgraph [100%] 1 of 1 ✔
[c8/cffe9a] process > iterMCMCscheme [100%] 1 of 1 ✔
[58/22a62c] process > estimateCoups_2345pts_WithinMB [100%] 1 of 1, failed: 1 ✘
[- ] process > estimateCoups_6n7pts -
[- ] process > identifyDTuples -
[- ] process > identifyStates -
WARN: Access to undefined parameter cores_HOIs_MB -- Initialise it to a default value eg. params.cores_HOIs_MB = some_value
ERROR ~ Error executing process > 'estimateCoups_2345pts_WithinMB'

Caused by: Process estimateCoups_2345pts_WithinMB terminated with an error exit status (2)

Command executed:

python calcHOIsWithinMB.py --dataPath trainingData_14200Cells_0500Genes.csv --graphPath MCMCgraph_14200Cells_0500Genes.csv --nResamps 1000 --nCores null --nRandoms 1000 --genesToOne null --dataDups 0 --boundBool 0 --asympBool 0 --estimationMode MFI

Command exit status: 2

Command output:
Importing modules...
Modules imported

Command error:
Importing modules...
Modules imported

usage: calcHOIsWithinMB.py [-h] [--dataPath DATAPATH] [--graphPath GRAPHPATH] [--nResamps NRESAMPS] [--nCores NCORES] [--nRandoms NRANDOMS] [--genesToOne GENESTOONE] [--dataDups DATADUPS] [--boundBool BOUNDBOOL] [--asympBool ASYMPBOOL] [--estimationMode ESTIMATIONMODE]
calcHOIsWithinMB.py: error: argument --nCores: invalid int value: 'null'
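The message and the exit status 2 are what Python's argparse produces when an option declared as an integer receives a non-numeric string. The sketch below reproduces that behaviour in isolation. It assumes that calcHOIsWithinMB.py declares --nCores with type=int, which the "invalid int value" wording suggests but the thread does not show; the helper try_parse is hypothetical.

```python
import argparse

# Stand-in parser with a single option.
# Assumption: the real script declares --nCores with type=int.
parser = argparse.ArgumentParser(prog="calcHOIsWithinMB.py")
parser.add_argument("--nCores", type=int)

def try_parse(argv):
    """Return the parsed nCores value, or a note with the exit status on failure."""
    try:
        return parser.parse_args(argv).nCores
    except SystemExit as exc:
        # argparse prints the usage line and the error to stderr, then exits.
        return f"exit status {exc.code}"

print(try_parse(["--nCores", "60"]))    # a numeric string is converted to int
print(try_parse(["--nCores", "null"]))  # the literal string 'null' is rejected
```

The string 'null' reaches the script because the undefined Nextflow parameter is rendered as null in the generated command line, as the WARN line in the console output indicates.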

This is my new params.json file content:

{
    "dataType" : "agnostic",
    "rawDataPath" : "/vz/conda-docker/scRNA-Seq/GFP_dop.csv",
    "userGenes" : "/vz/conda-docker/scRNA-Seq/userGenes.list",
    "nGenes" : 500,
    "nCells" : 14200,
    "PCalpha" : 0.05,
    "asympBool" : 0,
    "bsResamps" : 1000,
    "estimationMode" : "MFI",
    "nRandomHOIs" : 1000,
    "sigHOIthreshold" : 0.05,
    "plotPairwiseUpsets": 0,
    "minStateDeviation" : 3,
    "stateDevAlpha" : 0.05,
    "dendCutoff" : -1,
    "executor" : ""
}

Is it possible that I am missing some parameter in the json file?

Thank you

Sergio
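The Nextflow warning names cores_HOIs_MB as undefined, and the second params.json no longer contains any of the cores_* or mem_* keys that the first file had. A minimal sketch for spotting such omissions before launching a run follows. It assumes that the resource keys from the first file quoted in this thread are the ones the pipeline reads; the function name missing_keys is hypothetical and not part of Stator.

```python
import json

# Resource keys reconstructed from the first params.json in this thread.
# Assumption: this is the set the pipeline expects; the Stator docs are the authority.
STEPS = ["makeData", "PC", "MCMC", "1pt", "2pt", "HOIs_MB", "HOIs_6n7", "HOIs_plots"]
RESOURCE_KEYS = [f"{kind}_{step}" for kind in ("cores", "mem") for step in STEPS]

def missing_keys(params, required=RESOURCE_KEYS):
    """Return the required keys that are absent from a parsed params dict."""
    return [key for key in required if key not in params]

# The second params.json from this thread, inlined so the sketch is self-contained.
new_params = json.loads("""
{
    "dataType" : "agnostic",
    "rawDataPath" : "/vz/conda-docker/scRNA-Seq/GFP_dop.csv",
    "userGenes" : "/vz/conda-docker/scRNA-Seq/userGenes.list",
    "nGenes" : 500,
    "nCells" : 14200,
    "PCalpha" : 0.05,
    "asympBool" : 0,
    "bsResamps" : 1000,
    "estimationMode" : "MFI",
    "nRandomHOIs" : 1000,
    "sigHOIthreshold" : 0.05,
    "plotPairwiseUpsets": 0,
    "minStateDeviation" : 3,
    "stateDevAlpha" : 0.05,
    "dendCutoff" : -1,
    "executor" : ""
}
""")

# Lists every cores_* and mem_* key, including cores_HOIs_MB from the warning.
print(missing_keys(new_params))
```

For a file on disk, the inline string can be replaced by reading params.json with json.load. Which values to assign to the restored keys on a single 80-core server is a separate question that this thread does not settle.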