Open bioinformaticaInta opened 1 month ago
Sure! Could you share your current params.json? How many cells and genes are in your data set? And could you perhaps share the trace output from the reports folder?
Hi, thanks for the response!!!!
We have 13400 cells with 32000 genes (from the Mus musculus genome).
I tried several configurations in the params.json file. I understood from the help tutorial that the memory and cpus parameters can only be used with the sge executor, but I couldn't find another way to indicate these limits.
The following is the last configuration file that I tried:
{
"dataType" : "agnostic",
"rawDataPath" : "/vz/conda-docker/scRNA-Seq/GFP_dop.csv",
"userGenes" : "/vz/conda-docker/scRNA-Seq/userGenes.list",
"nGenes" : 32285,
"nCells" : 14200,
"PCalpha" : 0.05,
"asympBool" : 0,
"bsResamps" : 1000,
"estimationMode" : "MFI",
"nRandomHOIs" : 1000,
"sigHOIthreshold" : 0.05,
"plotPairwiseUpsets": 0,
"minStateDeviation" : 3,
"stateDevAlpha" : 0.05,
"dendCutoff" : -1,
"executor" : "",
"maxQueueSize" : 1,
"cores_makeData": 60,
"cores_PC" : 60,
"cores_MCMC" : 60,
"cores_1pt" : 60,
"cores_2pt" : 60,
"cores_HOIs_MB" : 60,
"cores_HOIs_6n7" : 60,
"cores_HOIs_plots": 60,
"mem_makeData" : "450G",
"mem_PC" : "450G",
"mem_MCMC" : "450G",
"mem_1pt" : "450G",
"mem_2pt" : "450G",
"mem_HOIs_MB" : "450G",
"mem_HOIs_6n7" : "450G",
"mem_HOIs_plots" : "450G",
}
Finally, this is the trace output from the reports folder. The first step of the process (makeData) finished correctly, but during the second step the server crashed and I had to restart it:
task_id hash native_id name status exit submit duration realtime %cpu peak_rss peak_vmem rchar wchar
1 e2/402f32 15730 makeData COMPLETED 0 2024-09-19 14:53:36.832 8m 46s 8m 39s 248.1% 12.9 GB 29.9 GB 907.7 MB 2.6 GB
Please let me know whether this information is enough to diagnose the problem, and thank you again for the response!!!!!
Sergio
Dear Sergio,
Ah, I am pretty sure that the server crashed because of the "nGenes" : 32285, line in your params.json. It would be great to run the Stator state inference on all genes, but this is computationally unfeasible (consider how many triplets there are among 30k genes). In our analyses, we found that including the first few hundred most highly variable genes was sufficient for robust interaction and state inference, so I suggest setting "nGenes" : 500, or so. You could also set it to 50 just to quickly see if this indeed solves the problem.
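To get a feel for the scale, here is a rough back-of-the-envelope sketch. It is not part of Stator; it only counts unordered gene triplets with Python's `math.comb`, on the assumption that the cost grows roughly with the number of triplets to consider. The helper name `n_triplets` is made up for illustration.

```python
from math import comb


def n_triplets(n_genes: int) -> int:
    """Number of unordered gene triplets among n_genes genes."""
    return comb(n_genes, 3)


# Compare the suggested settings with the value from the params.json above.
for n_genes in (50, 500, 32285):
    print(f"{n_genes:>6} genes -> {n_triplets(n_genes):,} triplets")
```

With 500 genes there are about 2 × 10^7 triplets; with 32285 genes there are roughly 5.6 × 10^12, which is over 250,000 times more.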
If you still run into problems, please let me know!
Abel
Dear Abel, I had misunderstood the "nGenes" parameter. I will try the run again with that change and let you know if I have any other problems with the new run.
Thank you for the responses again!!!!!
Sergio
Dear Abel, we modified the params.json file and got a different error. This time the error is associated with step four (calcHOIsWithinMB.py), because the script doesn't accept a 'null' value for the 'nCores' parameter.
This is the output in the command line:
[8e/ff62e7] process > makeData [100%] 1 of 1 ✔
[55/7b7d37] process > estimatePCgraph [100%] 1 of 1 ✔
[c8/cffe9a] process > iterMCMCscheme [100%] 1 of 1 ✔
[58/22a62c] process > estimateCoups_2345pts_WithinMB [100%] 1 of 1, failed: 1 ✘
[- ] process > estimateCoups_6n7pts -
[- ] process > identifyDTuples -
[- ] process > identifyStates -
WARN: Access to undefined parameter cores_HOIs_MB
-- Initialise it to a default value eg. params.cores_HOIs_MB = some_value
ERROR ~ Error executing process > 'estimateCoups_2345pts_WithinMB'
Caused by:
Process estimateCoups_2345pts_WithinMB terminated with an error exit status (2)
Command executed:
python calcHOIsWithinMB.py --dataPath trainingData_14200Cells_0500Genes.csv --graphPath MCMCgraph_14200Cells_0500Genes.csv --nResamps 1000 --nCores null --nRandoms 1000 --genesToOne null --dataDups 0 --boundBool 0 --asympBool 0 --estimationMode MFI
Command exit status: 2
Command output: Importing modules... Modules imported
Command error: Importing modules... Modules imported
usage: calcHOIsWithinMB.py [-h] [--dataPath DATAPATH] [--graphPath GRAPHPATH] [--nResamps NRESAMPS] [--nCores NCORES] [--nRandoms NRANDOMS] [--genesToOne GENESTOONE] [--dataDups DATADUPS] [--boundBool BOUNDBOOL] [--asympBool ASYMPBOOL] [--estimationMode ESTIMATIONMODE]
calcHOIsWithinMB.py: error: argument --nCores: invalid int value: 'null'
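A note on the exit status 2: the usage message above has the format that Python's argparse prints, and argparse exits with status 2 when it cannot convert an argument. The minimal sketch below reproduces that behaviour with a stand-in parser. It is an assumption that calcHOIsWithinMB.py declares `--nCores` with `type=int`; this is not the actual script, and `build_parser` and `exit_code_for` are hypothetical helper names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Stand-in parser (hypothetical); it only mimics the --nCores option.
    parser = argparse.ArgumentParser(prog="calcHOIsWithinMB.py")
    parser.add_argument("--nCores", type=int)
    return parser


def exit_code_for(argv):
    """Return 0 if argv parses, otherwise the exit code argparse raises."""
    try:
        build_parser().parse_args(argv)
    except SystemExit as exc:
        return exc.code
    return 0


print(exit_code_for(["--nCores", "60"]))    # a valid integer parses fine
print(exit_code_for(["--nCores", "null"]))  # the string 'null' is rejected
```

The WARN line about the undefined parameter cores_HOIs_MB suggests that this is where the literal `null` on the command line came from.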
This is my new params.json file content:
{
"dataType" : "agnostic",
"rawDataPath" : "/vz/conda-docker/scRNA-Seq/GFP_dop.csv",
"userGenes" : "/vz/conda-docker/scRNA-Seq/userGenes.list",
"nGenes" : 500,
"nCells" : 14200,
"PCalpha" : 0.05,
"asympBool" : 0,
"bsResamps" : 1000,
"estimationMode" : "MFI",
"nRandomHOIs" : 1000,
"sigHOIthreshold" : 0.05,
"plotPairwiseUpsets": 0,
"minStateDeviation" : 3,
"stateDevAlpha" : 0.05,
"dendCutoff" : -1,
"executor" : ""
}
Is it possible that I am missing some parameter in the JSON file?
Thank you
Sergio
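One way to check this is to compare the keys of the new params.json with the resource keys from the first configuration posted in this thread. A minimal sketch, assuming those earlier `cores_*`, `mem_*` and `maxQueueSize` keys are the ones the pipeline reads (the list is taken from this thread, not from the Stator documentation, and `missing_keys` is a hypothetical helper):

```python
import json

# Resource-related keys from the first params.json posted in this thread.
# Assumption: the pipeline expects all of them to be defined.
STEPS = ["makeData", "PC", "MCMC", "1pt", "2pt",
         "HOIs_MB", "HOIs_6n7", "HOIs_plots"]
EXPECTED_KEYS = ["maxQueueSize"] + [
    f"{prefix}_{step}" for prefix in ("cores", "mem") for step in STEPS
]


def missing_keys(params: dict) -> list:
    """Return the expected keys that are absent from params."""
    return [key for key in EXPECTED_KEYS if key not in params]


# Abridged copy of the new params.json, for illustration only.
new_params = json.loads(
    '{"dataType": "agnostic", "nGenes": 500, "nCells": 14200, "executor": ""}'
)

for key in missing_keys(new_params):
    print("missing:", key)
```

For the new file, `cores_HOIs_MB` is among the reported keys, which matches the Nextflow warning about the undefined parameter.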
Hi,
I am trying to run the software but I'm encountering an issue with the amount of RAM it uses. I'm not using SGE because I want to run the software directly on a server with 80 cores and 512 GB of RAM. Could you please guide me on how to configure the parameters in the params.json file?
Thank you very much!