UCSF-DSCOLAB / cyclone

Vignette Updates: Add specs for 50million cells #11

Closed dtm2451 closed 1 year ago

dtm2451 commented 1 year ago

Adds text "and ~520Gb RAM with nthreads: 3 for ~50 million cells" to the bullet about job specs we've used. One note that we could fold in here: we could add how many nthreads were used in the other examples as well, if we have that info.

rebeccagj commented 1 year ago

~420Gb RAM for ~40 million cells was me so I can comment on that...

I requested 5 threads from sbatch, and specified 4 threads in my config.yml. This protocol kept Slurm from trying to create a job with 5*420Gb RAM, iirc. When I ssh'd into my compute node, I saw 4 threads, with about 105-110Gb RAM each. I hadn't put thread info on before, because I wasn't sure how detailed we wanted to be.
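For anyone checking the arithmetic behind those numbers, here is a minimal sketch. It assumes the reserved memory is split evenly across the config threads, which the thread does not confirm, and the function and variable names are hypothetical, not part of cyclone or Slurm.

```python
def per_thread_gb(total_gb: float, nthreads: int) -> float:
    """Evenly split a job's total memory across worker threads (assumed even split)."""
    if nthreads < 1:
        raise ValueError("nthreads must be at least 1")
    return total_gb / nthreads


total_gb = 420        # memory reserved for the whole job, as reported above
config_nthreads = 4   # 'nthreads' value set in config.yml
scheduler_tasks = 5   # tasks requested from sbatch

# Share of the reservation available to each config thread.
share = per_thread_gb(total_gb, config_nthreads)

# What the request would have grown to if the full amount were reserved once per task.
worst_case = total_gb * scheduler_tasks

print(f"{share:.0f} Gb per thread")
print(f"{worst_case} Gb if the full amount were reserved once per task")
```

The 105 Gb per-thread figure sits at the low end of the 105-110Gb observed on the compute node.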

dtm2451 commented 1 year ago

I was intending this nthreads to refer to one of the bullet points above in the same section: the value from the config file. The intention is to put these high memory asks in context: it's 420Gb for 4x parallelization in the optimization step, because we told cyclone to use nthreads: 4 while reserving that amount of memory. But clearly the wording needs to be updated to reflect that intent! Or we leave it out. Open to whatever you think is best!

dtm2451 commented 1 year ago

Perhaps each instance can be structured as ("We have used...") "~420Gb RAM for 4x parallelization with ~40 million cells (config 'nthreads: 4', scheduler ntasks=5), and ~520Gb RAM for 3x parallelization with ~50 million cells (config 'nthreads: 3', scheduler ntasks=4)." We'd just need to find out, and fill in, what nthreads was for the ~20 million cells case.

rebeccagj commented 1 year ago

Was the 20 million cells case one that @ravipatel4 ran? If so, maybe he can comment with the parallelization specs (if any) for that case.

I do like the idea of specifying config nthreads vs scheduler ntasks in our summary. I think if we can't find the info for the 20mill one, we should remove it and add specifications like your most recent comment suggests, Dan.

dtm2451 commented 1 year ago

Just adding here that we received this summary for the 20 million case offline from Ravi: "~100Gb RAM without parallelization with ~20 million cells (nthreads = 1)"

I've updated the vignette with these specs per the structure laid out in https://github.com/UCSF-DSCOLAB/cyclone/pull/11#issuecomment-1378986329.

Also created a 'vignettes' folder at the top level of the repo where we will share knitted versions of our vignettes, and put a knitted version of this vignette there!
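As a rough illustration of the agreed structure, the three reported runs can be kept as records and rendered into the same phrasing. This is only a sketch: the `RunSpec` class and its fields are hypothetical, and the scheduler ntasks for the 20 million cells run is left unset because it was not reported in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunSpec:
    ram_gb: int                              # total RAM reserved for the job
    cells_millions: int                      # approximate number of cells
    nthreads: int                            # 'nthreads' value from the config file
    scheduler_ntasks: Optional[int] = None   # None when not reported

    def describe(self) -> str:
        """Render the spec in the phrasing proposed earlier in the thread."""
        if self.nthreads == 1:
            mode = "without parallelization"
        else:
            mode = f"for {self.nthreads}x parallelization"
        details = f"config 'nthreads: {self.nthreads}'"
        if self.scheduler_ntasks is not None:
            details += f", scheduler ntasks={self.scheduler_ntasks}"
        return (f"~{self.ram_gb}Gb RAM {mode} "
                f"with ~{self.cells_millions} million cells ({details})")


# Values as reported in the comments above.
RUNS = [
    RunSpec(ram_gb=100, cells_millions=20, nthreads=1),
    RunSpec(ram_gb=420, cells_millions=40, nthreads=4, scheduler_ntasks=5),
    RunSpec(ram_gb=520, cells_millions=50, nthreads=3, scheduler_ntasks=4),
]

for run in RUNS:
    print(run.describe())
```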