Closed louislamont closed 10 months ago
Hi @louislamont
Thank you for using pycisTarget.
From your script it looks like you are starting a new SLURM job for each worker/ray job. This is not how we intended the multiprocessing to be used: it is configured to use multiple cores within a single SLURM job.
Have you tried starting a single slurm job, where you requested multiple cores, and running the analysis that way?
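For reference, a minimal single-node sketch could look like the following. The job name, resource amounts, and the `run_pycistarget_analysis.py` invocation are illustrative placeholders, not taken from your setup:

```shell
#!/bin/bash
# Minimal single-node SLURM sketch: one task with many cores on one node.
# All names and amounts below are illustrative, adjust to your cluster.
#SBATCH --job-name=pycistarget
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=9
#SBATCH --mem=250G
#SBATCH --time=12:00:00

# pycistarget's ray backend uses the cores inside this single allocation;
# no manual `ray start` / worker loop is needed.
python run_pycistarget_analysis.py --n_cpu "${SLURM_CPUS_PER_TASK}"
```

The `--n_cpu` flag on the hypothetical driver script stands in for however your own script passes the core count to pycistarget.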
By the way, the analysis should not take multiple days, even on a single core. A couple of hours is more typical.
Hopefully we get you up and running soon.
Best,
Seppe
Hi Seppe,
Thank you for your help with this. The slurm script I was using was based on the guidelines in the ray documentation. I believe you are referring to the
for ((i = 1; i <= worker_num; i++)); do
    node_i=${nodes_array[$i]}
    echo "Starting WORKER $i at $node_i"
    srun --nodes=1 --ntasks=1 -w "$node_i" \
        ray start --address "$ip_head" \
        --num-cpus "${SLURM_CPUS_PER_TASK}" --num-gpus "${SLURM_GPUS_PER_TASK}" --block &
    sleep 5
done
part, which I believe (if multiple nodes are requested for a job) starts a ray instance on each additional node and links it to the head node. My script uses a single node with 9 cores, so that part should never be invoked.
On a single node, the CTX part runs relatively quickly on a single core (an hour or two), but the DEM analysis takes 5 minutes to 2 hours per topic. With 85 topics, each run with and without promoters, that adds up to a substantial amount of time.
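As a back-of-the-envelope check, those per-topic times bound the single-core DEM runtime like this:

```shell
# 85 topics, each analysed with and without promoters = 170 DEM runs,
# at 5 minutes (best case) to 2 hours (worst case) per run on one core.
runs=$((85 * 2))
min_hours=$((runs * 5 / 60))     # 5 min per run
max_hours=$((runs * 120 / 60))   # 120 min per run
echo "DEM single-core estimate: ${min_hours}-${max_hours} hours"
# prints "DEM single-core estimate: 14-340 hours"
```

Even the optimistic end is more than half a day, so multi-day runtimes at the pessimistic end are plausible on one core.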
Since it seems to be taking abnormally long even with one core, could the issue be with the cisTopic object used to generate the topics, or with the database I am using? I exported a sparse matrix from Seurat and used it to create the cisTopic object as in this tutorial: https://pycistopic.readthedocs.io/en/latest/Toy_melanoma-RTD.html. The dimensions match my Seurat object, at least.
print(cistopic_obj)
CistopicObject from project cisTopic with n_cells × n_regions = 66343 × 259441
I created a database from the ~260,000 regions derived from Seurat/Signac MACS2 peak calling and the 10,250 motifs from this repository: https://github.com/aertslab/create_cisTarget_databases, specifically using the create_cistarget_motif_databases.py script.
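For completeness, the invocation followed the pattern shown in that repository's README. The file names below are illustrative placeholders, and the flags are as documented there; check `create_cistarget_motif_databases.py --help` for the current options:

```shell
# Sketch of the database-creation call, following the create_cisTarget_databases
# README (file names are illustrative placeholders):
#   -f  FASTA with the ~260k consensus regions
#   -M  directory of motifs in Cluster-Buster format
#   -m  text file listing the motif IDs to score
#   -o  output database prefix
#   -t  number of threads
create_cistarget_motif_databases.py \
    -f consensus_regions.fa \
    -M motifs_cb_format/ \
    -m motifs.lst \
    -o my_project \
    -t 20
```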
Hi @louislamont
It's normal that DEM takes quite a bit longer compared to cisTarget.
Normally you should not need to do anything special in your SLURM command to work with ray, given the way it is implemented in pycistarget.
Can you try starting a SLURM job like this:
srun \
    --nodes 1 \
    --ntasks 1 \
    --cpus-per-task <number_of_cpus> \
    --mem <amount_of_memory>G \
    --time 12:00:00 \
    --pty bash -l
Request enough memory (250G for example).
This should start a job with shell access. Try running pycistarget in this shell.
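Inside that shell you can sanity-check the allocation before launching pycistarget, for example:

```shell
# Quick checks once the interactive shell is up. SLURM sets
# SLURM_CPUS_PER_TASK inside the job; outside a job it will be unset.
echo "CPUs allocated: ${SLURM_CPUS_PER_TASK:-unset}"
nproc                  # cores visible to this shell
free -g | head -n 2    # total/used memory on the node, in GB
```

If `nproc` reports fewer cores than you requested, the allocation itself is the problem rather than pycistarget.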
Best,
Seppe
Hi, thank you for creating this analytical pipeline. I am using it as part of the SCENIC+ suite, but I am running into issues with runtime and CPU use when trying to run this step with multiple cores on a SLURM cluster. Using a single CPU gives me no errors, but it looks like it will take several days to finish all steps. Apologies if I am simply running this incorrectly, not reserving enough resources, etc.
Describe the bug pycisTarget runs slowly when using multiple cores on SLURM. It uses about half of the cores reserved for the job and dumps 100+ GB of files to the ray_spilled_files directory.
Error output
Expected behavior pycisTarget uses all cores reserved for the job and runs faster than with a single CPU.
Additional context I tried to set up my SLURM submission script according to the guidelines here: https://github.com/aertslab/scenicplus/issues/24 and here: https://docs.ray.io/en/latest/cluster/vms/user-guides/community/slurm.html . I was able to get something running with no startup errors, but it seemed to run much slower than single-threaded, and it appeared to use only 3 of the cores I reserved (based on the worker.out files in the session_latest/logs folder). This is my SLURM sbatch script: