joaofrancafisica closed this issue 2 years ago
Job submission script:
#!/bin/bash
##
## ntasks = number of cores
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=35
#
## Job name. Shown in the output of the 'squeue' command.
## Naming the job is recommended, but not required.
#SBATCH --job-name=J1018
echo -e "\n## Job started on $(date +'%d-%m-%Y at %T') #####################\n"
## The input and output file names are based on the job name.
## Note that this naming scheme is not mandatory.
## Job information printed to the output file.
echo -e "\n## Active jobs for $USER: \n"
squeue -a -u $USER
echo -e "\n## Job execution node: $(hostname -s) \n"
echo -e "\n## Number of tasks for this job: $SLURM_NTASKS \n"
## Run the software
module load python
python pipeline.py
echo -e "\n## Job finished on $(date +'%d-%m-%Y at %T') ###################"
Can you tell me what version of dynesty and autofit you are on?
pip show dynesty
pip show autofit
Also the Python version.
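If it helps, here is a small stdlib-only snippet that collects all three in one go (it assumes the PyPI package names are `dynesty` and `autofit`):

```python
import sys
from importlib import metadata

# Print the interpreter version plus each installed package version.
print("Python", sys.version.split()[0])
for pkg in ("dynesty", "autofit"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")
```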
I honestly have no idea, parallelization is a nightmare.
For Emcee, did you follow the 'Multiprocessing' or the 'MPI' example?
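For reference, the 'Multiprocessing' pattern in the emcee docs boils down to handing the sampler a pool whose `map` distributes likelihood evaluations across processes. A stdlib-only sketch of that pattern (the `log_likelihood` here is a hypothetical stand-in, not the real fit):

```python
import math
from multiprocessing import Pool

def log_likelihood(x):
    # Hypothetical stand-in for an expensive likelihood evaluation
    # (a standard-normal log-density).
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # emcee's EnsembleSampler(..., pool=pool) calls pool.map like
        # this internally on the walker positions.
        values = pool.map(log_likelihood, [0.0, 1.0, 2.0])
    print(values)
```

With emcee itself you pass `pool=pool` to `EnsembleSampler`; in the MPI variant the pool is replaced by an MPI-backed one such as `schwimmbad.MPIPool`.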
Thanks for your prompt answer!
Yeah, sure, here it is:
Python 3.9.12
dynesty 1.0.1
autofit 2022.07.11.1
I tried the 'Multiprocessing' one. Do you think it could work using MPI?
Dynesty supports multiprocessing; I was checking whether it only worked with MPI.
Can you check the behaviour of an autofit Emcee fit by changing:
search_0 = af.DynestyStatic(
    path_prefix='./',          # Prefix path of our results
    name='source_parametric',  # Name of the dataset
    unique_tag='0839_[1]',     # Unique tag of our results
    nlive=250,
    number_of_cores=35,
)
to:
search_0 = af.Emcee(
    path_prefix='./',          # Prefix path of our results
    name='source_parametric',  # Name of the dataset
    unique_tag='0839_[1]',     # Unique tag of our results
    number_of_cores=35,
)
This will tell me whether it is a dynesty-specific or an autofit-specific issue.
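One small robustness tweak, purely a suggestion of mine rather than anything autofit requires: derive `number_of_cores` from the SLURM allocation instead of hard-coding 35, so the search never asks for more cores than `--cpus-per-task` actually granted:

```python
import os

# SLURM sets SLURM_CPUS_PER_TASK inside a job; fall back to 1 when
# running outside a job (e.g. on a laptop).
n_cores = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
print("Using", n_cores, "cores")
# search_0 = af.Emcee(..., number_of_cores=n_cores)  # hypothetical usage
```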
It seems to work with Emcee, although the performance gain was not what I was expecting:

My laptop, 1 core: ~1000 seconds
Cluster, 1 core: ~1000 seconds
Cluster, 35 cores: ~500 seconds

Also, during sampling, the CPU usage was only about ~20%.
I will have a think.
It may be worth profiling 4 and 8 cores (for both emcee and dynesty). When there are too many cores, the overhead of passing information between processes can overwhelm the speed-up and actually slow things down.
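A quick way to do that profiling, sketched here with a CPU-bound stand-in for the likelihood rather than the real PyAutoLens fit: time the same batch of evaluations with different pool sizes and look for where the speed-up flattens out.

```python
import time
from multiprocessing import Pool

def work(_):
    # CPU-bound stand-in for a single likelihood evaluation.
    return sum(i * i for i in range(10_000))

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        t0 = time.perf_counter()
        with Pool(processes=n) as pool:
            pool.map(work, range(200))
        print(f"{n} cores: {time.perf_counter() - t0:.3f} s")
```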
I think you are right. I tried to run with n_cores=4 and I was able to see the processes. It wasn't as fast as my laptop (694/825 sec) but I will make sure I have the same packages version. Thank you so much!
I think parallelization is probably working ok then, but that it just is not giving a huge speed up.
This is common for the runs we do, with a ~5x speed-up across 25 cores (and often diminishing returns beyond that). Packages like dynesty unfortunately don't parallelize particularly efficiently.
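That limited scaling is roughly what Amdahl's law predicts. With illustrative (not measured) numbers: if about 85% of the runtime is parallelizable, 25 cores cap out at around a 5x speed-up:

```python
def amdahl_speedup(p, n):
    # Best-case speed-up on n cores when a fraction p of the runtime
    # is parallelizable (Amdahl's law).
    return 1.0 / ((1.0 - p) + p / n)

print(round(amdahl_speedup(0.85, 25), 2))  # → 5.43
```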
Your laptop being faster could be because its hardware runs PyAutoLens faster natively than the supercomputer -- this is not all that uncommon.
Yeah. I noticed that the numba version on the cluster was a bit different from the one on my laptop, so I decided to export the environment. For low core counts the parallelization is working fine, roughly at the same speed. I will try a higher number again, but it's unfortunate that the parallelization is not more efficient.
Hello again!
I am trying to model a strong lensing system on a cluster using a SLURM job submission script, and it seems the number_of_cores flag of DynestyStatic is not working properly. When I look at the processes (htop), PyAutoLens is running on only a single core even though number_of_cores > 1. For completeness, I ran this example script from emcee, which makes use of parallelization, and everything looks fine (the tasks are distributed across the cores). I am sorry if I am missing something.
The script I am using: