Can you please provide test code I can execute to reproduce the issue? Is there any error message?
On Sun, Dec 6, 2020, 03:29 Nicoletta Krachmalnicoff <notifications@github.com> wrote:
Hi,
I need to run pysm3 in parallel, but I'm encountering a problem I can't solve.
I have this simplified version of my code:
 1 import healpy as hp
 2 import numpy as np
 3 import pysm3
 4 import pysm3.units as u
 5 import os
 6
 7
 8 from mpi4py import MPI
 9 comm = MPI.COMM_WORLD
10 rank = comm.Get_rank()
11 size = comm.Get_size()
12
13 write_dir = f'./test/{rank}/'
14 if not os.path.exists(write_dir):
15     os.makedirs(write_dir)
16 test_map = np.arange(hp.nside2npix(128))
17 hp.write_map(f'{write_dir}/testmap{rank}.fits', test_map, overwrite=True, dtype=np.float32)
18 sky = pysm3.Sky(nside=128, preset_strings=["s1"])
19 hp.write_map(f'{write_dir}/testmap{rank}_after_sky.fits', test_map, overwrite=True, dtype=np.float32)
20 sky_extrap = sky.get_emission(145.*u.GHz)
21 hp.write_map(f'{write_dir}/testmap{rank}_after_getem.fits', test_map, overwrite=True, dtype=np.float32)
22 hp.write_map(f'{write_dir}/skyextrep{rank}.fits', sky_extrap, overwrite=True, dtype=np.float32)
I'm trying to run it in an interactive job at NERSC:
salloc -N 2 -C knl -q interactive -t 04:00:00
I then set export OMP_NUM_THREADS=2 and run the code with:
mpirun -np 100 python test_parallel.py
What happens is the following: the code correctly writes the 100 test maps from lines 17 and 19 into 100 different folders, but it writes the maps from lines 21 and 22 only for a subset of processes (between 4 and 6, depending on the run).
Note that this happens for "s0", "d0", "d1" (I haven't tried the others) but not for "c1"!
Any idea why this could happen?
Thanks a lot!
Hi @zonca, with the code I posted and the interactive job you should be able to reproduce the problem. There is no error message: it keeps running but doesn't write anything to disk.
sorry @NicolettaK, higher-priority items keep stepping in front of this, it's going to take some time.
@NicolettaK I think it is the number of numba threads.
I tested with the script below and it worked fine, but better if you test it yourself and confirm:
#!/bin/bash
#SBATCH --qos=debug
#SBATCH --time=30
#SBATCH --nodes=2
#SBATCH --tasks-per-node=50
#SBATCH --cpus-per-task=2
#SBATCH --constraint=knl
export OMP_PROC_BIND=true
export OMP_PLACES=threads
export OMP_NUM_THREADS=2
export NUMBA_NUM_THREADS=2
#export NUMBA_DISABLE_JIT=1
srun python run.py
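As a possible alternative, here is a minimal sketch (untested on KNL at this scale) of capping the numba thread pool from inside the Python script itself instead of the batch script; it assumes a recent numba (>= 0.48) that provides numba.set_num_threads and numba.get_num_threads:

import os
# NUMBA_NUM_THREADS is only honored if it is set before numba is imported,
# so set it at the very top of the script
os.environ["NUMBA_NUM_THREADS"] = "2"

import numba
import pysm3
import pysm3.units as u

# on recent numba the limit can also be lowered at runtime,
# up to the value of NUMBA_NUM_THREADS
numba.set_num_threads(2)
print("numba threads:", numba.get_num_threads())

sky = pysm3.Sky(nside=128, preset_strings=["s1"])
emission = sky.get_emission(145. * u.GHz)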
@NicolettaK have you had a chance to test this? I would like to add it to the docs if you confirm it works.