desihub / tutorials

DESI tutorials
BSD 3-Clause "New" or "Revised" License

Target selection in the FiberAssignMocks notebook #33

Closed andluizsouza closed 1 year ago

andluizsouza commented 4 years ago

Hi people!

I cannot run the target selection command in the FiberAssignMocks notebook in the desi 19.2 environment on the master branch:

srun -A desi -N 2 -n 16 -c 8 -C haswell -t 01:00:00 --qos interactive mpi_select_mock_targets --no-spectra --nproc 4 --nside 32 --seed 10 -c ./input.yaml --output_dir ./ --tiles ./tiles.fits

It shows the following error message:

ERROR:mpi_select_mock_targets:176:: Pixels [4509] failed after 7.2 minutes
ERROR:mpi_select_mock_targets:179:: Traceback (most recent call last):
  File "/global/common/software/desi/cori/desiconda/20180709-1.2.6-spec/code/desitarget/0.28.0/lib/python3.6/site-packages/desitarget-0.28.0-py3.6.egg/EGG-INFO/scripts/mpi_select_mock_targets", line 170, in
    healpixels=rankpix[i:i+n], no_spectra=args.no_spectra)
  File "/global/common/software/desi/cori/desiconda/20180709-1.2.6-spec/code/desitarget/0.28.0/lib/python3.6/site-packages/desitarget-0.28.0-py3.6.egg/desitarget/mock/build.py", line 872, in targets_truth
    MakeMock=AllMakeMock[ii])
  File "/global/common/software/desi/cori/desiconda/20180709-1.2.6-spec/code/desitarget/0.28.0/lib/python3.6/site-packages/desitarget-0.28.0-py3.6.egg/desitarget/mock/build.py", line 171, in read_mock
    nside_galaxia=nside_galaxia, mock_density=mock_density)
  File "/global/common/software/desi/cori/desiconda/20180709-1.2.6-spec/code/desitarget/0.28.0/lib/python3.6/site-packages/desitarget-0.28.0-py3.6.egg/desitarget/mock/mockmaker.py", line 3204, in read
    only_coords=only_coords, seed=self.seed)
  File "/global/common/software/desi/cori/desiconda/20180709-1.2.6-spec/code/desitarget/0.28.0/lib/python3.6/site-packages/desitarget-0.28.0-py3.6.egg/desitarget/mock/mockmaker.py", line 2104, in readmock
    raise IOError
OSError

I also tried to run this notebook in the desi master environment, but there is no mpi_select_mock_targets function in the master env.

Could anyone please help me? Thanks a lot!

moustakas commented 4 years ago

Can you please paste more of the output log before the crash? Also, please point me to your input.yaml file.

The 19.2 release is pretty stale at this point so once I've had a look at your input config file I'll suggest you try with the 19.10 release.

andluizsouza commented 4 years ago

Hi, @moustakas. Thanks for your reply. This is the output log before the crash:

srun -A desi -N 2 -n 16 -c 8 -C haswell -t 01:00:00 --qos interactive mpi_select_mock_targets --no-spectra --nproc 4 --nside 32 --seed 10 -c ./input.yaml --output_dir ./ --tiles ./tiles.fits

INFO:mpi_select_mock_targets:83:: 15 tiles
INFO:mpi_select_mock_targets:98:: 19/19 pixels remaining to do
INFO:mpi_select_mock_targets:146:: rank 0 processes 1 pixels [4493]
INFO:mpi_select_mock_targets:146:: rank 8 processes 1 pixels [4504]
INFO:mpi_select_mock_targets:146:: rank 9 processes 1 pixels [4505]
INFO:mpi_select_mock_targets:146:: rank 10 processes 2 pixels [4506 4507]
INFO:mpi_select_mock_targets:146:: rank 11 processes 1 pixels [4508]
INFO:mpi_select_mock_targets:146:: rank 12 processes 1 pixels [4509]
INFO:mpi_select_mock_targets:146:: rank 13 processes 1 pixels [4510]
INFO:mpi_select_mock_targets:146:: rank 14 processes 1 pixels [4517]
INFO:mpi_select_mock_targets:146:: rank 15 processes 2 pixels [4528 4529]
INFO:mpi_select_mock_targets:146:: rank 1 processes 1 pixels [4495]
INFO:mpi_select_mock_targets:146:: rank 3 processes 1 pixels [4498]
INFO:mpi_select_mock_targets:146:: rank 5 processes 2 pixels [4500 4501]
INFO:mpi_select_mock_targets:146:: rank 7 processes 1 pixels [4503]
INFO:mpi_select_mock_targets:146:: rank 2 processes 1 pixels [4497]
INFO:mpi_select_mock_targets:146:: rank 4 processes 1 pixels [4499]
INFO:mpi_select_mock_targets:146:: rank 6 processes 1 pixels [4502]
INFO:mpi_select_mock_targets:157:: Logging pixels [4504] to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4504/build-32-4504.log
INFO:mpi_select_mock_targets:157:: Logging pixels [4493] to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/44/4493/build-32-4493.log
INFO:mpi_select_mock_targets:157:: Logging pixels [4505] to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4505/build-32-4505.log
INFO:mpi_select_mock_targets:157:: Logging pixels [4495] to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/44/4495/build-32-4495.log
INFO:mpi_select_mock_targets:157:: Logging pixels [4506] to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4506/build-32-4506.log
INFO:mpi_select_mock_targets:157:: Logging pixels [4497] to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/44/4497/build-32-4497.log
INFO:mpi_select_mock_targets:157:: Logging pixels [4508] to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4508/build-32-4508.log
INFO:mpi_select_mock_targets:157:: Logging pixels [4498] to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/44/4498/build-32-4498.log
INFO:mpi_select_mock_targets:157:: Logging pixels [4509] to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4509/build-32-4509.log
INFO:mpi_select_mock_targets:157:: Logging pixels [4499] to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/44/4499/build-32-4499.log
INFO:mpi_select_mock_targets:157:: Logging pixels [4510] to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4510/build-32-4510.log
INFO:mpi_select_mock_targets:157:: Logging pixels [4500] to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4500/build-32-4500.log
INFO:mpi_select_mock_targets:157:: Logging pixels [4517] to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4517/build-32-4517.log
INFO:mpi_select_mock_targets:157:: Logging pixels [4502] to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4502/build-32-4502.log
INFO:mpi_select_mock_targets:157:: Logging pixels [4528] to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4528/build-32-4528.log
INFO:mpi_select_mock_targets:157:: Logging pixels [4503] to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4503/build-32-4503.log
INFO:parallel.py:365:stdouterr_redirected: Begin log redirection to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4503/build-32-4503.log at Tue Dec 3 08:18:39 2019
INFO:parallel.py:365:stdouterr_redirected: Begin log redirection to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4506/build-32-4506.log at Tue Dec 3 08:18:39 2019
INFO:parallel.py:365:stdouterr_redirected: Begin log redirection to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/44/4493/build-32-4493.log at Tue Dec 3 08:18:39 2019
INFO:parallel.py:365:stdouterr_redirected: Begin log redirection to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4508/build-32-4508.log at Tue Dec 3 08:18:39 2019
INFO:parallel.py:365:stdouterr_redirected: Begin log redirection to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/44/4495/build-32-4495.log at Tue Dec 3 08:18:39 2019
INFO:parallel.py:365:stdouterr_redirected: Begin log redirection to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4509/build-32-4509.log at Tue Dec 3 08:18:39 2019
INFO:parallel.py:365:stdouterr_redirected: Begin log redirection to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/44/4497/build-32-4497.log at Tue Dec 3 08:18:39 2019
INFO:parallel.py:365:stdouterr_redirected: Begin log redirection to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4510/build-32-4510.log at Tue Dec 3 08:18:39 2019
INFO:parallel.py:365:stdouterr_redirected: Begin log redirection to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/44/4498/build-32-4498.log at Tue Dec 3 08:18:39 2019
INFO:parallel.py:365:stdouterr_redirected: Begin log redirection to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4528/build-32-4528.log at Tue Dec 3 08:18:39 2019
INFO:parallel.py:365:stdouterr_redirected: Begin log redirection to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/44/4499/build-32-4499.log at Tue Dec 3 08:18:39 2019
INFO:parallel.py:365:stdouterr_redirected: Begin log redirection to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4504/build-32-4504.log at Tue Dec 3 08:18:39 2019
INFO:parallel.py:365:stdouterr_redirected: Begin log redirection to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4500/build-32-4500.log at Tue Dec 3 08:18:39 2019
INFO:parallel.py:365:stdouterr_redirected: Begin log redirection to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4505/build-32-4505.log at Tue Dec 3 08:18:39 2019
INFO:parallel.py:365:stdouterr_redirected: Begin log redirection to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4502/build-32-4502.log at Tue Dec 3 08:18:39 2019
INFO:parallel.py:365:stdouterr_redirected: Begin log redirection to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/45/4517/build-32-4517.log at Tue Dec 3 08:18:39 2019
INFO:parallel.py:412:stdouterr_redirected: End log redirection to /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock/44/4497/build-32-4497.log at Tue Dec 3 08:25:06 2019
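Since the main log only says that a pixel failed, the details usually sit in that pixel's own build log. The sketch below rebuilds the log path for a given pixel. The layout (`<output_dir>/<pixel // 100>/<pixel>/build-<nside>-<pixel>.log`) is an assumption inferred only from the paths printed in the log above, not from the desitarget documentation, and `pixel_log_path` is a hypothetical helper name.

```python
import os


def pixel_log_path(output_dir, nside, pixel):
    """Build the per-pixel log path, assuming the layout seen in the log above.

    Assumption: the first subdirectory is pixel // 100 (e.g. 4509 -> "45"),
    which matches every path shown in this thread but is not documented here.
    """
    subdir = str(pixel // 100)
    return os.path.join(output_dir, subdir, str(pixel), f"build-{nside}-{pixel}.log")


# Pixel 4509 is the one reported as failed in the original error message.
print(pixel_log_path("/global/cscratch1/sd/andsouza/desi/test/fiberassign_mock", 32, 4509))
```

Reading the tail of that file should show what the rank was doing just before the exception was raised.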

andluizsouza commented 4 years ago

And my input.yaml file is in /global/cscratch1/sd/andsouza/desi/test/fiberassign_mock

andreufont commented 4 years ago

Hi @andluizsouza , we don't have permission to access this file

andreufont commented 4 years ago

Take a look at this: https://desi.lbl.gov/trac/wiki/Computing/NerscFileSystem#Filepermissions In particular the instructions to run fix_permissions.sh

andluizsouza commented 4 years ago

Sorry, @andreufont. Do you have permission now?

andreufont commented 4 years ago

No, not yet. Look at this:

font:ForecastDESI$ ls -lh /global/cscratch1/sd/ | grep andsouza
drwx------ 3 andsouza andsouza 4.0K Sep 30 11:40 andsouza

It shows that only you have access to your SCRATCH folder. You can allow DESI members to read your folder by running fix_permissions on this folder.
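A quick way to check the same thing from Python is to inspect the mode bits directly. This is a minimal sketch using only the standard library; `readable_by_group` and `describe` are hypothetical helper names, and the check only looks at the group bits of one path. It does not account for ACLs or for the permissions of parent directories, so it is not a substitute for what fix_permissions.sh does.

```python
import os
import stat


def readable_by_group(path):
    """Return True if the group bits allow reading `path`.

    For a directory both read and execute bits are needed, to list it and to
    enter it. Only the mode bits of `path` itself are inspected.
    """
    mode = os.stat(path).st_mode
    if stat.S_ISDIR(mode):
        needed = stat.S_IRGRP | stat.S_IXGRP
    else:
        needed = stat.S_IRGRP
    return mode & needed == needed


def describe(path):
    """Format the mode the way `ls -l` does, e.g. 'drwx------ /some/dir'."""
    return f"{stat.filemode(os.stat(path).st_mode)} {path}"
```

A directory listed as `drwx------`, as in the output above, would make `readable_by_group` return False.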

andluizsouza commented 4 years ago

And now? I am following all the instructions...

andreufont commented 4 years ago

Now yes, thanks

andluizsouza commented 4 years ago

@moustakas, I tried to run this notebook with the desi 19.10 environment, but it didn't work either. I got the same error message.

sbailey commented 4 years ago

FYI I'm debugging and fixing the FiberAssign.ipynb tutorial now. It had problems with fiberassign doing more checks on the exact output columns needed for operations, and the tutorial not including all required columns. It is possible that the mocks also are out-of-date with what columns are required. Debugging was made more difficult by the notebook not including all fiberassign stdout/stderr.

Thanks for your patience; I hope to get these working again by the time of the OSU workshop next week...

sbailey commented 4 years ago

I'm running out of time to test this tonight and NERSC is out tomorrow, but adding a note while it is fresh in my mind:

Never mind my previous comment; the actual problem is that the tutorial repo has its own select_mock_targets configuration file (input.yaml), which is out of date and specifies an input mock that doesn't exist (/global/project/projectdirs/desi/mocks/lya_forest/develop/london/v4.0/master.fits).

We should update this tutorial to use the desitarget/mock/data/select-mock-targets.yaml which @moustakas maintains.
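A stale configuration like this can be caught before launching a job by checking every path in the parsed configuration. The sketch below walks an already-loaded configuration (nested dicts and lists) and reports absolute paths that do not exist. `missing_paths` is a hypothetical helper, and no assumption is made about the actual key names in select-mock-targets.yaml; any string value starting with "/" is treated as a path.

```python
import os


def missing_paths(config, prefix=""):
    """Recursively collect (key_path, value) pairs for absolute paths that do not exist.

    `config` is the parsed configuration, e.g. the result of yaml.safe_load
    (PyYAML is assumed to be installed for that step; it is not used here).
    """
    if isinstance(config, dict):
        items = config.items()
    elif isinstance(config, (list, tuple)):
        items = enumerate(config)
    else:
        items = ()

    missing = []
    for key, value in items:
        where = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, str):
            # Heuristic: treat any string that starts with "/" as a file path.
            if value.startswith("/") and not os.path.exists(value):
                missing.append((where, value))
        else:
            missing.extend(missing_paths(value, where))
    return missing


# Typical use (commented out because it needs PyYAML and a real file):
# import yaml
# with open("input.yaml") as f:
#     for where, path in missing_paths(yaml.safe_load(f)):
#         print(f"{where}: {path} does not exist")
```

Run against the old input.yaml, a check of this kind would be expected to flag the London v4.0 master.fits path quoted above.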

moustakas commented 4 years ago

Indeed, that's where I was going with this before getting pulled away -- I suspected that input.yaml was out of date.

andluizsouza commented 4 years ago

Thanks, @sbailey and @moustakas. Actually, the configuration file input.yaml created by the fiberassign notebook is not the same as that one.

sbailey commented 4 years ago

Update: by using the desitarget select-mock-targets.yaml file and other minor updates (mtl-dark.fits vs. mtl.fits...) I got further, but now I'm failing on fiberassign due to desihub/fiberassign#243 and desihub/fiberassign#244 . Those are likely related to the NERSC "upgrade" this week, and we'll need them to be fixed before we can get this tutorial working again.

andluizsouza commented 4 years ago

Hi @sbailey and @moustakas! Is there any update about this topic?

I could write another select_mock_targets configuration file (input.yaml) to read a new input mock and to run this notebook. Where can I get a mock for tests?

Thanks!

alxogm commented 4 years ago

Hi @andluizsouza I think you can just copy this configuration file https://github.com/desihub/desitarget/blob/master/py/desitarget/mock/data/select-mock-targets.yaml (as said in an earlier thread...), it points to the London mocks v4.2.0, select_mock_targets works fine with it... To run on a newer mock version, I need to finalize this PR, I hope this will happen today... But for testing the v4.2.0 should be fine...

andluizsouza commented 4 years ago

Thanks, @alxogm! Now it's working well with the London mocks v4.2.0. I added this select_mock_targets configuration file to the tutorial directory.

andluizsouza commented 4 years ago

Hi people!

I have had problems running the FiberAssignMocks notebook since the last update. I've tried to run it in the desi master and desi 19.12 environments, but it does not work in either.

There is an error in the mpi_select_mock_targets function:

ERROR:mpi_select_mock_targets:178:<module>: Pixels [4500] failed after 1.0 minutes
ERROR:mpi_select_mock_targets:181:<module>: Traceback (most recent call last):
  File "/global/common/software/desi/cori/desiconda/20190804-1.3.0-spec/code/desitarget/master/py/desitarget/internal/sharedmem.py", line 377, in get
    return Q.get(timeout=1)
  File "/global/common/software/desi/cori/desiconda/20190804-1.3.0-spec/conda/lib/python3.6/multiprocessing/queues.py", line 105, in get
    raise Empty
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/global/common/software/desi/cori/desiconda/20190804-1.3.0-spec/code/desitarget/master/py/desitarget/internal/sharedmem.py", line 674, in map
    capsule = pg.get(R)
  File "/global/common/software/desi/cori/desiconda/20190804-1.3.0-spec/code/desitarget/master/py/desitarget/internal/sharedmem.py", line 380, in get
    raise StopProcessGroup
desitarget.internal.sharedmem.StopProcessGroup: StopProcessGroup

and the join_mock_targets function does not work afterwards, so the mtl.fits and sky.fits files are not created.

Does anyone have any idea about this problem? Thanks!

forero commented 4 years ago

I tested the notebook against master last Thursday and it worked. I am testing the notebook again and it's not working.

This is new in desitarget https://github.com/desihub/desitarget/commit/90467034bf1bb73f7721b461c2eb6b1a203350df

I don't know if that impacted select_mock_targets somehow, or if we are facing another problem.

forero commented 4 years ago

There is another error message:

Traceback (most recent call last):
  File "/global/common/software/desi/cori/desiconda/20190804-1.3.0-spec/code/desitarget/master/py/desitarget/internal/sharedmem.py", line 265, in _slaveMain
    self.main(self, *self.args)
  File "/global/common/software/desi/cori/desiconda/20190804-1.3.0-spec/code/desitarget/master/py/desitarget/internal/sharedmem.py", line 563, in _main
    r = realfunc(work)
  File "/global/common/software/desi/cori/desiconda/20190804-1.3.0-spec/code/desitarget/master/py/desitarget/internal/sharedmem.py", line 629, in realfunc
    else: return func(i)
  File "/global/common/software/desi/cori/desiconda/20190804-1.3.0-spec/code/desitarget/master/py/desitarget/cuts.py", line 1003, in isBGS
    south=south, targtype=targtype)
  File "/global/common/software/desi/cori/desiconda/20190804-1.3.0-spec/code/desitarget/master/py/desitarget/cuts.py", line 1119, in isBGS_lslga
    LX = np.array([rc[0] == "L" for rc in refcat], dtype=bool)
  File "/global/common/software/desi/cori/desiconda/20190804-1.3.0-spec/code/desitarget/master/py/desitarget/cuts.py", line 1119, in <listcomp>
    LX = np.array([rc[0] == "L" for rc in refcat], dtype=bool)
IndexError: string index out of range

It looks like it's related to the refcat introduced in https://github.com/desihub/desitarget/commit/90467034bf1bb73f7721b461c2eb6b1a203350df
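The IndexError in the traceback above is what Python raises when `rc[0]` is evaluated on an empty string, which is what an unset REF_CAT entry would be. The sketch below reproduces the failure on a few sample values and shows one way to make the test tolerate empty entries by slicing instead of indexing. The sample values and the helper `starts_with_L` are illustrative assumptions; this is not the fix that was adopted in desitarget.

```python
def starts_with_L(rc):
    """True if a REF_CAT-like value begins with 'L'.

    Accepts str or bytes. Slicing (rc[:1]) returns an empty string for an
    empty value instead of raising IndexError the way rc[0] does.
    """
    if isinstance(rc, bytes):
        rc = rc.decode("ascii", errors="replace")
    return rc[:1] == "L"


# Illustrative values only: empty entries stand for objects with no reference catalog.
refcat = ["", "G2", "L2", b"", b"L2"]

# The indexing form from the traceback fails on the first empty entry.
try:
    [rc[0] == "L" for rc in refcat]
except IndexError as err:
    print("indexing failed:", err)  # indexing failed: string index out of range

mask = [starts_with_L(rc) for rc in refcat]
print(mask)  # [False, False, True, False, True]
```

Note that with bytes values `rc[0]` returns an integer, so comparing it to "L" is always False; decoding first avoids that silent mismatch.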

michaelJwilson commented 4 years ago

I had problems with the REFCAT lookup that applies here:

https://github.com/desihub/desitarget/blob/88679de62f1511a52919497bc0d9f4258e923cd7/py/desitarget/cuts.py#L1119

I wasn't able to find a satisfactory solution. But adding:

refcat = [x.replace('', '  ') for x in refcat]

#the LSLGA galaxies                                                                                                     
if refcat is None:
  LX = bgs.copy() 

to desitarget cuts and

targets['RA_IVAR'][:], targets['DEC_IVAR'][:] = 1e8, 1e8
targets['DCHISQ'][:] = np.tile( [0.0, 100, 200, 300, 400], (nobj, 1)) # for QSO selection                           

# Assign REF_CAT:  chararray(['', 'G2', 'L2'], dtype='S2')                                                          
targets['REF_CAT'].data[:] = data['REF_CAT'][indx]

to populate_targets_truth in desitarget/mockmaker 

at least made it go away.

forero commented 4 years ago

@michaelJwilson I've opened the issue https://github.com/desihub/desitarget/issues/585. Perhaps you want to submit your fix as a PR over there.

michaelJwilson commented 4 years ago

I posted pull requests on both desitarget and fiberassign. They both run to completion for me.
I doubt Adam will let the desitarget change pass, but it illustrates what's needed and might be useful to you guys in the meantime.

geordie666 commented 4 years ago

Thanks @michaelJwilson: I think I have a different fix that corrects the try/except clause that is actually failing, instead of modifying the mocks (hits the cause not the symptoms).

I'll try to implement that later today. In the meantime, though, your fix is appreciated and should be useful for anyone that needs to process things in the short-term.

michaelJwilson commented 4 years ago

Sounds great. I tried for quite a while and couldn't find anything I liked. This had the benefit of being an easily removed one-liner that actually runs.

I didn't like having a try/except aimed at the mocks. The root cause, I think, is that the mocks are astropy Table based, while desitarget uses a numpy structured array. I couldn't find a way to actually return a b' ' string on the mock side (from a Table). Table['Col'].data is required for bytes rather than strings to be returned, but doesn't help with the whitespace.

weaverba137 commented 1 year ago

I believe FiberAssignMocks was working as of recently. Please open a separate ticket if there are still problems with that notebook.