desihub / desitarget

DESI Targeting
BSD 3-Clause "New" or "Revised" License
18 stars 23 forks source link

reproducible order of input files #738

Closed sbailey closed 3 years ago

sbailey commented 3 years ago

@geordie666 this fixes a bug originally found by @araichoor : select_targets was getting its input files in a random order, which resulted in the output targets having a random order. This was problematic because the SUBPRIORITY assigned to the targets was based upon their order in the file, thus breaking the TARGETID:SUBPRIORITY mapping from one run to the next.

The update that actually fixed the problem was changing

infiles = list(set(np.hstack([filesperpixel[pix] for pix in pixlist])))

to

infiles = list(np.unique(np.hstack([filesperpixel[pix] for pix in pixlist])))

i.e. set returns a random order of unique items, while np.unique returns sorted unique items. I'm not sure if leaving the list(...) cast was necessary, but I was trying to keep the update as minimal as possible.

Along the way, I also wrapped sorted(glob(...)) everywhere glob appeared, but that wasn't the core issue here and I didn't undo those.

Reproducer:

select_targets /global/cfs/cdirs/cosmo/data/legacysurvey/dr9/north/sweep/9.0 \
    $CSCRATCH/desi/targets/reproducibility/test-9142-1 \
    -s2 /global/cfs/cdirs/cosmo/data/legacysurvey/dr9/south/sweep/9.0 \
    --nside 64 --healpixels 9142 --numproc 16 --nosecondary \
    --gaiasub &> test-9142-1.log

select_targets /global/cfs/cdirs/cosmo/data/legacysurvey/dr9/north/sweep/9.0 \
    $CSCRATCH/desi/targets/reproducibility/test-9142-2 \
    -s2 /global/cfs/cdirs/cosmo/data/legacysurvey/dr9/south/sweep/9.0 \
    --nside 64 --healpixels 9142 --numproc 16 --nosecondary \
    --gaiasub &> test-9142-2.log

import fitsio                                                                               
a = fitsio.read('test-9142-1/dr9/1.0.1.dev5024/targets/main/resolve/bright/targets-bright-hp-9142.fits')                                                                                
b = fitsio.read('test-9142-2/dr9/1.0.1.dev5024/targets/main/resolve/bright/targets-bright-hp-9142.fits')                                                                                

np.all(a['TARGETID'] == b['TARGETID'])  #- Previously False, now True
np.all(np.sort(a['TARGETID']) == np.sort(b['TARGETID'])) #- True
araichoor commented 3 years ago

thanks for that. one comment: do we care that it s not "backward-compatible", ie that running that code e.g. won t reproduce the 1.0.0 catalogs? I think we don t care (ie if we want to reproduce the 1.0.0 catalogs, let s just use the 1.0.0 tag), but I m just raising the point, in case.

geordie666 commented 3 years ago

I think it's fine for this not to be backwards compatible. As Anand says, one can always use a tag for that purpose.

@sbailey: Sorry the glob()s were a wild goose chase and that this was an errant set(). I think it's fine to merge this if it's what you need to make progress. I've checked that I don't use this set() construction when making the skies, which is the other main case where SUBPRIORITY crops up. Thanks for fixing this!

coveralls commented 3 years ago

Coverage Status

Coverage remained the same at 59.072% when pulling aa4484c8c512f5c8d47bdbcda6ee3e5d2ee36c73 on reproducibility into 6dc166ce8d548793e76e334053610edc3cca4d57 on master.

sbailey commented 3 years ago

Thanks for the quick reviews. Regarding backwards compatibility: we've already "broken" that in master by making the random seeds based on healpix (good) instead of the same for every healpix. But I think that is ok — we want fiberassign to be backwards compatible for replaying mocks, but for desitarget the most important thing is be be reproducible for how a particular version creates its outputs.

I will merge now so that I can rebase the subpriority branch for further testing.

araichoor commented 3 years ago

one last sanity-check comment (sorry, after the merging): @sbailey did you verify that this change of adding np.unique() in the infiles definition in cuts.py only was sufficient to get it right for:

araichoor commented 3 years ago

for secondary, it should be fine, as there is no healpix pixels at play, it s a loop over the individual target masks (which all have an associated fits file) : https://github.com/desihub/desitarget/blob/f2c6d4d2f4def25f6c79e1dfe235839640e73d9b/py/desitarget/secondary.py#L294-L295

@geordie666 : extra minor comment: while checking the code, I notice that there are some "hppixlist" in write_secondary() docstring and function, though not present in the argument list; likely some copy-pasting leftover from write_targets(): I guess the code doesn t crash on that "if hppixlist is not None", because the if conditions just above are not met when you generate the catalogs... https://github.com/desihub/desitarget/blob/f2c6d4d2f4def25f6c79e1dfe235839640e73d9b/py/desitarget/io.py#L906-L938

araichoor commented 3 years ago

I just verified that the backup (=gaia) targets do not use glob(); the file searching is based on identifying healpix pixels, so perfect.

geordie666 commented 3 years ago

for secondary, it should be fine, as there is no healpix pixels at play, it s a loop over the individual target masks (which all have an associated fits file) :

https://github.com/desihub/desitarget/blob/f2c6d4d2f4def25f6c79e1dfe235839640e73d9b/py/desitarget/secondary.py#L294-L295

@geordie666 : extra minor comment: while checking the code, I notice that there are some "hppixlist" in write_secondary() docstring and function, though not present in the argument list; likely some copy-pasting leftover from write_targets(): I guess the code doesn t crash on that "if hppixlist is not None", because the if conditions just above are not met when you generate the catalogs...

https://github.com/desihub/desitarget/blob/f2c6d4d2f4def25f6c79e1dfe235839640e73d9b/py/desitarget/io.py#L906-L938

This has been fixed in #739. Thanks!