desihub / desisurvey

Code for desi survey planning and implementation
BSD 3-Clause "New" or "Revised" License
2 stars 7 forks source link

Assertion failure in Progress.get_exposures() #69

Closed weaverba137 closed 7 years ago

weaverba137 commented 7 years ago

I'm starting to modify the minitest notebook in two_percent_DESI to use the recent changes to Progress.get_exposures() to eventually create an exposures.fits file that can be loaded into a database. It is called like this:

from desisurvey.progress import Progress
p = Progress(restore='progress.fits')
explist = p.get_exposures()
explist.write(os.path.join(surveydir, 'exposures.fits'), overwrite=True)

With the modified Progress.get_exposures(), I get an AssertionError:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-32-1af1c9dc1e70> in <module>()
      1 from desisurvey.progress import Progress
      2 p = Progress(restore='progress.fits')
----> 3 explist = p.get_exposures()
      4 explist.write(os.path.join(surveydir, 'exposures.fits'), overwrite=True)

~/software/cori/code/desisurvey/my-master/py/desisurvey/progress.py in get_exposures(self, start, stop, tile_fields, exp_fields)
    560                         desimodel.io.load_tiles(onlydesi=True, extra=False,
    561                         tilesfile=config.tiles_file()))
--> 562                     assert np.all(tileinfo['TILEID'] == table['tileid'])
    563                 output[name.upper()] = tileinfo['EBV_MED'][tile_index]
    564             else:

AssertionError: 

I'm going to guess that the set of tiles in the Progress object is not the same as the set of tiles returned by desimodel.io.load_tiles().

Note also that the loading is somewhat odd (note the comments):

        tileinfo = None  # tileinfo is None by definition!
        output = astropy.table.Table()
        for name in tile_fields.split(','):
            name = name.lower()
            if name == 'index':
                output[name.upper()] = tile_index
            elif name == 'ebmv':
                if tileinfo is None:  # this will never be false, so why bother with it?
                    config = desisurvey.config.Configuration()
                    tileinfo = astropy.table.Table(
                        desimodel.io.load_tiles(onlydesi=True, extra=False,
                        tilesfile=config.tiles_file()))
                    assert np.all(tileinfo['TILEID'] == table['tileid'])
                output[name.upper()] = tileinfo['EBV_MED'][tile_index]

It almost looks like tileinfo could be overridden, but that overriding tileinfo has never been implemented. Also tileinfo is only ever used to load EBMV values.

weaverba137 commented 7 years ago

Also, I noticed while playing around with the notebook that a file called tiles-subset.fits appeared in my home directory (the notebook had been previously copied to my home directory). Is this the tiles file that should be used by Progress.get_exposures()? If so, why is it not written to the same area as progress.fits?

dkirkby commented 7 years ago

This sounds related to changes @sbailey made to desisurvey.config

weaverba137 commented 7 years ago

The tiles-subset.fits file is in fact written by the minitest notebook. I'm going to try a test on the basis that the tiles-subset.fits file is what should be the tileinfo variable defined above.

weaverba137 commented 7 years ago

If I explicitly load the desisurvey configuration file with the desisurvey-config.yaml file in the two_percent_DESI minitest directory, then Progress.get_exposures() runs successfully.

However, a subsequent sanity check in the minitest notebook then fails:

assert np.all(np.in1d(tiles_subset['TILEID'], explist['TILEID']))

Reassigning to @sbailey, because I don't know what the intended behavior is here.

weaverba137 commented 7 years ago

So, I figured out why Progress.get_exposures() has to read the tiles file. Because Progress only reads the tiles file if it is initialized without progress='progress.fits'. If it is initialized with progress='progress.fits', then it has to re-open the tiles file only to get EBMV values. That's silly. The Progress object should just include EBMV so it always has it.

sbailey commented 7 years ago

Looking at this now. I may need to pull in @dkirkby to understand how this is supposed to work within the design of the Progress object, Configuration singletons, etc. But in the meantime, I think this is the workaround that @weaverba137 already mentioned:

import desisurvey.config
from desisurvey.progress import Progress

desisurvey.config.Configuration(file_name=minitestdir+'/desisurvey-config.yaml')
p = Progress(restore='progress.fits')
explist = p.get_exposures()
explist.write(os.path.join(surveydir, 'exposures.fits'), overwrite=True)

desisurvey is using a Configuration object singleton as a form of global variable, including which tile file to use if overriding desimodel. You have to load the Configuration prior to creating the Progress object so that it knows to use that.

Note: minitest is a variable defined in the notebook in the latest version merged into master. It may not have been in the version of the notebook you grabbed, but it basically points to the directory that contains the notebook, i.e. os.getcwd() at the start of running.

I think the longer term solution is to add a constructor option to the Progress object with the config file to read if restore=None. Including EBMV in the Progress object to avoid re-reading the tiles file would be bonus, but it seems that the constructor would still need to read a configuration file to get config.min_snr2_fraction(). i.e. the progress.fits file does not appear to be self contained for restoring a Progress object. I'll chat with @dkirkby.

dkirkby commented 7 years ago

A Progress FITS file is intended to be self-contained so I probably need to write min_snr2_fraction into its header. The EBMV column is intentionally not part of the Progress table (which I tried to keep minimal) but get_exposures can splice it in as a convenience (thus requiring access to the tiles file). I am ok with adding the EBMV column to Progress if that's the simplest solution, now that we support alternate tile files. In any case, I won't get a chance to work on this until next week.

weaverba137 commented 7 years ago

In the latest test on two_percent_DESI, branch automini, and using desisurvey branch pass2program, I got a new type of error:

./test-tiles.fits
INFO:progress.py:133:__init__: Loaded progress from /global/cscratch1/sd/bweaver/desi/dev/end2end/survey/progress.fits.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-2d2835b59e3d> in <module>()
      8     config = Configuration(os.path.join(minitestdir, 'desisurvey-config.yaml'))
      9     print(config.tiles_file())
---> 10     p = Progress(restore='progress.fits')
     11     explist = p.get_exposures()
     12 

/global/common/cori/contrib/desi/desiconda/20170613-1.1.4-spectro/code/desisurvey/0.9.2/lib/python3.5/site-packages/desisurvey-0.9.2-py3.5.egg/desisurvey/progress.py in __init__(self, restore, max_exposures)
    136                 raise RuntimeError(
    137                     'Progress table has incompatible version {0}.'
--> 138                     .format(table.meta['VERSION']))
    139             # Check that the status column matches the current min_snr2.
    140             snr2sum = table['snr2frac'].data.sum(axis=1)

RuntimeError: Progress table has incompatible version 4.

Here's the whole cell:

if os.path.exists(expfile):
    print('Reading explist from {}'.format(expfile))
    explist = Table.read(expfile)
else:
    from desisurvey.config import Configuration
    from desisurvey.progress import Progress
    Configuration.reset()
    config = Configuration(os.path.join(minitestdir, 'desisurvey-config.yaml'))
    print(config.tiles_file())
    p = Progress(restore='progress.fits')
    explist = p.get_exposures()

    # Sanity check that all tiles in the subset were observed in the exposures list
    if not np.all(np.in1d(tiles['TILEID'], explist['tileid'])):
        print("ERROR: some tiles weren't observed;\ncheck {} for failures".format(survey_logname) )
        print("Missing TILEIDs:", set(tiles['TILEID']) - set(explist['tileid']))
    else:
        print("All tiles in the subset were observed at least once")
        explist.write(expfile, overwrite=True)
        print('Wrote {}'.format(expfile))
weaverba137 commented 7 years ago

PS, note this line:

/global/common/cori/contrib/desi/desiconda/20170613-1.1.4-spectro/code/desisurvey/0.9.2/lib/python3.5/site-packages/desisurvey-0.9.2-py3.5.egg/desisurvey/progress.py in __init__(self, restore, max_exposures)

That should be my checkout git checkout of desisurvey, not the installed version! I checked very, very carefully to ensure that only my version of desisurvey was in sys.path, but it's still turning up.

weaverba137 commented 7 years ago

Ah, OK, the very first cell imports desisurvey, before I switch to the git checkout version. So I have to modify sys.path before anything else.

weaverba137 commented 7 years ago

Once I made sure that I was starting from a completely clean end2end directory, and made sure that the sys.path was set correctly before any DESI packages were imported, then I'm able to get past the progress to exposures stage successfully. Continuing on with the rest of the notebook now.

weaverba137 commented 7 years ago

I've run completely through the notebook with no problems. The solution is:

  1. Start from a completely clean output directory.
  2. Make sure sys.path is set correctly before importing any DESI packages.