desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License
35 stars 24 forks source link

Replace hard-coded paths with environment variables #2294

Closed weaverba137 closed 1 month ago

weaverba137 commented 1 month ago

This PR fixes #2272.

This PR introduces a new environment variable, DESI_TILES, similar to DESI_SURVEYOPS and also uses DESI_SURVEYOPS more extensively.

DESI_TILES will be defined in desimodules (as a fix to desihub/desimodules#49).

sbailey commented 1 month ago

To emphasize the WIP status as a warning to my future self: this requires updates to desimodules to define $DESI_TILES before it will work, and there are other places that also reference trunk that have not yet been updated (e.g. desispec.io.fibermap.assemble_fibermap). But in the spirit of WIP, the basic structure of get_readonly_filepath(os.environ['DESI_BLATFOO']) shown here looks good.

weaverba137 commented 1 month ago

@sbailey, while working on this, I noticed that fiberassign file names are specified in multiple ways e.g. f'{tileid // 1000:03d} versus stileid = f'{tileid:06d}'; ... stileid[:3] .... Should we define a function instead and only ever use the function to find the fiberassign file?

sbailey commented 1 month ago

@sbailey, while working on this, I noticed that fiberassign file names are specified in multiple ways e.g. f'{tileid // 1000:03d} versus stileid = f'{tileid:06d}'; ... stileid[:3] .... Should we define a function instead and only ever use the function to find the fiberassign file?

Good point. Best would be to add a new filetype entry to desispec.io.findfile, e.g. "fiberassignsvn", which would then do this $DESI_TILE+tileid etc. wrangling to report the canonical name of a fiberassign file in svn. This would augment the pre-existing filetype='fiberassign' which reports where the fiberassign file is in the raw data stream.

weaverba137 commented 1 month ago

OK, and while working on findfile, I found this. This can't be right can it (around line 294)?

    if isinstance(tile, str):    tile = int(night)
sbailey commented 1 month ago

Indeed, that whole block was introduced (by me!) in PR #1931 and is a cut-and-paste typo on multiple lines:

    #- be robust to str vs. int
    if isinstance(healpix, str): healpix = int(healpix)
    if isinstance(night, str):   night = int(night)
    if isinstance(expid, str):   expid = int(night)
    if isinstance(tile, str):    tile = int(night)
    if isinstance(spectrograph, str): spectrogrpah = int(night)

Note the "spectrogrpah" typo as well.

@weaverba137 would you like me to fix these in a separate PR (with tests!) or will you fix it in your current branch?

weaverba137 commented 1 month ago

I will do a minimal fix on this branch, but let's think about a separate PR to do more robust unit tests on findfile.

In addition, I think a future policy for all PRs should be "if you are trying to create a file name, you must use findfile".

weaverba137 commented 1 month ago

I'm moving this out of draft status.

@sbailey, please review this listing of 'trunk' occurrences. I believe these files can be ignored:

  1. etc/cron_dailytest.sh
  2. etc/desi_nersc_pipetest.py
  3. py/desispec/database/redshift.py (the entire database directory is deprecated anyway)
  4. py/desispec/test/test_photo.py
  5. py/desispec/test/integration_test.py

I'm not sure about these, because a URL is being constructed instead of a filename:

  1. py/desispec/scripts/procdashboard.py
  2. py/desispec/scripts/zprocdashboard.py

These files have been updated:

  1. py/desispec/io/photo.py
  2. bin/desi_tile_qa_reference
  3. py/desispec/io/fibermap.py
  4. py/desispec/workflow/exptable.py
  5. py/desispec/scripts/archive_night.py
akremin commented 1 month ago

The procdashboards are used frequently and so a refactor is welcome, but I also think it is okay to leave as-is since it is a special case where it constructs a url. If you want, you are welcome to update the logic so that it uses findfile for the relative path and then creates the url based on that. But I'm also fine with leaving it. These files aren't moving anytime soon, so keeping the dashboard code as-is is very low risk.

On Wed, Jul 17, 2024, 11:02 PM Benjamin Alan Weaver < @.***> wrote:

I'm moving this out of draft status.

@sbailey https://github.com/sbailey, please review this listing of 'trunk' occurrences https://github.com/search?q=repo%3Adesihub%2Fdesispec+trunk+NOT+trunk_py+NOT+language%3A%22Jupyter+Notebook%22+NOT+path%3Adoc&type=code. I believe these files can be ignored:

  1. etc/cron_dailytest.sh
  2. etc/desi_nersc_pipetest.py
  3. py/desispec/database/redshift.py (the entire database directory is deprecated anyway)
  4. py/desispec/test/test_photo.py
  5. py/desispec/test/integration_test.py

I'm not sure about these, because a URL is being constructed instead of a filename:

  1. py/desispec/scripts/procdashboard.py
  2. py/desispec/scripts/zprocdashboard.py

These files have been updated:

  1. py/desispec/io/photo.py
  2. bin/desi_tile_qa_reference
  3. py/desispec/io/fibermap.py
  4. py/desispec/workflow/exptable.py
  5. py/desispec/scripts/archive_night.py

— Reply to this email directly, view it on GitHub https://github.com/desihub/desispec/pull/2294#issuecomment-2234280565, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACBM5O3O4F43RPC3ZGOVJETZM3LXPAVCNFSM6AAAAABK7HYDRGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMZUGI4DANJWGU . You are receiving this because you are subscribed to this thread.Message ID: @.***>

weaverba137 commented 1 month ago

I can't add comments directly to this line item:

Thanks for noting that. Seeing that you've added a tiles_dir option to findfile, this code could intercept missing $DESI_TILES and fall back to doing what it would have done before, i.e. using tiles_dir=$DESI_TARGET/fiberassign/tiles/trunk instead (while still printing a warning).

I'm not entirely sure what you're recommending here. Maybe you could express this as a code snippet?

However, I now realize that the way findfile is modified in this branch, an exception will be thrown if FIBER_ASSIGN_DIR is not defined anyway, and so maybe the solution is to catch a potential KeyError thrown by findfile, add a helpful message, and re-raise.

sbailey commented 1 month ago

I'm not entirely sure what you're recommending here. Maybe you could express this as a code snippet?

Thinking about it more, let's just go all-in on making $FIBER_ASSIGN_DIR a required variable for standard operations, and not try to support an interim period where it might not be defined, since desimodules/main already defines it.

Heads up @sybenzvi, when you update desispec at KPNO, you'll also need to make sure that module files / config gets set to define FIBER_ASSIGN_DIR=$DESI_TARGET/fiberassign/tiles/trunk .

However, I now realize that the way findfile is modified in this branch, an exception will be thrown if FIBER_ASSIGN_DIR is not defined anyway, and so maybe the solution is to catch a potential KeyError thrown by findfile, add a helpful message, and re-raise.

We're ok here. You added

if tiles_dir is None and 'tiles_dir' in required_inputs:
        tiles_dir = os.environ['FIBER_ASSIGN_DIR']

which means that $FIBER_ASSIGN_DIR is only required if tiles_dir is not specified, which is good. We'll require $FIBER_ASSIGN_DIR for normal operations, but if for whatever reason someone is running without it, findfile('fiberassignsvn', tiles_dir=...) will work without requiring an envvar that it doesn't use (I explicitly checked that).

Summarizing, I think we want the assemble_fibermap block to be

    if allow_svn_override:
        testfile, svn_exists = findfile('fiberassignsvn', tile=tileid, readonly=True, return_exists=True)
        if svn_exists:
            fafile = testfile
            log.info(f'Overriding raw fiberassign file {rawfafile} with svn {fafile}')
        else:
            log.info(f'{testfile}[.gz] not found; sticking with raw data fiberassign file')

i.e. findfile will require $FIBER_ASSIGN_DIR to exist and raise a KeyError if it doesn't, and this code handles the case where findfile works but the tile hasn't yet been committed to svn (svn_exists=False).

weaverba137 commented 1 month ago

@sbailey, I've made the requested changes, except for the open question about io.photo above.

sbailey commented 1 month ago

For the record: I have installed this updated version at NERSC