desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License
33 stars 24 forks source link

tiles version hardcoded to trunk #2272

Open weaverba137 opened 1 month ago

weaverba137 commented 1 month ago

The version of the tiles product is hardcoded to trunk in desispec.io.fibermap.assemble_fibermap() L595, and possibly in other places. This means there is no possible way to specify a tag of the tiles product when creating a specprod for release, e.g. jura.

The tag is important for creating a fixed set of fiberassign files that will be:

  1. Included in a public release
  2. Themselves loaded into the specprod database
  3. The definition of the observed and potential targets for the photometry catalog associated with a specprod, which will also be loaded into the specprod database

We had a bit of good luck though: we can easily reconstruct the revision of trunk used for jura and create a tag based on that (which will be 2.0). If changes to the fiberassign files are needed for kilimanjaro, that could be tag 2.1 (but hopefully that won't be necessary).

We still need to establish whether the line above is the only place where the tiles version is hardcoded. I will attempt to do a very large grep, since this problem may not be limited to desispec.

For future runs, what is the best way to pass information on the tiles tag to the pipeline? One possibility is to associate a "tiles" modulefile that points to the existing tiles checkout (trunk or a tag) on CFS, and then pass that information via an environment variable. Or the environment variable could be set in another modulefile like desimodules (see also desihub/desimodules#49). If the environment variable is not set, that should be considered an error, rather than falling back on a hardcoded value.

@sbailey @araichoor @akremin @geordie666 et al. please take note. Tag others as needed.

weaverba137 commented 1 month ago

First results from grep -r -i fiberassign/tiles/ * | grep -v TILES_VERSION in a checkout that has many but not necessarily all of the products defined in desimodules (with notebooks manually excluded):

desispec

./desispec/py/desispec/io/fibermap.py:        testfile = f'{targdir}/fiberassign/tiles/trunk/{tileid//1000:03d}/fiberassign-{tileid:06d}.fits'
./desispec/py/desispec/scripts/procdashboard.py:            linkloc = f"https://data.desi.lbl.gov/desi/target/fiberassign/tiles/" \
./desispec/py/desispec/scripts/zprocdashboard.py:        linkloc = f"https://data.desi.lbl.gov/desi/target/fiberassign/tiles/" \

fastspecfit

./fastspecfit/py/fastspecfit/io.py:            defaults to `$DESI_ROOT/target/fiberassign/tiles/trunk`.

desisurvey

./desisurvey/py/desisurvey/data/config-cmx.yaml:fiber_assign_dir: /global/cfs/cdirs/desi/target/fiberassign/tiles/trunk/
./desisurvey/bin/fba-main-onthefly.sh:    SVNTILEDIR=$DESI_TARGET/fiberassign/tiles/trunk

fiberassign

./fiberassign/fba_patch_202110:def patch(infn = 'desi/target/fiberassign/tiles/trunk/001/fiberassign-001000.fits.gz',
./fiberassign/fba_patch_202110:    parser.add_argument('--in-base', default='/global/cfs/cdirs/desi/target/fiberassign/tiles/trunk',
./fiberassign/py/fiberassign/fba_launch_io.py:    svn_trunk = os.path.join(os.getenv("DESI_TARGET"), "fiberassign/tiles/trunk")
./fiberassign/py/fiberassign/test/collide.py:    fbah = fitsio.read_header('/global/cfs/cdirs/desi/target/fiberassign/tiles/trunk/'+ts[:3]+'/fiberassign-'+ts+'.fits.gz')
./fiberassign/py/fiberassign/fba_tertiary_io.py:        If fadir is provided the code will first look in fadir, then will look in $DESI_TARGET/fiberassign/tiles/trunk.
./fiberassign/py/fiberassign/fba_tertiary_io.py:        If fadir is not provided, the code will look in $DESI_TARGET/fiberassign/tiles/trunk.
./fiberassign/bin/fba_patch:        help="source folder for input fiberassign files (default=$DESI_TARGET/fiberassign/tiles/trunk)",
./fiberassign/bin/fba_sv1:        os.getenv("DESI_TARGET"), "fiberassign/tiles/trunk"
./fiberassign/bin/fba_cmx:            os.getenv("DESI_TARGET"), "fiberassign/tiles/trunk"
./fiberassign/bin/fba_patch_diagnosis:        help="source folder for input fiberassign files (default=$DESI_TARGET/fiberassign/tiles/trunk)",
araichoor commented 1 month ago

thanks @weaverba137 for noticing that. if relevant: the syntax in the scripts might has "different flavors", with e.g. os.path.join()... for instance here: https://github.com/desihub/desispec/blob/884969a3298370f97a8409605571a6b623e99977/py/desispec/workflow/exptable.py#L701-L703

here s what I get with looking for trunk in desispec from the browser-based github search engine: https://github.com/search?q=repo%3Adesihub%2Fdesispec%20trunk&type=code

weaverba137 commented 1 month ago

@araichoor, ouch, thank you for pointing that out. I'll try a variety of searches.

weaverba137 commented 1 month ago

PS, it sure looks like we might want to expand this to include the surveyops product.

weaverba137 commented 1 month ago

And we can extend the search to all of desihub(!): https://github.com/search?q=org%3Adesihub+trunk&type=code

araichoor commented 1 month ago

yes, correct; though it s tough to parse... there are 286 files...

I see one can request to restrict to python code (clicking on the left panel options in your search): https://github.com/search?q=org%3Adesihub+trunk+language%3APython+&type=code that returns 130 files (hopefully the 286-130=156 other files mostly are jupyter notebooks? I see 4 c++ files).

and if one requests both words trunk and fiberassign, with python code, that trims it down to 76 files: https://github.com/search?q=org%3Adesihub+trunk+fiberassign+language%3APython&type=code&l=Python

(I m not familiar with those searches, I ve only used it for a simple one-word search in the past)

weaverba137 commented 1 month ago

I am already working on refining that.

weaverba137 commented 1 month ago

This extremely long query gets us down to <60 files that we should really take a look at because they are broadly "in the pipeline". At this point it will be useful to break that back down into individual repositories. Stay tuned.

weaverba137 commented 1 month ago

Using this type of query: repo:desihub/REPO trunk NOT trunk_py NOT language:"Jupyter Notebook" NOT path:doc

These need further investigation:

These are marginal. "trunk" may be used in a few places in a non-ideal way, but probably won't affect the pipeline:

Not worth further investigation after visual inspection (but you're still welcome to look):

araichoor commented 1 month ago

thanks @weaverba137!

for what is worth:

weaverba137 commented 1 month ago

After discussion with @sbailey, we will:

  1. Create a new environment variable, e.g. DESI_TILES, defined in desimodules.
  2. Use DESI_TILES and DESI_SURVEYOPS wherever these had been previously hard-coded.
  3. Raise an error if these are not defined.
  4. Wait until Jura is complete and resolve this simultaneously with desihub/desimodules#49.

Is DESI_TILES acceptable, or is another name preferred?

araichoor commented 1 month ago

just checking: I m wondering about any change that would affect operations at kpno (sorry that s not clear to me). e.g., for fiberassign, we re running from a tagged version (5.6.0), so it should be agnostic to that (https://github.com/desihub/desisurvey/blob/main/bin/fba-main-onthefly.sh).

but I don t know for other modules; maybe it s ok, but I just raise the point to be sure.

weaverba137 commented 1 month ago

Right, I should be clear here: I'm only proposing to change the code in desispec, at least for now. If other maintainers want to proactively start using DESI_TILES and DESI_SURVEYOPS, please do so. Also note that DESI_SURVEYOPS has existed for a while, so there's no reason not to use it.

araichoor commented 1 month ago

ok perfect, thanks!