Fuji: Inconsistencies between top-level exposures and tiles files

weaverba137 commented 2 years ago

While working with the f5 test run, I was comparing these two top-level files:

$DESI_SPECTRO_REDUX/f5/exposures-f5.fits
$DESI_SPECTRO_REDUX/f5/tiles-f5.csv

I found the following inconsistencies:

Tiles 80605, 80606, 80608, 80609, 80610 have SURVEY = 'sv1' in tiles-f5.csv, but SURVEY = 'cmx' in exposures-f5.fits.
Tiles 80865, 80869 have EFFTIME_GFA = (0, 0) (respectively) in tiles-f5.csv, but EFFTIME_GFA = (14890.1, 10798.3) (respectively) in exposures-f5.fits.
Many tiles have non-zero GOALTIME in tiles-f5.csv, but zero GOALTIME in exposures-f5.fits.

sbailey commented 2 years ago

Tiles 80605, 80606, 80608, 80609, 80610 are cases of faflavor=cmxelg or cmxlrgqso which despite the name are actually sv1 tiles. i.e. the tiles file is correct.

I'm not sure about what causes the EFFTIME_GFA and GOALTIME discrepancies, but presumably the non-zero values are more correct (easy to say; finding it in the code is the harder part...)

weaverba137 commented 2 years ago

Good to know. Do we know where the discrepancy in SURVEY arises, or will that be as hard as EFFTIME_GFA or GOALTIME?

sbailey commented 2 years ago

as hard (somewhere in desi_tsnr_afterburner, which organically outgrew its original intent and resulted in a lot of if/then/else logic to work around our evolving understanding of what metadata we needed in fiberassign header keywords)

araichoor commented 2 years ago

About the GFA_EFFTIME, if this helps:

I notice that those two tiles have several EXPIDs, but some have EFFTIME_GFA=0, because they are not in the GFA file (see below)
I see those lines in what I think is writing the tiles-f5.fits file, where EFFTIME_ETC and EFFTIME_GFA are set to zero if one of the exposures has a zero value: https://github.com/desihub/desispec/blob/26c627ca6bb1c075bb4edc8cacd3d87b6c1f8a26/py/desispec/tilecompleteness.py#L109-L110
```
            if k == "EFFTIME_ETC" or k == "EFFTIME_GFA" :
                if np.any((exposure_table[k][jj]==0)&(exposure_table["EFFTIME_SPEC"][jj]>0)) : res[k][i]=0 # because we are missing data
```

Example for TILEID=80865, where EXPID=87489 is not in the GFA file:

>>> tileid = 80865
>>> d = Table.read("/global/cfs/cdirs/desi/spectro/redux/f5/exposures-f5.fits", "EXPOSURES")
>>> d[d["TILEID"] == tileid]["NIGHT","EXPID","TILEID","EFFTIME_SPEC","EFFTIME_GFA"]
<Table length=9>
 NIGHT   EXPID TILEID    EFFTIME_SPEC       EFFTIME_GFA    
 int32   int32 int32       float64            float64      
-------- ----- ------ ------------------ ------------------
20210506 87489  80865    2637.0302734375                0.0
20210506 87490  80865   2700.46826171875 2663.6365578261666
20210506 87491  80865  2775.638427734375 2571.7441548934726
20210506 87495  80865        2735.796875 2525.8956065093726
20210507 87615  80865    2349.6240234375   2128.59765950735
20210507 87616  80865   2529.64306640625 2129.6798466769446
20210507 87617  80865  792.6135864257812  680.9170363425495
20210509 87835  80865 1292.2130126953125 1099.2446772898227
20210509 87836  80865  1132.289306640625  1090.425575819361
>>> gfa = Table.read("/global/cfs/cdirs/desi/survey/GFA/offline_matched_coadd_ccds_SV3-thru_20220201.fits", 2)
>>> for expid in d[d["TILEID"] == tileid]["EXPID"]:
...     print(expid, expid in gfa["EXPID"])
... 
87489 False
87490 True
87491 True
87495 True
87615 True
87616 True
87617 True
87835 True
87836 True

araichoor commented 2 years ago

About the GOALTIME, if it helps: if I correctly read the code:

For the exposures: This update_targ_info() function https://github.com/desihub/desispec/blob/26c627ca6bb1c075bb4edc8cacd3d87b6c1f8a26/bin/desi_tsnr_afterburner#L95-L157 tries to guess the following entries from the cframe FIBERMAP header: SURVEY, GOALTYPE, FAPRGRM, FAFLAVOR, MINTFRAC, and GOALTIME. If GOALTIME is not present in the header, a zero value is set.

For the tiles: If missing, the GOALTIME is set here, I think: https://github.com/desihub/desispec/blob/26c627ca6bb1c075bb4edc8cacd3d87b6c1f8a26/py/desispec/tilecompleteness.py#L62-L101 Defaulting to 1000s, and some specific values for SV1 tiles, according to the identified program.

araichoor commented 2 years ago

ps:

I see this GOALTIME discrepancy for 62 tiles, which are all reported as "sv1" or "cmx". Out of the 5 "cmx" ones, four actually are sv1: 80605, 80606, 80608, 80609, 80610 (mentioned in the first message of this issue); the last one 80615 is for "cmxm33" is a real cmx tile (i.e. targeting from CMX catalogs).

All those issues come from the early SV1 tiles, where all the final keywords in the fiberassign files were not set yet.

weaverba137 commented 2 years ago

This may be a topic for another ticket, but why are these top-level summary files plain-vanilla .csv files (except for exposures-f5.fits)? Do people need to open these in Excel?

araichoor commented 2 years ago

My 2 cents: a text format like csv makes it easy to explore with some bash commands like grep.

weaverba137 commented 2 years ago

Sure, but .ecsv is just as easy to explore with grep, and has support for types, units & metadata.

sbailey commented 2 years ago

More inconsistencies: in exposures-everest.csv, tile 80605 is listed as SURVEY=sv1 on 20201221 and 20201222, but SURVEY=cmx on other nights. 20201221 and 20201222 weren't included in f5, and we're having some memory / I/O delays generating prototype files for fuji, so I'm not sure what the current situation is.

I could use help debugging this, since correct survey/program are necessary inputs for healpix redshifts to get the proper grouping. i.e. we know what the right answer is for tile 80605 (sv1) but we need the tsnr_afterburner code to actually produce the right answer so that healpix redshifts can use it as input...

sbailey commented 2 years ago

Summarizing inconsistencies and fixes:

SURVEY: fixed in PR #1675
GOALTIME: fixed in PR #1694
EFFTIME_GFA: this is purposeful -- if any of the input exposures have EFFTIME_GFA=0, i.e. missing data, then the output tiles EFFTIME_GFA is also set to 0, i.e. missing data instead of summing the remaining values and thus obscuring the fact that the inputs have missing data.

One more note: the .csv version of the files purposefully has fewer digits of precision than the fits files to reduce clutter when grepping. See https://github.com/desihub/desispec/blob/bc840bb3bef7c7882dbae087088b983d2a4e0a58/bin/desi_tsnr_afterburner#L603-L611 for which columns are rounded by how much.

I believe that addresses all known inconsistencies, but please re-open if final checks on Fuji/Guadalupe tables reveal more problems.

weaverba137 commented 2 years ago

Are the top-level exposures and tiles files available now, or do those still need to be (re)generated for fuji & guadalupe?

sbailey commented 2 years ago

They still need to be regenerated, which can take a few hours depending upon CFS I/O load.

weaverba137 commented 2 years ago

Can you give me some idea of how urgent this is? And I have no idea how these files are actually used by the pipeline. Does every file need to be readable at all times, or can they be converted on the fly, in a random access mode?

akremin commented 2 years ago

The top level exposures-fuji.*, tiles-fuji.*, exposures-guadalupe.*, and tiles-guadalupe.* have all been regenerated.

sbailey commented 2 years ago

Thanks @akremin

@weaver and others, for context: the exposures-$SPECPROD.[csv,fits] and tiles-$SPECPROD.[csv,fits] are end-product summary files of the pipeline results. We regenerate them when we have generated new files (e.g. because we found something was missing and re-ran it, or every day at 10am for the daily pipeline to ingest the previous night's exposures/tiles). Several of us are also using them as inputs for one-off QA scripts checking the completeness of fuji/guadalupe. I think their only input use for the pipeline is for the healpix redshifts, which uses them to group tiles by SURVEY and PROGRAM.

Regenerating them tonight was very helpful for the final healpix cleanup, thanks. The internal consistency checks of exposures/tiles are an important part of the final fuji release, but they can wait until tomorrow (Friday). @weaverba137 if you can't re-run your consistency check scripts Friday morning, please let us know.

weaverba137 commented 2 years ago

@sbailey, I'll do that right now.

weaverba137 commented 2 years ago

When comparing the FRAMES HDU to the EXPOSURES HDU in exposures-fuji.fits, what is the expectation for GOALTIME in that case? That is, this is not the comparison of GOALTIME between exposures and tiles.

weaverba137 commented 2 years ago

Are RA, Dec in exposures and tiles files rounded, even in the FITS versions?

sbailey commented 2 years ago

When comparing the FRAMES HDU to the EXPOSURES HDU in exposures-fuji.fits, what is the expectation for GOALTIME in that case? That is, this is not the comparison of GOALTIME between exposures and tiles.

Good question, they should match. As an oversight, I did not try to fix anything in the FRAMES HDU, so I would not be surprised if there are lingering inconsistencies there, especially for SURVEY and GOALTIME, and maybe for MINTFRAC. Please remove any mismatches you find.

For the TSNR2 values, the EXPOSURES are something like the be the sum over the cameras and then averaging (or median?) over the petals, so we'd have to dig into the code to find the exact relationship. For this consistency check I think you can skip those.

weaverba137 commented 2 years ago

@sbailey, I'm not entirely sure what you're asking me here. Do you actually want me to modify the exposures-fuji.fits file FRAMES HDU?

weaverba137 commented 2 years ago

Not sure why this ticket was closed, when we're still actively working on it.

weaverba137 commented 2 years ago

Another question: in the FRAMES HDU, EBV appears to be defined on a frame-by-frame basis. How is the exposures EBV computed?

weaverba137 commented 2 years ago

Summary of findings so far (so I can go get lunch before Friday noon talks):

The FITS and CSV versions of both exposures and tiles are self-consistent.
PROGRAM is consistent with faflavor2program(faflavor) for both exposures and tiles.
The values in the exposures and tiles files are all consistent for all relevant columns
- Exception: tile 80873, LRG_EFFTIME_DARK = 2643.1, sum(exposures[LRG_EFFTIME_DARK]) = 2643.2
- Exception: tile 248, BGS_EFFTIME_BRIGHT = 1230.7, sum(exposures[BGS_EFFTIME_BRIGHT]) = 1230.6
- Exception: tile 80862, BGS_EFFTIME_BRIGHT = 7748.5, sum(exposures[BGS_EFFTIME_BRIGHT]) = 7748.4
- Given that everything else agrees almost perfectly, I'm a little surprised by this. Perhaps a strict sum isn't actually how these values are computed?
Need more information about how the EBV and EXPTIME values are computed given the corresponding values in the FRAMES HDU.
Some FRAMES have GOALTIME = 0 when the corresponding exposure has GOALTIME != 0.

I'm very surprised that the exposures-specprod.fits file is not written out in a self-consistent manner. This shouldn't happen at all.

sbailey commented 2 years ago

@weaverba137 thanks for the checks.

@sbailey, I'm not entirely sure what you're asking me here. Do you actually want me to modify the exposures-fuji.fits file FRAMES HDU? No, I'm just asking for consistency checks, not modifying anything.

It seems like fixing the GOALTIME between EXPOSURES and FRAMES is the primary inconsistency to fix.

EFFTIMES: I'm curious about the minor inconsistencies, but not concerned about it on a pragmatic level.

EBV, EXPTIME: as long as they aren't 0 in one HDU and non-zero in another, that's good enough for now.

weaverba137 commented 2 years ago

And now the summary report for guadalupe:

The FITS and CSV versions of both exposures and tiles are self-consistent.
- Exception: tiles-guadalupe.(fits|csv), exposures-guadalupe.(fits|csv) and TILEID = 1007 has EFFTIME_ETC = NaN. In the tiles-guadalupe.csv file the NaN is encoded as --, but in the exposures-guadalupe.csv file it is encoded as nan.
- This appears to ultimately come from EXPID = 88872.
- This doesn't seem right for a main survey tile.
All other consistency checks pass for guadalupe.

sbailey commented 2 years ago

The EFFTIME_ETC traces from /global/cfs/cdirs/desi/spectro/data/20210516/00088872/etc-00088872.json which has etcdata['expinfo']['efftime'] = nan. I'll catch this case in tsnr_afterburner and treat it the same as missing data (EFFTIME_ETC=0.0 instead).

weaverba137 commented 2 years ago

As of the latest regeneration of the top-level tiles and exposures files for fuji and guadalupe, there are no remaining inconsistencies. In particular, the GOALTIME discrepancy noted in the previous round is completely gone.

desihub / desispec

Fuji: Inconsistencies between top-level exposures and tiles files #1649