desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License

Exposures missing in exposures-daily.ecsv #2099

Open akremin opened 1 year ago

akremin commented 1 year ago

Anand and Eddie found that there are exposures missing in the exposures-daily.csv that do have raw data available at NERSC: https://github.com/desihub/desisurveyops/issues/132

Anand found this initially by seeing discrepancies in the tiles information, which is derived from the individual exposures (some of which are missing in this case). Reproducing Anand's code and output below:

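import numpy as np
from astropy.table import Table

# Tiles flagged IN_DESI in the survey-ops tiles file
t = Table.read("/global/cfs/cdirs/desi/survey/ops/surveyops/trunk/ops/tiles-main.ecsv")
tileids = t["TILEID"][t["IN_DESI"]]

# Survey-ops exposures on those tiles with QUALITY == "good"
a = Table.read("/global/cfs/cdirs/desi/survey/ops/surveyops/trunk/ops/exposures.ecsv")
sel = (np.in1d(a["TILEID"], tileids)) & (a["QUALITY"] == "good")
a = a[sel]

# Exposures known to survey ops but absent from the daily summary table
b = Table.read("/global/cfs/cdirs/desi/spectro/redux/daily/exposures-daily.csv")
sel = ~np.in1d(a["EXPID"], b["EXPID"])
a[sel].pprint_all()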

=>

 NIGHT   TILEID EXPID  OBSTYPE PROGRAM  EXPTIME  EFFTIME_ETC EFFTIME_SPEC EFFTIME  GOALTIME QUALITY COMMENTS
-------- ------ ------ ------- ------- --------- ----------- ------------ -------- -------- ------- --------
20211220  23046 114876 SCIENCE  BRIGHT  1455.613     105.105       -1.000  105.105     -1.0    good       --
20220110  24410 117827 SCIENCE  BRIGHT   916.408      17.409       -1.000   17.409     -1.0    good       --
20220110  24410 117828 SCIENCE  BRIGHT  1228.330       7.778       -1.000    7.778     -1.0    good       --
20220110  24410 117829 SCIENCE  BRIGHT   826.652       0.469       -1.000    0.469     -1.0    good       --
20220216  24459 122527 SCIENCE  BRIGHT  1560.731      55.470       -1.000   55.470     -1.0    good       --
20220216  24459 122528 SCIENCE  BRIGHT  1330.230      66.059       -1.000   66.059     -1.0    good       --
20220216  24459 122529 SCIENCE  BRIGHT   909.493      28.064       -1.000   28.064     -1.0    good       --
20220216  24459 122530 SCIENCE  BRIGHT    42.099       1.116       -1.000    1.116     -1.0    good       --
20220216  42669 122531 SCIENCE  BACKUP   602.955      25.048       -1.000   25.048     -1.0    good       --
20220216  42660 122532 SCIENCE  BACKUP   601.416      22.097       -1.000   22.097     -1.0    good       --
20220216  42666 122533 SCIENCE  BACKUP   602.993      16.234       -1.000   16.234     -1.0    good       --
20220216  42665 122534 SCIENCE  BACKUP   601.498      15.680       -1.000   15.680     -1.0    good       --
20220216  41654 122535 SCIENCE  BACKUP   116.539       0.014       -1.000    0.014     -1.0    good       --
20220221  25880 123283 SCIENCE  BRIGHT   792.406     181.242       -1.000  181.242     -1.0    good       --
20220415  42416 130372 SCIENCE  BACKUP   147.499       0.065       -1.000    0.065     -1.0    good       --
20220911  41334 141794 SCIENCE  BACKUP   600.729       4.951       -1.000    4.951     -1.0    good       --
20220911  40835 141795 SCIENCE  BACKUP   137.773       0.000       -1.000    0.000     -1.0    good       --
20220911  40065 141865 SCIENCE  BACKUP   601.838      17.400       -1.000   17.400     -1.0    good       --
20220911  41419 141866 SCIENCE  BACKUP   602.791      27.518       -1.000   27.518     -1.0    good       --
20220911  41421 141867 SCIENCE  BACKUP   602.802      34.934       -1.000   34.934     -1.0    good       --
20220911  42809 141868 SCIENCE  BACKUP   608.612      60.675       -1.000   60.675     -1.0    good       --
20220911  41461 141869 SCIENCE  BACKUP   607.988      29.690       -1.000   29.690     -1.0    good       --
20220911  40895 141870 SCIENCE  BACKUP   607.867      30.101       -1.000   30.101     -1.0    good       --
20220911  40935 141871 SCIENCE  BACKUP   606.913      52.994       -1.000   52.994     -1.0    good       --
20220911  40896 141872 SCIENCE  BACKUP   445.460      61.653       -1.000   61.653     -1.0    good       --
20220911  42873 141873 SCIENCE  BACKUP   156.037      14.020       -1.000   14.020     -1.0    good       --
20230416   7741 176606 SCIENCE    DARK  1685.884    1005.371       -1.000 1005.371     -1.0    good       --
20230503  41894 178901 SCIENCE  BACKUP   601.914       1.044       -1.000    1.044     -1.0    good       --
20230525  23891 182165 SCIENCE  BRIGHT   487.629     189.186       -1.000  189.186     -1.0    good       --
20230525   4534 182168 SCIENCE    DARK  1170.443     418.688       -1.000  418.688     -1.0    good       --
20230525   4534 182169 SCIENCE    DARK  1501.314     587.363       -1.000  587.363     -1.0    good       --
20230525   3472 182170 SCIENCE    DARK  1522.507    1002.821       -1.000 1002.821     -1.0    good       --
20230525  24723 182172 SCIENCE  BRIGHT   501.315     184.501       -1.000  184.501     -1.0    good       --
20230608  22308 184545 SCIENCE  BRIGHT   435.053     186.933       -1.000  186.933     -1.0    good       --

The first step is identifying why these weren't updated and whether there are any holes in our workflow. My guess is that it was human error, which is inevitable from time to time. If that is the case, we need to improve QA internally in the data-ops workflow to catch these things. It may not be possible to catch them the same day, but this could be part of a weekly or monthly QA step that monitors both the consistency and the stability of the calibrations, data reductions, and data products.

akremin commented 12 months ago

Exposures involved

As Eddie already noted, Tile=7741, Exp=176606, Night=20230416 was identified in another ticket as being caused by a delayed data transfer: the exposure was not known to the offline pipeline and so was not processed.

The remaining 11 exposures are known by the pipeline's exposure tables:

idx EXPID   OBSTYPE TILEID  LASTSTEP    CAMWORD BADCAMWORD  BADAMPS EXPTIME EFFTIME_ETC SURVEY  FA_SURV FAPRGRM GOALTIME    GOALTYPE    EBVFAC  AIRMASS SPEED   TARGTRA TARGTDEC    SEQNUM  SEQTOT  PROGRAM PURPOSE MJD-OBS NIGHT   HEADERERR   EXPFLAG COMMENTS
0   123283  science 25880   all a0123456789         792.4062    181.242065  main    main    bright  180.0   bright  1.02560774703599    1.004302    0.24240205392573486 200.938229  30.79669    1   1   bright  main survey 59632.428224202 20220221    []  []  []
1   141868  science 42809   ignore  a012345679          608.6122    60.675076   main    main    backup  60.0    backup  1.2283257396197 1.214424    0.2113220805187511  55.644179   -2.05139    1   1   backup  main survey 59834.469655356 20220911    []  []  ['no good cals']
2   141871  science 40935   ignore  a012345679          606.9134    52.99398    main    main    backup  60.0    backup  1.59877392682857    1.17955 0.2979734342876573  61.618571   0.09631 1   1   backup  main survey 59834.49446982  20220911    []  []  ['no good cals']
3   141872  science 40896   ignore  a012345679          445.4596    61.652504   main    main    backup  60.0    backup  1.32649460909012    1.163006    0.31719168769558653 65.392708   1.52971 1   1   backup  main survey 59834.50270437  20220911    []  []  ['no good cals']
4   181917  science 4534    skysub  a0123456789         71.233  11.825192   main    main    dark    1000.0  dark    1.24651896734136    1.101391    0.3054375405242545  256.291071  15.63754    1   1   dark    main survey 60088.407451638 20230523    []  ['low_sn']  ['efftime=11.8s lt 100.0']
5   182165  science 23891   all a0123456789         487.6294    189.185608  main    main    bright  180.0   bright  1.04432715034984    1.034581    0.44906501584752584 255.387271  45.63247    1   1   bright  main survey 60090.363566204 20230525    []  []  []
6   182168  science 4534    all a0123456789         1170.4427   418.687866  main    main    dark    1000.0  dark    1.24651896734136    1.089807    0.6461000696772505  256.291508  15.63764    1   1   dark    main survey 60090.396377098 20230525    []  []  []
7   182169  science 4534    all a0123456789         1501.3141   587.363342  main    main    dark    1000.0  dark    1.24651896734136    1.12237 0.7439969317283479  256.291508  15.63764    1   1   dark    main survey 60090.410766255 20230525    []  []  []
8   182170  science 3472    all a0123456789         1522.5066   1002.821228 main    main    dark    1000.0  dark    1.15808950100315    1.06173 0.9810095586021308  265.958021  24.76234    1   1   dark    main survey 60090.429590809 20230525    []  []  []
9   182172  science 24723   all a0123456789         501.3155    184.501373  main    main    bright  180.0   bright  1.08610066455381    1.258314    0.6490220153290885  271.795871  65.37445    1   1   bright  main survey 60090.464734965 20230525    []  []  []
10  184545  science 22308   all a0123456789         435.0534    186.932999  main    main    bright  180.0   bright  1.04320917727886    1.167076    0.6127872157184105  256.041042  57.85755    1   1   bright  main survey 60104.376067936 20230608    []  []  []

Exposure assessments

Exposures 141868, 141871, and 141872 are meant to be ignored (i.e. they're marked bad), but should nevertheless have entries in exposures-daily.csv to indicate that. This is likely human error during a cleanup when marking these exposures as bad.

Looking at the exposures processed on disk (based on the existence of an exposure directory), I further find exposures 123283, 181917, 182168, and 182169. Again, these should have entries in the exposures-daily table, which suggests human error.

Finally, exposures 182165, 182170, 182172, and 184545 were known by the pipeline but not processed even though they should have been.

The 20230525 exposures appear to be human error during a cleanup.

Exposure 184545 on 20230608 seems to be a genuine bug. Perlmutter was taken offline that day, so I think this was due to a known issue where exposures could be saved but not processed in the tilenight framework. That should be mitigated in the pipeline refactor in the next month or so. It is also worth noting that the tsnr afterburner reported an error for this exposure, flagging that it should have been processed but wasn't, so we can be better about checking for that information.
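
The triage above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the category labels, and the input shapes (exposure-table rows as dicts with EXPID and LASTSTEP, plus plain sets of EXPIDs for "has an exposure directory on disk" and "listed in exposures-daily.csv") are hypothetical and not part of desispec.

```python
def triage_missing(exptable_rows, on_disk_expids, daily_expids):
    """Group exposures absent from the daily summary by likely cause.

    All names here are hypothetical; the categories mirror the three
    cases described in this comment, not any pipeline terminology.
    """
    groups = {
        "marked_ignore": [],         # LASTSTEP == "ignore", still needs a row
        "processed_not_listed": [],  # exposure directory exists, no row
        "never_processed": [],       # known to the pipeline, nothing on disk
    }
    for row in exptable_rows:
        expid = row["EXPID"]
        if expid in daily_expids:
            continue  # already accounted for in the daily summary
        if row["LASTSTEP"] == "ignore":
            groups["marked_ignore"].append(expid)
        elif expid in on_disk_expids:
            groups["processed_not_listed"].append(expid)
        else:
            groups["never_processed"].append(expid)
    return groups


# Three exposures from the table above, one per category
rows = [
    {"EXPID": 141868, "LASTSTEP": "ignore"},
    {"EXPID": 123283, "LASTSTEP": "all"},
    {"EXPID": 184545, "LASTSTEP": "all"},
]
result = triage_missing(rows, on_disk_expids={123283}, daily_expids=set())
```

Checking LASTSTEP before the on-disk test is a design choice for this sketch: an exposure marked "ignore" is reported as such even if a directory happens to exist for it.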

akremin commented 4 months ago

The output of the above code today gives three new backup tiles that aren't properly accounted for in daily:

 NIGHT   TILEID EXPID  OBSTYPE PROGRAM  EXPTIME  EFFTIME_ETC EFFTIME_SPEC EFFTIME GOALTIME QUALITY COMMENTS
-------- ------ ------ ------- ------- --------- ----------- ------------ ------- -------- ------- --------
20240127  41285 221977 SCIENCE  BACKUP   603.639       0.950       -1.000   0.950     -1.0    good       --
20240220  42548 226379 SCIENCE  BACKUP   604.980      52.871       -1.000  52.871     -1.0    good       --
20240322  42178 231678 SCIENCE  BACKUP   606.807       8.889       -1.000   8.889     -1.0    good       --

They are all present in the exposure_tables and will therefore be handled correctly in Jura. Because cases continue to occur, I will leave the ticket open until we come up with a weekly/monthly script and routine to check for these oversights and remedy them.

Since there are no issues for Jura, however, I will unlink it from the Jura dashboard.
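
As a starting point for that weekly/monthly check, here is a minimal sketch of the comparison, grouped by night so it can go straight into a report. It is an assumption-laden outline, not an existing desispec tool: the function name and the plain-Python inputs (rows as dicts with NIGHT, TILEID, EXPID, QUALITY; sets of EXPIDs and TILEIDs) are hypothetical, and in practice the rows would come from the same three files read in Anand's snippet at the top of this ticket.

```python
from collections import defaultdict


def find_unlisted_exposures(ops_rows, daily_expids, in_desi_tileids):
    """Return good, in-DESI exposures missing from the daily summary.

    Hypothetical helper: the result maps NIGHT -> sorted list of EXPIDs,
    with nights in ascending order.
    """
    by_night = defaultdict(list)
    for row in ops_rows:
        if row["QUALITY"] != "good":
            continue  # only exposures survey ops considers good
        if row["TILEID"] not in in_desi_tileids:
            continue  # only tiles flagged IN_DESI
        if row["EXPID"] not in daily_expids:
            by_night[row["NIGHT"]].append(row["EXPID"])
    return {night: sorted(expids) for night, expids in sorted(by_night.items())}


# Two of the backup exposures listed above, plus two made-up control rows
ops_rows = [
    {"NIGHT": 20240220, "TILEID": 42548, "EXPID": 226379, "QUALITY": "good"},
    {"NIGHT": 20240127, "TILEID": 41285, "EXPID": 221977, "QUALITY": "good"},
    {"NIGHT": 20240127, "TILEID": 41285, "EXPID": 1, "QUALITY": "bad"},   # made up
    {"NIGHT": 20240127, "TILEID": 41285, "EXPID": 2, "QUALITY": "good"},  # made up
]
unlisted = find_unlisted_exposures(
    ops_rows, daily_expids={2}, in_desi_tileids={41285, 42548}
)
```

A non-empty result would be the trigger for the manual follow-up described earlier in this ticket.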