Closed weaverba137 closed 4 months ago
@sbailey, please take a look at desispec.tilecompleteness.number_of_good_redrock. I don't think that function would ever have worked, in part due to an import error, and it is also not used by anything at all in desispec.
Sorry I'm late here; I don't have time right now to dive into the details of the PR and the related issues, nor to dig into the data. If my comment is not relevant, apologies in advance, just discard it!
My remark, from just reading the PR comment, concerns the offline_matched_coadd_ccds_SV3-thru_{NIGHT}.fits files: those files only contain exposures starting from SV3:
>>> from astropy.table import Table
>>> d = Table.read("/global/cfs/cdirs/desi/survey/GFA/offline_matched_coadd_ccds_SV3-thru_20240409.fits", 3)
>>> d["NIGHT"].min()
20210405
So pre-SV3 exposures would only be found in the offline_matched_coadd_ccds_{SV1,SV2}-thru_{NIGHT}.fits files. For daily operations, it shouldn't be a problem, as earlier (including SV1, SV2) GFA info is already in the exposures-daily.csv file, so the code will just look for GFA info for the latest exposures, which will be in the offline_matched_coadd_ccds_SV3-thru_{NIGHT}.fits file.
But for a production, e.g. jura, which will reprocess all exposures since SV1, it sounds to me that the code won't find the relevant GFA info files for pre-SV3 exposures.
I see those two "latest" files:
ls -l /global/cfs/cdirs/desi/survey/GFA_orig/offline_matched_coadd_ccds_SV1-thru_* | tail -n 1
-r--r--r-- 1 ameisner desi 53864640 Sep 29 2021 /global/cfs/cdirs/desi/survey/GFA_orig/offline_matched_coadd_ccds_SV1-thru_20210928.fits
ls -l /global/cfs/cdirs/desi/survey/GFA_orig/offline_matched_coadd_ccds_SV2-thru_* | tail -n 1
-r--r--r-- 1 ameisner desi 29888640 Sep 29 2021 /global/cfs/cdirs/desi/survey/GFA_orig/offline_matched_coadd_ccds_SV2-thru_20210928.fits
It looks like the SV1 file contains all exposures from the SV2 file, i.e. we may just need /global/cfs/cdirs/desi/survey/GFA_orig/offline_matched_coadd_ccds_SV1-thru_20210928.fits for pre-SV3 exposures.
As this exposure list is static, I guess it's fine to hard-code this path (unless at some point we do a whole reprocessing of the GFA data).
Do we know specifically how this was done for fuji, guadalupe and iron? In other words, did these read from /global/cfs/cdirs/desi/survey/GFA_orig or /global/cfs/cdirs/desi/survey/GFA?
Iron used /global/cfs/cdirs/desi/survey/GFA on Feb 8 2023. That path is currently a symlink to GFA.NERSC, created on Feb 16 2024, i.e. a year after Iron was run. GFA_orig is also a symlink, created on Feb 16 2024, to ../users/ameisner/GFA/conditions. I was expecting GFA_orig to point to what we used for Iron, but that ameisner/GFA/conditions directory only goes through offline_matched_coadd_ccds_SV3-thru_20211013.fits, which is different from what we cached in /global/cfs/cdirs/desi/public/dr1/survey/GFA/offline_matched_coadd_ccds_SV3-thru_20220613.fits.
i.e. I'm not sure what we actually used for iron and prior.
- For fuji and iron, the first night, exposure is 20201214, 67710, from the exposures-(fuji|iron).csv file.
- For fuji, we defined the one GFA summary file: /global/cfs/cdirs/desi/public/edr/survey/GFA/offline_matched_coadd_ccds_SV3-thru_20210927.fits. This file is identical to /global/cfs/cdirs/desi/survey/GFA_orig/offline_matched_coadd_ccds_SV3-thru_20210927.fits.
- For iron, we defined the one GFA summary file: /global/cfs/cdirs/desi/public/dr1/survey/GFA/offline_matched_coadd_ccds_SV3-thru_20220613.fits. There is no equivalent file in /global/cfs/cdirs/desi/survey/GFA_orig. This file was taken out of /global/cfs/cdirs/desi/survey/GFA.KPNO.

========= ================================================= =========== ========== =================
Location  Filename                                          First EXPID Last EXPID Comment
========= ================================================= =========== ========== =================
GFA_orig  offline_matched_coadd_ccds_SV1-thru_20210928.fits 67678       102108     Last SV1 file
GFA_orig  offline_matched_coadd_ccds_SV2-thru_20210928.fits 81831       102108     Last SV2 file
EDR       offline_matched_coadd_ccds_SV3-thru_20210927.fits 83522       101996     Official EDR file
DR1       offline_matched_coadd_ccds_SV3-thru_20220613.fits 83522       139653     Official DR1 file
GFA.NERSC offline_matched_coadd_ccds_SV3-thru_20240409.fits 83522       235204     Most recent file
========= ================================================= =========== ========== =================
Now for some set theory:
- offline_matched_coadd_ccds_SV2-thru_20210928.fits appears to be a subset of offline_matched_coadd_ccds_SV1-thru_20210928.fits, so it can be ignored in principle. This is just looking at the Exposure IDs, not other columns.
- offline_matched_coadd_ccds_SV3-thru_20210927.fits, the official EDR file, appears to be a subset of offline_matched_coadd_ccds_SV1-thru_20210928.fits. This is just looking at the Exposure IDs, not other columns.
- offline_matched_coadd_ccds_SV3-thru_20210927.fits, the official EDR file, appears to be a subset of offline_matched_coadd_ccds_SV3-thru_20220613.fits, the official DR1 file.
- offline_matched_coadd_ccds_SV3-thru_20220613.fits, the official DR1 file, appears to be a subset of offline_matched_coadd_ccds_SV3-thru_20240409.fits, the most recent file.

So EDR < DR1 < Most recent, in the set theory sense. But some exposures from the SV1 file are missing. The very slight difference in Last EXPID between SV1 & SV3 may simply be a matter of the one night difference 20210928 versus 20210927.
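For anyone who wants to repeat the comparison, here is a minimal sketch of the subset check on exposure IDs. The function name compare_expids and the toy ID lists are made up for illustration; with the real files the IDs would presumably come from an EXPID column (an assumed column name) read with Table.read, as in the snippet earlier in this thread.

```python
def compare_expids(a, b):
    """Compare two collections of exposure IDs.

    Returns (a_is_subset_of_b, sorted list of IDs that are only in a).
    Only the IDs are compared, not any other columns.
    """
    set_a, set_b = set(a), set(b)
    only_in_a = sorted(set_a - set_b)
    return set_a <= set_b, only_in_a


# Toy exposure IDs, invented for illustration only (not real DESI values).
sv1 = [100, 101, 102, 103, 104]
edr = [102, 103]
dr1 = [102, 103, 105]

print(compare_expids(edr, sv1))  # edr is contained in sv1
print(compare_expids(edr, dr1))  # edr is contained in dr1
print(compare_expids(sv1, dr1))  # sv1 has IDs that dr1 lacks
```

The second return value lists the exposures that would be lost if only the larger file were kept, which is the "some exposures from the SV1 file are missing" situation described above.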
Before we consider anything else, we may want to add offline_matched_coadd_ccds_SV1-thru_20210928.fits or offline_matched_coadd_ccds_SV1-thru_20210927.fits to EDR and DR1 (i.e. align the last exposures), so that we have GFA files that cover all exposures in those releases.
A possible way to rewrite the function read_gfa_data() would be:
Thanks @weaverba137 for digging into the files and the set theory. My understanding of the situation now:
- The absence of offline_matched_coadd_ccds_SV1-thru_20210928.fits in EDR, draft DR1, and the current /global/cfs/cdirs/desi/survey/GFA appears to be an oversight. At minimum we should add it to EDR and DR1.
- If that file were in /global/cfs/cdirs/desi/survey/GFA, the current main code would work because it grabs the latest of each of offline_matched_coadd_ccds_{SV1,SV2,SV3}-thru_????????.fits and concatenates them together (albeit resulting in duplicate rows due to date overlaps in the files). In some sense, the problem isn't in the code but in our misunderstandings about what files needed to be preserved when transitioning from KPNO -> NERSC processing; we should not have dropped offline_matched_coadd_ccds_SV1-thru_20210928.fits.
- Re-adding offline_matched_coadd_ccds_SV1-thru_20210928.fits is the minimum needed, though we may want/need to clean up how all of this works anyway.

Options:

- Re-add offline_matched_coadd_ccds_SV1-thru_20210928.fits immediately, then in this PR revert read_gfa_data() to the "stack the latest SV1,SV2,SV3-thru_????????.fits file" method, while keeping the additional unit tests of this PR (updated to match the logic if needed).

> Re-add offline_matched_coadd_ccds_SV1-thru_20210928.fits immediately, then in this PR revert read_gfa_data() to the "stack the latest SV1,SV2,SV3-thru_????????.fits file" method, while keeping the additional unit tests of this PR (updated to match the logic if needed).
I think my proposal above is a simplified version of that. Would that be acceptable?
Yes, I think that would be fine, with the inclusion of actually adding the SV1 file back to the GFA directory as well. This PR already has the code for sorting by night regardless of what future SURVEY prefixes may exist, so the only change is to also read the latest SV1 file as a special case and concatenate the two. Good.
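To make the agreed selection logic concrete, here is a rough sketch of how the file choice could look. This is not the code in the PR: the helper name select_gfa_files, the regular expression, and the choice to exclude SV1 from the "latest by night" pool are all assumptions based only on the filenames quoted in this thread.

```python
import re

# Assumed filename layout, inferred from the names quoted in this thread:
# offline_matched_coadd_ccds_<LABEL>-thru_<YYYYMMDD>.fits
GFA_PATTERN = re.compile(
    r"offline_matched_coadd_ccds_(?P<label>[A-Za-z0-9]+)-thru_(?P<night>\d{8})\.fits$"
)


def select_gfa_files(filenames):
    """Pick the latest SV1 file plus the latest file of any other label.

    Hypothetical helper: returns a list with at most two filenames,
    the SV1 special case first, then the most recent non-SV1 file.
    Names that do not match the assumed pattern are ignored.
    """
    parsed = []
    for name in filenames:
        match = GFA_PATTERN.search(name)
        if match:
            parsed.append((int(match.group("night")), match.group("label"), name))

    sv1 = [p for p in parsed if p[1] == "SV1"]
    other = [p for p in parsed if p[1] != "SV1"]

    selected = []
    if sv1:
        selected.append(max(sv1)[2])    # latest SV1 file, by night
    if other:
        selected.append(max(other)[2])  # latest file of any other label
    return selected


example_files = [
    "offline_matched_coadd_ccds_SV1-thru_20210405.fits",
    "offline_matched_coadd_ccds_SV1-thru_20210928.fits",
    "offline_matched_coadd_ccds_SV2-thru_20210928.fits",
    "offline_matched_coadd_ccds_SV3-thru_20220613.fits",
    "offline_matched_coadd_ccds_SV3-thru_20240409.fits",
    "README.txt",
]
print(select_gfa_files(example_files))
```

The two selected tables could then be stacked, for example with astropy.table.vstack, accepting the overlapping rows between the SV1 file and the later file.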
OK, I'll probably have to update that on Monday at this point, and I'll take care of adding in the SV1 file.
I can also add some of your explanations above to the documentation.
For the SV1 file: offline_matched_coadd_ccds_SV1-thru_20210405.fits would have the minimal overlap with any current file. The last EXPID is 83543, and the first EXPID in SV3 and later is 83522. However, given the intermingling of SV1 & SV3, perhaps we really need to take the last SV1 file, and just accept the overlap. Thoughts?
Based on previous versions of this function, which would have been used for fuji and iron, the file offline_matched_coadd_ccds_SV1-thru_20210928.fits would have been selected, so we should retain that same file, even if there is overlap with later files.
offline_matched_coadd_ccds_SV1-thru_20210928.fits is now visible in EDR, DR1 and the GFA directory for daily processing.
I think this is ready for testing and review now.
@sbailey, just a reminder to test this once we start getting new GFA data.
Summarizing a chat with @weaverba137
To resolve this, I have made a new branch "jura" which we can use for any last minute Jura updates + tags, thus opening up "main" for daily ops again, including this PR. Until we are completely done with Jura, we will need to carefully consider each PR for whether it should be to main or jura. The basic workflow rule is that we can merge PR -> jura -> main, or PR -> main, but not PR -> main -> jura because that could bring in other changes on main that we don't want in jura tags.
This PR closes #2250.
- read_gfa_data() returns the GFA summary file with the most recent date, independently of any label (SV1, SV2, SV3, main, foobar, etc.).
- read_gfa_data() is moved into desispec.tilecompleteness, which makes it easier to write a unit test.