desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License

Tagged version of fiberassign files for database #1834

Closed weaverba137 closed 1 year ago

weaverba137 commented 2 years ago

When loading the redshift database (desispec.database.redshift), the fiberassign files are currently loaded from trunk. If trunk corresponds exactly to any tag that would have been made for edr/fuji or dr1, then that's fine. If not, we should identify which tag(s) to use, creating them if necessary. This issue was split out from #1819.

araichoor commented 2 years ago

thanks for raising that point.

actually, one "issue" with the fiberassign files is that some columns were - or still are - bugged. those are not "critical" columns from an operations point of view, but columns with photometry information, which are propagated downstream in the spectro. products (coadd, redrock, etc) fibermap extension.

we did a first round of patching, fixing some columns (on Oct. 5 2021), i.e. before fuji / guadalupe were generated. but some columns still remain bugged, and we are likely to do a second round of patching soon (hopefully before iron).

could it make sense to do the following:

that way, the column information in the fiberassign and in the spectro. products fibermap should be consistent (even if bugged).

and we could proceed similarly in the future for dr1 and later releases.

for sure, it would add some redundancy (as there would be a set of fiberassign files associated with each release), but it should be fine in terms of disk space, as a typical fiberassign file is 5MB large:

weaverba137 commented 2 years ago

This sounds reasonable. I think you should go ahead with a test.

araichoor commented 2 years ago

thanks for the answer. I'll work on writing a script doing that.

for sanity/curiosity, I just checked: the most recent fuji fiberassign file currently in the svn directory dates from Oct. 5, 2021 (i.e. from the patching), so before the fuji launch (Jan. 24, 2022). so there is no need for this edr / fuji round to revert the svn to an older version, right?

import os
import subprocess

import numpy as np
from astropy.table import Table

svn_dir = "/global/cfs/cdirs/desi/target/fiberassign/tiles/trunk"
d = Table.read("/global/cfs/cdirs/desi/spectro/redux/fuji/tiles-fuji.fits")
tileids = np.unique(d["TILEID"])
timestamps = np.zeros(len(tileids), dtype=object)
for i, tileid in enumerate(tileids):
    tileidpad = "{:06d}".format(tileid)
    fn = os.path.join(svn_dir, tileidpad[:3], "fiberassign-{}.fits.gz".format(tileidpad))
    # field 5 of `ls -l --full-time` is the file modification date (YYYY-MM-DD)
    tmpstr = subprocess.Popen("ls -l --full-time {}".format(fn), stdout=subprocess.PIPE, shell=True).communicate()[0].strip().decode("utf-8")
    timestamps[i] = tmpstr.split()[5]

np.unique(timestamps)

returns:

array(['2020-12-18', '2020-12-19', '2021-01-01', '2021-01-02',
       '2021-01-05', '2021-01-11', '2021-01-29', '2021-02-02',
       '2021-02-04', '2021-02-23', '2021-03-04', '2021-03-16',
       '2021-04-30', '2021-05-05', '2021-05-11', '2021-05-12',
       '2021-05-13', '2021-10-05'], dtype=object)
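
The comparison with the launch date can also be done in code rather than by eye. A minimal sketch, assuming only the date strings returned above and the Jan. 24, 2022 fuji launch date quoted in this comment (the variable names `fuji_launch` and `latest` are just for illustration):

```python
from datetime import date

# Modification dates returned by np.unique(timestamps) above.
timestamps = [
    "2020-12-18", "2020-12-19", "2021-01-01", "2021-01-02",
    "2021-01-05", "2021-01-11", "2021-01-29", "2021-02-02",
    "2021-02-04", "2021-02-23", "2021-03-04", "2021-03-16",
    "2021-04-30", "2021-05-05", "2021-05-11", "2021-05-12",
    "2021-05-13", "2021-10-05",
]

# Fuji launch date, as quoted in this comment.
fuji_launch = date(2022, 1, 24)

# Most recent on-disk modification date among the fuji fiberassign files.
latest = max(date.fromisoformat(t) for t in timestamps)
print(latest.isoformat(), latest < fuji_launch)
```

Note that this only looks at file modification times on disk, which are not necessarily the same thing as svn commit dates.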

araichoor commented 2 years ago

and a comment (based on the ongoing "Bug in lsdr9-photometry files" email thread):

in the current edr release plan, we have two versions of the fiberassign files:

I confirm that the fuji ztiles*fits files are based on the svn fiberassign files, i.e. the patched ones. @akremin confirmed that from the fuji code/logs point-of-view, and I did check the files. (a few inconsistencies remain, but that's another issue, not related to the patching).

those two fiberassign file versions have some column differences (due to patching of some photometric columns). I'm not sure if it's possible to not release the ones in the raw data directory, is it? if we have to release the two sets, then we may want to mention this discrepancy somewhere (I don't know what is the best place for that).

sbailey commented 2 years ago

I think the datamodel for the raw data version of the fiberassign files is the place to mention that those were the files actually used for the observations. It can then reference the other set as what is used for spectro pipeline production, noting that those files include patches to correct values needed for analysis but that the patches don't impact the original observations.

araichoor commented 2 years ago

thanks for that suggestion.

another question, @sbailey : should the script I'll write to create such a fiberassign "tag" folder go in desispec? or elsewhere?

sbailey commented 2 years ago

@araichoor if this is a one-off script used just for making this tag, but not for general usage in making future tags, then let's put it in git fiberassign/etc/ for the record.

araichoor commented 2 years ago

actually I was thinking of making it general, for future releases. the two release-dependent arguments simply being:

sbailey commented 2 years ago

In that case I think it should go into git fiberassign/bin, or maybe still git fiberassign/etc where we sometimes put "scripts to use occasionally but not as a standard part of using this package"; desispec/etc has several of those. Either way, in the fiberassign repo not desispec.

araichoor commented 1 year ago

bringing back to this thread the discussion with @sbailey and @weaverba137 for better book-keeping:

in short: for both fuji and guadalupe, I suggest using revision 1120 from Jan. 23, 2022 for the tag (https://desi.lbl.gov/trac/changeset/1120/data/tiles/trunk). any comments are welcome!

in detail: summarizing offline discussions:

so I wrote a small script to recover all the revisions for the tiles of a given production; I'll submit a PR in fiberassign for that.

for fuji and guadalupe:

so we could pick any revision between Oct. 2021 and Oct. 2022 for tagging, as the fuji and guadalupe fiberassign files were not changed during that time. I suggest using revision 1120 from Jan. 23, 2022, i.e. just before the processing was launched.
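
For readers wondering what "recover all the revisions" might look like in practice, here is a rough sketch. It is not the script from the fiberassign PR: it assumes one calls `svn log -q` on each fiberassign file in a working copy and parses the entry header lines. The helper names (`parse_svn_log`, `file_revisions`) and the sample log text (author name and time of day) are hypothetical; only the revision number and date come from this thread.

```python
import re
import subprocess

# Header line of each `svn log -q` entry, e.g.
# "r1120 | someuser | 2022-01-23 10:00:00 -0800 (Sun, 23 Jan 2022)"
_LOG_LINE = re.compile(r"^r(\d+) \| [^|]* \| (\d{4}-\d{2}-\d{2})", re.MULTILINE)


def parse_svn_log(text):
    """Return [(revision, 'YYYY-MM-DD'), ...] from `svn log -q` output."""
    return [(int(rev), day) for rev, day in _LOG_LINE.findall(text)]


def file_revisions(path):
    """Run `svn log -q` on one working-copy file and parse the result."""
    out = subprocess.run(
        ["svn", "log", "-q", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_svn_log(out)


# Hypothetical sample output, for illustration only.
sample = (
    "------------------------------------------------------------------------\n"
    "r1120 | someuser | 2022-01-23 10:00:00 -0800 (Sun, 23 Jan 2022)\n"
    "------------------------------------------------------------------------\n"
)
print(parse_svn_log(sample))
```

Looping `file_revisions` over the fiberassign files of a production and collecting the results would give the set of revisions that touched those tiles, from which a revision to tag can be picked.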

weaverba137 commented 1 year ago

@araichoor, this sounds fine to me. It's useful for database loading that the same tag will work for fuji and guadalupe.

Have we decided on a name for the tag?

sbailey commented 1 year ago

@araichoor sounds good. Thanks for double checking all of this.

Let's use tag 0.5 (or any 0.N tag if you have a favorite number). Iron used tag 1.1 (1.0 also exists, but was superseded by 1.1 which covers all of the tiles used by final Iron).

araichoor commented 1 year ago

Can I let you both decide the tag number and create it?

(btw, for correctness, I've added a minor correction to my previous message, as TILEID=80715 fiberassign files were actually re-committed on Feb. 3, 2023 -- this is meaningless for that discussion).

weaverba137 commented 1 year ago

For the record, I will take on the task of actually creating the tag. It probably won't happen today, but hopefully by Monday.

weaverba137 commented 1 year ago

The final tag command was:

svn copy --revision 1120 -m "Tagging tiles/0.5 (fuji/guadalupe)." ${SVN_URL}/data/tiles/trunk ${SVN_URL}/data/tiles/tags/0.5
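
With the tag in place, code that reads fiberassign files can select a tag per release instead of always reading trunk. A minimal sketch, reusing the path layout from the snippet earlier in this thread and the tag numbers mentioned above (0.5 for fuji/guadalupe, 1.1 for iron). The on-disk location of the tags (`tiles/tags/<tag>` next to `tiles/trunk`) is an assumption, and `fiberassign_path` and `RELEASE_TAGS` are hypothetical names, not part of desispec or fiberassign:

```python
import os

# Assumption: tags are checked out next to trunk on disk, mirroring the
# svn layout (.../tiles/trunk and .../tiles/tags/<tag>).
TILES_ROOT = "/global/cfs/cdirs/desi/target/fiberassign/tiles"

# Tags named in this thread: 0.5 for fuji/guadalupe, 1.1 for iron.
RELEASE_TAGS = {"fuji": "0.5", "guadalupe": "0.5", "iron": "1.1"}


def fiberassign_path(tileid, release=None):
    """Path to a tile's fiberassign file; trunk is used if release is None."""
    if release is None:
        version = "trunk"
    else:
        version = os.path.join("tags", RELEASE_TAGS[release])
    tileidpad = "{:06d}".format(tileid)
    return os.path.join(
        TILES_ROOT, version, tileidpad[:3],
        "fiberassign-{}.fits.gz".format(tileidpad),
    )


print(fiberassign_path(80715, release="fuji"))
print(fiberassign_path(80715))
```

Keeping the release-to-tag mapping in one place means fuji and guadalupe resolve to the same tag, which is the property noted above as useful for database loading.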

weaverba137 commented 1 year ago

I think all the needs of this issue are satisfied. Please reopen if we've forgotten something.