Tagged version of fiberassign files for database

weaverba137 commented 2 years ago

When loading the redshift database (desispec.database.redshift), the fiberassign files are currently loaded from trunk. If trunk corresponds exactly to any tag that would have been made for edr/fuji or dr1, then that's fine. If not, we should identify which tag(s) to use, creating them if necessary. This issue was split out from #1819.

araichoor commented 2 years ago

thanks for raising that point.

actually, one "issue" with the fiberassign files is that some columns were - or still are - bugged. those are not "critical" columns from an operations point of view, but columns with photometry information, which are propagated downstream in the spectro. products (coadd, redrock, etc) fibermap extension.

we did a first round of patching that fixed some columns (on Oct. 5, 2021), i.e. before fuji / guadalupe were generated. but some columns still remain bugged, and we are likely to do a second round of patching soon (hopefully before iron).

could it make sense to do the following:

list all the tiles that appear in edr / fuji;
identify the date when edr / fuji was run (fuji_date);
create an edr folder in the tags/ folder, where we would copy, for the edr / fuji tiles only, the fiberassign files from an svn-checkout version at fuji_date.

that way, the column information in the fiberassign and in the spectro. products fibermap should be consistent (even if bugged).

and we could proceed similarly in the future for dr1 and later releases.
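
The three steps above could be sketched as follows. This is only a minimal illustration, not the script discussed in this thread: the function names `fiberassign_relpath` and `copy_release_tiles` are hypothetical, `checkout_dir` is assumed to already be an svn checkout at fuji_date, and the file layout (first three digits of the zero-padded TILEID as sub-folder) is assumed to be the one of the svn trunk.

```python
import os
import shutil


def fiberassign_relpath(tileid):
    """Relative path TTT/fiberassign-TTTTTT.fits.gz for a tile id (assumed layout)."""
    tileidpad = "{:06d}".format(tileid)
    return os.path.join(tileidpad[:3], "fiberassign-{}.fits.gz".format(tileidpad))


def copy_release_tiles(checkout_dir, tag_dir, tileids):
    """Copy the fiberassign files of the given tiles from checkout_dir to tag_dir.

    Returns the list of tile ids whose file was not found in checkout_dir.
    """
    missing = []
    for tileid in tileids:
        rel = fiberassign_relpath(tileid)
        src = os.path.join(checkout_dir, rel)
        dst = os.path.join(tag_dir, rel)
        if not os.path.isfile(src):
            missing.append(tileid)
            continue
        # create the TTT/ sub-folder in the tag folder if needed
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        # copy2 also preserves the modification time of the file
        shutil.copy2(src, dst)
    return missing
```

Returning the missing tile ids rather than raising makes it easy to check, after the copy, that the tag folder really covers all the tiles of the release.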

for sure, it would add some redundancy (as there would be a set of fiberassign files associated with each release), but it should be fine in terms of disk space, as a typical fiberassign file is ~5 MB:

edr / fuji has ~700 tiles => that would be <5 GB;
the main program has ~20k tiles overall => that would be ~100 GB.
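
As a quick check of these numbers, here is the arithmetic written out, assuming a flat 5 MB per file and 1 GB = 1000 MB (the function name is just for illustration):

```python
FILE_SIZE_MB = 5  # typical fiberassign file size quoted above


def total_size_gb(n_tiles, file_size_mb=FILE_SIZE_MB):
    """Total size in GB (1 GB = 1000 MB) of n_tiles fiberassign files."""
    return n_tiles * file_size_mb / 1000


print(total_size_gb(700))    # edr / fuji: 3.5
print(total_size_gb(20000))  # main program: 100.0
```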

weaverba137 commented 2 years ago

This sounds reasonable. I think you should go ahead with a test.

araichoor commented 2 years ago

thanks for the answer. I'll work on a script that does that.

for sanity/curiosity, I just checked: the latest fuji fiberassign file currently in the svn directory dates from Oct. 5, 2021 (i.e. from the patching), so before the fuji launch (Jan. 24, 2022). so for this edr / fuji round there is no need to revert the svn to an older version, right?

import os
import subprocess

import numpy as np
from astropy.table import Table

svn_dir = "/global/cfs/cdirs/desi/target/fiberassign/tiles/trunk"
d = Table.read("/global/cfs/cdirs/desi/spectro/redux/fuji/tiles-fuji.fits")
tileids = np.unique(d["TILEID"])
timestamps = np.zeros(len(tileids), dtype=object)
for i, tileid in enumerate(tileids):
    tileidpad = "{:06d}".format(tileid)
    # files are stored as trunk/TTT/fiberassign-TTTTTT.fits.gz
    fn = os.path.join(svn_dir, tileidpad[:3], "fiberassign-{}.fits.gz".format(tileidpad))
    tmpstr = subprocess.Popen("ls -l --full-time {}".format(fn), stdout=subprocess.PIPE, shell=True).communicate()[0].strip().decode("utf-8")
    # with --full-time, the 6th field of the ls output is the modification date
    timestamps[i] = tmpstr.split()[5]

np.unique(timestamps)

returns:

array(['2020-12-18', '2020-12-19', '2021-01-01', '2021-01-02',
       '2021-01-05', '2021-01-11', '2021-01-29', '2021-02-02',
       '2021-02-04', '2021-02-23', '2021-03-04', '2021-03-16',
       '2021-04-30', '2021-05-05', '2021-05-11', '2021-05-12',
       '2021-05-13', '2021-10-05'], dtype=object)
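
The same dates can be obtained without calling `ls` in a shell. The snippet below is a possible alternative, not what was run above; the helper names are hypothetical, and it reads the local modification time of each file with `os.path.getmtime`:

```python
import os
from datetime import datetime


def file_mtime_date(fn):
    """Local-time modification date of fn, formatted as 'YYYY-MM-DD'."""
    return datetime.fromtimestamp(os.path.getmtime(fn)).strftime("%Y-%m-%d")


def latest_mtime_date(filenames):
    """Most recent modification date among the given files.

    ISO-formatted dates sort chronologically when compared as strings.
    """
    return max(file_mtime_date(fn) for fn in filenames)
```

With the list of fuji fiberassign paths built as in the loop above, `latest_mtime_date(paths)` would give the date to compare with the fuji launch date.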

araichoor commented 2 years ago

and a comment (based on the ongoing email thread "Bug in lsdr9-photometry files"):

in the current edr release plan, we have two versions of the fiberassign files:

raw data, unpatched: https://desidatamodel.readthedocs.io/en/latest/DESI_SPECTRO_DATA/NIGHT/EXPID/fiberassign-TILEID.html
svn checkout, patched: https://desidatamodel.readthedocs.io/en/latest/DESI_TARGET/fiberassign/tiles/TILES_VERSION/TILEXX/fiberassign-TILEID.html

I confirm that the fuji ztiles*fits files are based on the svn fiberassign files, i.e. the patched ones. @akremin confirmed that from the fuji code/logs point of view, and I checked the files. (a few inconsistencies remain, but that's another issue, not related to the patching).

those two fiberassign file versions have some column differences (due to the patching of some photometric columns). I'm not sure whether it's possible to not release the ones in the raw data directory, is it? if we have to release the two sets, then we may want to mention this discrepancy somewhere (I don't know what the best place for that is).

sbailey commented 2 years ago

I think the datamodel for the raw data version of the fiberassign files is the place to mention that those were the files actually used for the observations, and then to reference the other set as what is used for spectro pipeline production, noting that it includes patches that correct values needed for analysis but that don't impact the original observations.

araichoor commented 2 years ago

thanks for that suggestion.

another question, @sbailey: should the script I'll write to create such a fiberassign "tag" folder go in desispec? or elsewhere?

sbailey commented 2 years ago

@araichoor if this is a one-off script used just for making this tag, but not for general usage in making future tags, then let's put it in git fiberassign/etc/ for the record.

araichoor commented 2 years ago

actually I was thinking of making it general, for future releases. the two release-dependent arguments would simply be:

the tiles-PROD.fits file (list of tiles)
the date when the prod started running (for the svn revision version)
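
To make the idea concrete, the command-line interface of such a general script might look like the sketch below. The option names `--tiles-file` and `--prod-date` and the date format are purely hypothetical; they are not taken from any existing fiberassign script.

```python
import argparse
from datetime import datetime


def parse_date(s):
    """Parse a 'YYYY-MM-DD' string into a datetime.date."""
    return datetime.strptime(s, "%Y-%m-%d").date()


def parse_args(argv=None):
    """Parse the two release-dependent arguments (hypothetical option names)."""
    parser = argparse.ArgumentParser(
        description="collect the fiberassign files of a production (sketch)"
    )
    parser.add_argument("--tiles-file", required=True,
                        help="tiles-PROD.fits file listing the tiles of the production")
    parser.add_argument("--prod-date", required=True, type=parse_date,
                        help="date (YYYY-MM-DD) when the production started running")
    return parser.parse_args(argv)
```

For fuji, a call could then look like `parse_args(["--tiles-file", "tiles-fuji.fits", "--prod-date", "2022-01-24"])`.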

sbailey commented 2 years ago

In that case I think it should go into git fiberassign/bin, or maybe still git fiberassign/etc where we sometimes put "scripts to use occasionally but not as a standard part of using this package"; desispec/etc has several of those. Either way, in the fiberassign repo not desispec.

araichoor commented 1 year ago

bringing back to this thread the discussion with @sbailey and @weaverba137 for better book-keeping:

in short: for both fuji and guadalupe, I suggest using revision 1120 from Jan. 23, 2022 (https://desi.lbl.gov/trac/changeset/1120/data/tiles/trunk) for the tag. any comments are welcome!

in detail: summarizing offline discussions:

we'd just need to find the correct svn revision (i.e. the one from when the production was run; this assumes that the fiberassign files of the production tiles didn't change while the production was running);
create a tag from that;
the important thing is that a tag exists and that it contains all the tiles in fuji. Similarly for guadalupe. They could even be the same tag. It does not matter if it contains additional tiles.
we can separately decide whether to release the tag. For the database, the tag ensures long-term reproducibility.

so I wrote a small script to recover all the revisions for the tiles of a given production; I'll submit a PR in fiberassign for that.

for fuji and guadalupe:

observations: from Dec. 2020 to Jul. 2021
processing: Jan. 24, 2022 ([desi-data 5825]) to Apr. 19, 2022 (from file timestamps)
fiberassign svn commits for those tiles:
- up to Jul. 2021: "regular" commits along with tile designs
- Oct. 2021: first patching
- Oct. 2022: second patching
- Feb. 2023: tileid=80715

so for tagging we could pick any revision between Oct. 2021 and Oct. 2022, as the fuji and guadalupe fiberassign files were not changed during that time. I suggest using revision 1120 from Jan. 23, 2022, i.e. just before the processing was launched.
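
The reasoning can be written down as a small check. This is only a sketch: `stable_window` is a hypothetical helper, the "regular" commits up to Jul. 2021 are left out, and since only the month of the second patching is given above, Oct. 1, 2022 is used as a placeholder day.

```python
from datetime import date


def stable_window(commit_dates, proc_start, proc_end):
    """Return (last commit on or before proc_start, first commit after proc_end).

    Any revision between those two commits contains the files as used by the
    processing. Raises ValueError if a commit falls inside the processing period.
    """
    commits = sorted(commit_dates)
    inside = [d for d in commits if proc_start < d <= proc_end]
    if inside:
        raise ValueError("files changed during processing: {}".format(inside))
    before = [d for d in commits if d <= proc_start]
    after = [d for d in commits if d > proc_end]
    return (before[-1] if before else None, after[0] if after else None)


commits = [
    date(2021, 10, 5),  # first patching
    date(2022, 10, 1),  # second patching (placeholder day, only the month is known)
    date(2023, 2, 3),   # tileid=80715
]
window = stable_window(commits, date(2022, 1, 24), date(2022, 4, 19))
print(window)  # (datetime.date(2021, 10, 5), datetime.date(2022, 10, 1))

# Jan. 23, 2022 (revision 1120) falls inside that window
print(window[0] <= date(2022, 1, 23) < window[1])  # True
```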

weaverba137 commented 1 year ago

@araichoor, this sounds fine to me. It's useful for database loading that the same tag will work for fuji and guadalupe.

Have we decided on a name for the tag?

sbailey commented 1 year ago

@araichoor sounds good. Thanks for double checking all of this.

Let's use tag 0.5 (or any 0.N tag if you have a favorite number). Iron used tag 1.1 (1.0 also exists, but was superseded by 1.1 which covers all of the tiles used by final Iron).

araichoor commented 1 year ago

Can I let you both decide the tag number and create it?

(btw, for correctness, I've added a minor correction to my previous message, as the TILEID=80715 fiberassign files were actually re-committed on Feb. 3, 2023; this does not matter for this discussion).

weaverba137 commented 1 year ago

For the record, I will take on the task of actually creating the tag. It probably won't happen today, but hopefully by Monday.

weaverba137 commented 1 year ago

The final tag command was:

svn copy --revision 1120 -m "Tagging tiles/0.5 (fuji/guadalupe)." ${SVN_URL}/data/tiles/trunk ${SVN_URL}/data/tiles/tags/0.5
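
For future tags, the same command could be assembled programmatically. The helper below is a hypothetical sketch that only builds the argument list in the form shown above; it does not run svn, and the URL in the example is a placeholder, not the real value of SVN_URL.

```python
import shlex


def svn_tag_command(svn_url, revision, tag, message):
    """Build the 'svn copy' argument list that tags tiles/trunk at a given revision."""
    return [
        "svn", "copy",
        "--revision", str(revision),
        "-m", message,
        "{}/data/tiles/trunk".format(svn_url),
        "{}/data/tiles/tags/{}".format(svn_url, tag),
    ]


cmd = svn_tag_command(
    "https://svn.example.org/repo",  # placeholder URL
    1120,
    "0.5",
    "Tagging tiles/0.5 (fuji/guadalupe).",
)
# print a shell-quoted version for inspection before running anything
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it; keeping the construction separate allows the command to be reviewed first.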

weaverba137 commented 1 year ago

I think all the needs of this issue are satisfied. Please reopen if we've forgotten something.

desihub / desispec

Tagged version of fiberassign files for database #1834