sbailey opened this issue 2 years ago
I note for the record that there are currently no symlinks in daily/tiles/archive. What is the level of readiness for addressing #1644 versus this issue? Specifically, guadalupe checksums are being created in the immediate future (~days), and therefore:

- We do not want to add checksum files to nights that will be symlinked to guadalupe anyway.
- … guadalupe.

This also suggests that the layout of the tiles/archive/TILEID/ARCHIVEDATE directory is the same as, or very similar to, the layout of a SPECPROD/tiles/cumulative/TILEID/LASTNIGHT directory. Is this a reasonably safe assumption?
By "reasonably safe" I mean, for example, that there could be differences in the number or types of files in certain cases, but there will not be differences in subdirectories. In this case, there will be a logs/ subdirectory, but there shouldn't be any other subdirectories.
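As a rough illustration of the "reasonably safe" assumption, the sketch below reports any subdirectories other than logs/ in a tile directory. The function name and the allowed set are hypothetical, not part of desispec. Because both directory types share one layout, the same check would apply to tiles/archive/TILEID/ARCHIVEDATE and tiles/cumulative/TILEID/LASTNIGHT.

```python
from pathlib import Path

# Assumption taken from the discussion: logs/ is the only expected subdirectory.
ALLOWED_SUBDIRS = {"logs"}


def unexpected_subdirs(tiledir, allowed=ALLOWED_SUBDIRS):
    """Return the sorted names of subdirectories of tiledir not in `allowed`.

    Hypothetical helper. is_dir() follows symlinks, so this also works when
    tiledir is the symlink left behind in tiles/cumulative after archiving.
    """
    tiledir = Path(tiledir)
    subdirs = {p.name for p in tiledir.iterdir() if p.is_dir()}
    return sorted(subdirs - set(allowed))
```

An empty list means the directory matches the expected layout. Any other result flags a case worth looking at before checksumming.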
No one is actively working on #1644 (cross prod archiving), so it can wait for guadalupe checksumming.
> We do not want to add checksum files to nights that will be symlinked to guadalupe anyway.
Clarifying: cross-production archiving will symlink daily/tiles/archive/TILEID/ARCHIVEDATE to a guadalupe/tiles/cumulative/TILEID/LASTNIGHT directory; we will not be creating new guadalupe/tiles/archive/ directories. That is, the archiving process is a way of freezing a cumulative/TILEID/LASTNIGHT directory, either by moving it to an archive directory (daily) or by linking to a guaranteed-frozen copy (e.g. guadalupe). So I think you can proceed with guadalupe checksums, unless I am misunderstanding the concern.
> This also suggests that the layout of the tiles/archive/TILEID/ARCHIVEDATE directory is the same as or very similar to the layout of a SPECPROD/tiles/cumulative/TILEID/LASTNIGHT directory. Is this a reasonably safe assumption?
Yes, they are identical in structure. In the normal archiving case, the files originally in tiles/cumulative/TILEID/LASTNIGHT are moved to tiles/archive/TILEID/ARCHIVEDATE, and a symlink is left behind at tiles/cumulative/TILEID/LASTNIGHT pointing to the new archived location. In the case of cross-production archiving, the archive entry will link directly to a tiles/cumulative/TILEID/LASTNIGHT directory. So by construction they have the same structure.
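The normal archiving step described above (move, then leave a symlink behind) might look roughly like this. This is a minimal sketch under assumptions: archive_tile_dir is a hypothetical name, the production root is passed in as reduxdir, and it is not the actual desi_archive_tilenight code.

```python
import os
import shutil
from pathlib import Path


def archive_tile_dir(reduxdir, tileid, lastnight, archivedate):
    """Move tiles/cumulative/TILEID/LASTNIGHT to tiles/archive/TILEID/ARCHIVEDATE
    and leave a relative symlink at the original cumulative location.

    Hypothetical sketch of the procedure discussed in this issue.
    """
    reduxdir = Path(reduxdir)
    cumuldir = reduxdir / "tiles" / "cumulative" / str(tileid) / str(lastnight)
    archivedir = reduxdir / "tiles" / "archive" / str(tileid) / str(archivedate)
    if cumuldir.is_symlink():
        # Already archived: the cumulative entry is just a link now.
        raise ValueError(f"{cumuldir} is already a symlink")
    if archivedir.exists():
        # Archived directories are frozen; never overwrite one.
        raise FileExistsError(archivedir)
    archivedir.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(cumuldir), str(archivedir))
    # Relative target, e.g. ../../archive/TILEID/ARCHIVEDATE
    target = os.path.relpath(archivedir, start=cumuldir.parent)
    os.symlink(target, cumuldir)
    return archivedir
```

A relative link keeps the production tree relocatable. An absolute link would also resolve, but it would break if the whole tree were moved.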
@sbailey, indeed, it's not a concern with regard to creating guadalupe checksums. To expand on the process a bit:

… guadalupe.

Clarifying item 3:
> That other process should skip ARCHIVEDATE directories that will be replaced by symlinks into guadalupe.
When we re-archive a tile by linking to guadalupe, it will get a new ARCHIVEDATE so that we don't break the previously archived version that we promised not to change. That is, we will not replace existing ARCHIVEDATE directories with links to guadalupe. They are archived, frozen, and never supposed to change (except for having their checksums added).
Note that ARCHIVEDATE is the date that we decided to promote a particular processing to archival status for MTL decisions; it is not the same as LASTNIGHT (the last night of data included in that particular cumulative coadd).
Ah, OK. In that case, the script that creates checksums for pre-existing ARCHIVEDATE directories should simply do so for all of them. Much simpler.
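A sketch of the loop such a one-off script might use, assuming the layout tiles/archive/TILEID/ARCHIVEDATE under a production root. The names archive_dirs and checksum_all_archives are hypothetical, and the checksum writer is passed in as a function. Symlinked entries are skipped by default. This is an assumption, not something settled above: a linked production such as guadalupe would carry its own checksum file. There are currently no such links anyway.

```python
from pathlib import Path


def archive_dirs(reduxdir, include_links=False):
    """Yield every tiles/archive/TILEID/ARCHIVEDATE directory under reduxdir.

    Symlinked ARCHIVEDATE entries (cross-production archives) are skipped
    unless include_links is True. Hypothetical helper.
    """
    archive = Path(reduxdir) / "tiles" / "archive"
    for tiledir in sorted(archive.iterdir()):
        if not tiledir.is_dir():
            continue
        for datedir in sorted(tiledir.iterdir()):
            if datedir.is_symlink() and not include_links:
                continue
            if datedir.is_dir():
                yield datedir


def checksum_all_archives(reduxdir, write_checksums, include_links=False):
    """Call write_checksums(datedir) on each archive directory; return them."""
    done = []
    for datedir in archive_dirs(reduxdir, include_links=include_links):
        write_checksums(datedir)
        done.append(datedir)
    return done
```

Passing the writer in keeps the traversal independent of whichever checksum format is finally chosen.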
When desi_archive_tilenight creates each tiles/archive/TILEID/ARCHIVEDATE directory, it should also create checksums for that directory.

@weaverba137, please specify how checksums are created for productions so that we use a consistent method (checksum algorithm, filename, ...).
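Pending the answer on algorithm and filename, here is a minimal sketch that assumes SHA-256 and a sha256sum-compatible text format (digest, two spaces, relative path). With that format, a directory could be verified by running sha256sum -c from inside it. The filename checksums.sha256sum and both function names are placeholders, not a DESI convention.

```python
import hashlib
from pathlib import Path


def sha256_file(path, blocksize=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            h.update(block)
    return h.hexdigest()


def write_checksums(tiledir, checksum_file="checksums.sha256sum"):
    """Write one sha256sum-style line per file under tiledir, including logs/.

    Paths are relative to tiledir. The checksum file itself is excluded,
    so rerunning overwrites it without checksumming the old copy.
    """
    tiledir = Path(tiledir)
    outfile = tiledir / checksum_file
    lines = []
    for path in sorted(tiledir.rglob("*")):
        if path.is_file() and path != outfile:
            rel = path.relative_to(tiledir).as_posix()
            lines.append(f"{sha256_file(path)}  {rel}\n")
    outfile.write_text("".join(lines))
    return outfile
```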
Related is #1644 about cross-production tile archiving. Nominally, this form of archiving would create a link daily/tiles/archive/TILEID/ARCHIVEDATE -> ../../../../guadalupe/tiles/cumulative/TILEID/LASTNIGHT. Ideally, the guadalupe production would already have a checksum file in tiles/cumulative/TILEID/LASTNIGHT in the same form that we would have put into daily/tiles/archive/TILEID/ARCHIVEDATE if it weren't a link. If productions like guadalupe put their checksum files somewhere else, let's define that and discuss options.
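Here is a hypothetical sketch of creating such a cross-production link. It assumes that all productions live side by side under one reduction root (reduxroot). os.path.relpath computes the ../../../../ prefix shown above. The function refuses to replace an existing ARCHIVEDATE, because archived directories are frozen. The names are illustrative, not an existing desispec API.

```python
import os
from pathlib import Path


def link_cross_production(reduxroot, prod, tileid, lastnight, archivedate,
                          source_prod="guadalupe"):
    """Create PROD/tiles/archive/TILEID/ARCHIVEDATE as a relative symlink to
    SOURCE_PROD/tiles/cumulative/TILEID/LASTNIGHT. Hypothetical sketch.
    """
    reduxroot = Path(reduxroot)
    link = reduxroot / prod / "tiles" / "archive" / str(tileid) / str(archivedate)
    target = (reduxroot / source_prod / "tiles" / "cumulative"
              / str(tileid) / str(lastnight))
    if link.exists() or link.is_symlink():
        # Never replace an existing archive; re-archiving uses a new ARCHIVEDATE.
        raise FileExistsError(link)
    if not target.is_dir():
        raise FileNotFoundError(target)
    link.parent.mkdir(parents=True, exist_ok=True)
    relative = os.path.relpath(target, start=link.parent)
    os.symlink(relative, link)
    return link, relative
```

Any checksum file inside the guadalupe LASTNIGHT directory is visible through the link. This is why matching its name and format to the daily archive convention matters.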