add DR6-v2 object and truth match & coadds images

heather999 commented 3 years ago

We have made a subsequent release of DR6-v2, which includes updated processing of one patch. This updated data should be made available through the data portal. Additionally, we will add coadd data for tracts 3828 and 3829.

To Do List

[x] Copy Object & Truth Match and coadd data into the appropriate area for the data portal - Heather
[x] Release new GCRCatalogs adding in desc_dc2_run2.2i_dr6*_v2 catalogs for object & truth - Joanne
[x] Update Portal documentation, noting the minimal GCRCatalogs release required, indicating that v1 is deprecated, and mentioning coadds (?) - Joanne
[x] Update DC2 Data Release Note - Yao
[x] Copy data to ANL shared endpoint - Katrin
[x] Update portal yaml to include DR6-v2, coadds and add deprecated field - Heather
[x] Update transfer page to deprecate DR6-v1 - Yao
[x] Deploy new portal to dev server - Heather
[x] Release new version to the production server - Heather
[x] Announce on #desc-announce and Community

JoanneBogart commented 3 years ago

@yymao It looks like all the new catalogs we need for v2 are already in GCRCatalogs (in particular, there doesn't need to be a dc2_object_run2.2i_dr6_v2_with_metacal because everyone should be using _with_addons instead). So what remains to be done is tagging and everything that follows from that. Is that right? And if so, is exactly what that entails written down somewhere?

yymao commented 3 years ago

Previously we decided the public version would be called desc_dc2_run2.2i_dr6* (and for the object catalog, it actually points to a different set of files -- the ones that are post-GCR-translation). So we do need to add those configs.

JoanneBogart commented 3 years ago

Thanks for the clarification, @yymao

yymao commented 3 years ago

Other things to be included in this update:

[x] object number counts (# of galaxies, SNe, stars in truth table; # of extended/point sources in object table)
[ ] a statement about notifying DESC for non-personal use (e.g., ingesting the data set into other services or serving at another site)
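
For the number counts in the first item, a minimal sketch of the tally is shown here. It assumes the truth table carries a `truth_type` column coded 1 = galaxy, 2 = star, 3 = SN and the object table an `extendedness` column coded 1 = extended, 0 = point source; both encodings are assumptions to check against the release note, and the input lists are toy values rather than real catalog data.

```python
from collections import Counter

# Assumed truth_type encoding (1 = galaxy, 2 = star, 3 = SN); verify against
# the release note or catalog schema before relying on it.
TRUTH_TYPE_LABELS = {1: "galaxy", 2: "star", 3: "SN"}


def count_truth_types(truth_type):
    """Count truth-table entries per class from a sequence of type codes."""
    counts = Counter(TRUTH_TYPE_LABELS.get(t, "other") for t in truth_type)
    return dict(counts)


def count_extended_point(extendedness):
    """Split object-table entries into extended (1) and point (0) sources."""
    extended = sum(1 for e in extendedness if e == 1)
    point = sum(1 for e in extendedness if e == 0)
    return {"extended": extended, "point": point}


# Toy values standing in for real catalog columns.
truth_counts = count_truth_types([1, 1, 2, 3, 1, 2])
object_counts = count_extended_point([1, 0, 1, 1, 0])
print(truth_counts)
print(object_counts)
```

With the real catalogs, the two input sequences would be the corresponding columns read from the truth and object tables; entries whose `extendedness` is neither 0 nor 1 are left out of both tallies in this sketch.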

JoanneBogart commented 3 years ago

For the second of @yymao's bullets above, could that go on the acknowledgements page? Just add something about non-personal use in addition to citing in publications?

yymao commented 3 years ago

Agreed. The number counts should go into the release note. The statement can go in both the release note and the acknowledgements page on the portal.

yymao commented 3 years ago

I'm starting to work on the release note update. Question: should we also mention the coadd images that we are now providing?

katrinheitmann commented 3 years ago

If it's not too much work, it would be nice I think.

heather999 commented 3 years ago

Started work to update the datasets.yaml file on the issues/75/DR6-v2 branch. Now deployed to the dev portal: http://lb.desc-web-dev.production.svc.spin.nersc.org/
Just note that the coadd area is still in progress. I'll put the tarball in that directory once it's fully assembled.

heather999 commented 3 years ago

The tarballs (small and full) are available and visible. A couple of questions:

When extracted, the tarballs currently reproduce the full directory structure starting from lsstdesc-public/dc2/run2.2..., but the Globus transfer can do that as well, and we don't need both. I could modify the tarballs to use a flatter structure and assume the Globus transfer will create the lsstdesc-public/dc2/run2.2i-dr6-v2 area for users. It would then be up to them to extract the tarball in the same directory where it was initially stored. But I'm inclined to leave the tarballs as is, so they extract into the full directory structure by default. Sound OK?
We have both coadd-t3828-t3829-small.tar.gz and coadd-t3828-t3829.tar.gz in the same directory on the portal, where -small is the example subset of 1 patch. Is that going to be confusing when a user downloads the full directory and receives both tarballs? The tarballs actually extract into separate directories, coadd-t3828-t3829 and coadd-t3828-t3829-small. Alternatively, the small and full coadd data could be treated as completely separate datasets in the portal - though that might also be confusing. Ultimately, this will be explained in the README - so maybe that will help. Ideas welcome!
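
To make the first question concrete, here is a small sketch comparing the two layouts: a tarball whose members carry the full lsstdesc-public/... prefix versus one with a flatter structure. It uses only Python's standard `tarfile` module; the member file name `example.txt` is a placeholder and the tarballs are built in temporary directories, so nothing here touches the real coadd data.

```python
import os
import tarfile
import tempfile

# Illustrative member path only; example.txt is a placeholder file name.
NESTED = "lsstdesc-public/dc2/run2.2i-dr6-v2/coadd-t3828-t3829-small/deepCoadd/example.txt"
FLAT = "coadd-t3828-t3829-small/deepCoadd/example.txt"


def make_tarball(workdir, arcname):
    """Create a one-file tarball whose member is stored under `arcname`."""
    src = os.path.join(workdir, "payload.txt")
    with open(src, "w") as fh:
        fh.write("placeholder")
    tar_path = os.path.join(workdir, "sample.tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(src, arcname=arcname)
    return tar_path


def extract(tar_path, dest):
    """Extract into `dest` and return the relative paths of extracted files."""
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(dest)
    found = []
    for root, _dirs, files in os.walk(dest):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), dest)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


with tempfile.TemporaryDirectory() as work, tempfile.TemporaryDirectory() as dest:
    full_layout = extract(make_tarball(work, NESTED), dest)

with tempfile.TemporaryDirectory() as work, tempfile.TemporaryDirectory() as dest:
    flat_layout = extract(make_tarball(work, FLAT), dest)

print(full_layout)
print(flat_layout)
```

With the full-prefix layout the user can extract anywhere and get the complete tree; with the flat layout the result depends on extracting inside the directory that the transfer created.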

yymao commented 3 years ago

Is the tarball file much smaller? Globus supports transferring a whole directory so I wonder why tarballing is needed?

heather999 commented 3 years ago

It's more a question of how to interact with the more complex directory structure in the portal. There is a slight saving of a few GB from tar.gz compression. The tarball also lets a user just choose the file and download it, so one gets either everything or just that small 1-patch sample.
There is a desire to retain the general structure that the Gen2 LSST Sci Pipelines creates - as the requesting groups seem to have knowledge of the LSST Sci Pipelines. This means we have the separate deepCoadd and deepCoadd-results subdirectories with separate directories for each band/tract. What would it mean for a user to see the deepCoadd and deepCoadd-results directories and be given the option to grab one or the other or both? Or would we offer views deeper than that so people pick and choose certain bands or one tract or even some set of patches? I don't think we want to make this too complicated for these requests for access to coadd data.

JoanneBogart commented 3 years ago

I'm in favor of keeping the tarballs. The two options users would have - something very small or the whole thing - seem reasonable and easy to document. If we open it up so they can pick and choose, some of them will get tangled up.

heather999 commented 3 years ago

Thinking about Yao's words - I'm realizing he probably meant we could force users to choose either the full coadd directory or the small coadd directory and maybe it's possible to prevent them from drilling down further into the directory structure. I'd have to think about that to see if the web interface can be made to prevent drilling down into the coadd-t3828-t3829 folder and then the user just gets the whole directory structure.

yymao commented 3 years ago

Yes, @heather999, that's what I was thinking (allowing people to choose only the full coadd directory or the small coadd directory, but not individual files within). Users will need to untar eventually, and if using Globus I don't think a few GB make a difference -- untarring will probably take longer.

yymao commented 3 years ago

Two updates:

GCRCatalogs v1.3.3 has been created and tagged by Joanne, and has been published on conda-forge
The updated DC2 Data Release Note is now in WG review
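
Since the portal documentation is meant to state a minimal GCRCatalogs release, a user-side check might look like the sketch below. It assumes v1.3.3 is that minimum (the thread only says it is the release tagged for this update) and that version strings are plain dotted integers; the helper names are hypothetical.

```python
# Assumed minimum: the release tagged for the DR6-v2 catalogs in this thread.
MIN_GCRCATALOGS = "1.3.3"


def parse_version(version):
    """Turn 'v1.3.3' or '1.3.3' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def meets_minimum(installed, minimum=MIN_GCRCATALOGS):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


print(meets_minimum("1.3.3"))
print(meets_minimum("1.2.0"))
print(meets_minimum("v1.10.0"))
```

Comparing integer tuples rather than raw strings avoids ranking "1.10.0" below "1.3.3".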

JoanneBogart commented 3 years ago

I've made changes to documentation (branch issues/75/portal-doc) for the new GCRCatalogs version and acknowledgements, but nothing yet concerning the new image datasets. This will entail changes, mostly small, to several files.

yymao commented 3 years ago

Thanks @JoanneBogart! If you'd like me to start to do an early review, can you make it a draft PR? But I can also wait until you are ready, of course!

heather999 commented 3 years ago

Ok - I've updated the dev portal to avoid browsing the coadd data and removed the tarballs. This will allow users to transfer either the full set of coadds for the 2 tracts (202 GB) or just grab that small sample (4.8 GB)

Some doc for the coadds likely needs to be added, and the (doc) link updated on the transfer page
Nothing has been done to deal with the new deprecated field for the DR6-v1 datasets
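
As a starting point for handling the deprecated field, the sketch below shows one way the portal could separate and label deprecated datasets. The entries and the `name`/`deprecated` keys are hypothetical stand-ins for whatever datasets.yaml actually contains; a missing flag is treated as "not deprecated".

```python
# Hypothetical entries; the real datasets.yaml schema may differ.
datasets = [
    {"name": "DR6-v1 object", "deprecated": True},
    {"name": "DR6-v2 object"},
    {"name": "DR6-v2 coadd (tracts 3828, 3829)"},
]


def split_by_deprecation(entries):
    """Return (current, deprecated) lists; a missing flag means current."""
    current = [e for e in entries if not e.get("deprecated", False)]
    deprecated = [e for e in entries if e.get("deprecated", False)]
    return current, deprecated


def display_label(entry):
    """Append a marker to deprecated datasets for display."""
    suffix = " (deprecated)" if entry.get("deprecated", False) else ""
    return entry["name"] + suffix


current, deprecated = split_by_deprecation(datasets)
labels = [display_label(e) for e in datasets]
print(labels)
```

Defaulting the flag to false means existing entries need no change; only the DR6-v1 datasets would gain the new field.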

If we're generally happy with this arrangement of the coadd data, it might be time to ask Katrin to copy these 2 directories over to ANL.

yymao commented 3 years ago

@heather999 thanks for the update! This arrangement sounds good to me (which is obvious since I proposed it...)

I can make some UI/UX improvements for the deprecated catalogs, and maybe also make the coadded images "clickable" so that clicking shows an instruction?

Should I work on the branch you are using?

JoanneBogart commented 3 years ago

@yymao I was expecting you to review after I had done the rest - I just thought it would be more efficient for you - but either way is fine. I can make a draft PR. Since Heather now has all data in place I will try to finish up the doc tomorrow.

JoanneBogart commented 3 years ago

@heather999 this is what I gather from remarks above:

DC2 coadd dr6 v2 is all images (of the types @yymao described in the release note) from tracts 3828 & 3829
small version is a single patch from each

Is that right?

heather999 commented 3 years ago

Sorry, it's been a crazy day. Going in order:
@yymao Yes, please go ahead with your plans and use the branch I've been working off of, issue/75/DR6-v2.
@JoanneBogart Right, and that single patch is 2,2 from tracts 3828 and 3829.

heather999 commented 3 years ago

@katrinheitmann if you or someone you can designate has time - we have that coadd data available for copying to the ANL endpoint in /global/cfs/cdirs/lsst/gsharing/lsstdesc-public/dc2/run2.2i-dr6-v2/coadd-t3828-t3829 and /global/cfs/cdirs/lsst/gsharing/lsstdesc-public/dc2/run2.2i-dr6-v2/coadd-t3828-t3829-small

yymao commented 3 years ago

Thanks @heather999 -- I'll work on issue/75/DR6-v2 tomorrow.

yymao commented 3 years ago

Working on the dataset display -- Question: is there any reason we don't want people to browse the coadded image data sets? Now that we have the table in Appendix C, I can imagine some people may want to take a look at the data structure.

heather999 commented 3 years ago

Done!
