LSSTDESC / desc-data-portal

LSST DESC Data Portal web app and the associated documentation and example notebooks.
https://data.lsstdesc.org
BSD 3-Clause "New" or "Revised" License

add DR6-v2 object and truth match & coadds images #75

Closed heather999 closed 3 years ago

heather999 commented 3 years ago

We have made a subsequent release of DR6-v2, which includes updated processing of one patch. This updated data should be made available through the data portal. Additionally, we will add coadd data for tracts 3828 and 3829.

To Do List

JoanneBogart commented 3 years ago

@yymao It looks like all the new catalogs we need for v2 are already in GCRCatalogs (in particular, there doesn't need to be a dc2_object_run2.2i_dr6_v2_with_metacal because everyone should be using _with_addons instead). So what remains to be done is tagging and everything that follows from that. Is that right? And if so, is exactly what that entails written down somewhere?

yymao commented 3 years ago

Previously we decided that the public version would be called desc_dc2_run2.2i_dr6* (and for the object catalog, it actually points to a different set of files -- the ones that are post-GCR-translation). So we do need to add those configs.
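To illustrate the naming convention, here is a minimal sketch that picks out names matching the public pattern `desc_dc2_run2.2i_dr6*`. The list of names is hypothetical and only stands in for a real registry; this is not how GCRCatalogs itself resolves configs.

```python
from fnmatch import fnmatchcase

# Hypothetical catalog names, for illustration only; not the real registry.
available = [
    "dc2_object_run2.2i_dr6_v2",
    "dc2_object_run2.2i_dr6_v2_with_addons",
    "desc_dc2_run2.2i_dr6_object",
    "desc_dc2_run2.2i_dr6_truth",
]

# Public naming pattern quoted in the comment above.
PUBLIC_PATTERN = "desc_dc2_run2.2i_dr6*"


def public_names(names, pattern=PUBLIC_PATTERN):
    """Return the names that match the public naming pattern, sorted."""
    return sorted(n for n in names if fnmatchcase(n, pattern))


print(public_names(available))
```

Only the `desc_`-prefixed entries pass the filter, which is why the internal names alone are not enough and separate public configs are needed.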

JoanneBogart commented 3 years ago

Thanks for the clarification, @yymao

yymao commented 3 years ago

Other things to be included in this update:

JoanneBogart commented 3 years ago

For the second of @yymao's bullets above, could that go on the acknowledgements page? Just add something about non-personal use in addition to citing in publications?

yymao commented 3 years ago

Agreed. The number counts should go into the release note. The statement can go in both the release note and the acknowledgements page on the portal.

yymao commented 3 years ago

I'm starting to work on the release note update. Question: should we also mention the coadd images that we are now providing?

katrinheitmann commented 3 years ago

If it's not too much work, it would be nice I think.

heather999 commented 3 years ago

Started work to update the datasets.yaml file on the issues/75/DR6-v2 branch. Now deployed to the dev portal: http://lb.desc-web-dev.production.svc.spin.nersc.org/
Just note that the coadd area is still in progress. I'll put the tarball in that directory once it's fully assembled.

heather999 commented 3 years ago

The tarballs (small and full) are available and visible. A couple of questions

yymao commented 3 years ago

Is the tarball file much smaller? Globus supports transferring a whole directory so I wonder why tarballing is needed?

heather999 commented 3 years ago

It's more a question of how to interact with the more complex directory structure in the portal. There is a slight saving of a few GB from using tar.gz. The tarball lets a user simply choose the file and download it, so one gets either everything or just the small one-patch sample.
There is a desire to retain the general structure that the Gen2 LSST Sci Pipelines creates - as the requesting groups seem to have knowledge of the LSST Sci Pipelines. This means we have the separate deepCoadd and deepCoadd-results subdirectories with separate directories for each band/tract. What would it mean for a user to see the deepCoadd and deepCoadd-results directories and be given the option to grab one or the other or both? Or would we offer views deeper than that so people pick and choose certain bands or one tract or even some set of patches? I don't think we want to make this too complicated for these requests for access to coadd data.
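As a rough sketch of the tarball option, the snippet below builds a placeholder tree with the deepCoadd and deepCoadd-results subdirectories split by band and tract, then packs it into a single tar.gz. The band names, the `coadd-sample` directory name, and the placeholder files are assumptions for illustration; the real layout produced by the Gen2 LSST Sci Pipelines has more levels and real image files.

```python
import tarfile
import tempfile
from pathlib import Path


def make_sample_tree(root, bands=("g", "r"), tracts=(3828, 3829)):
    """Create a placeholder tree: <root>/<top>/<band>/<tract>/ (assumed layout)."""
    for top in ("deepCoadd", "deepCoadd-results"):
        for band in bands:
            for tract in tracts:
                d = root / top / band / str(tract)
                d.mkdir(parents=True)
                (d / "placeholder.txt").write_text("stand-in for coadd files\n")


def tar_directory(src, dest):
    """Pack src into a gzip-compressed tarball, keeping src's name as the top entry."""
    with tarfile.open(str(dest), "w:gz") as tar:
        tar.add(str(src), arcname=src.name)
    return dest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "coadd-sample"  # hypothetical directory name
    make_sample_tree(root)
    tarball = tar_directory(root, Path(tmp) / "coadd-sample.tar.gz")
    with tarfile.open(str(tarball)) as tar:
        members = tar.getnames()

# Second path component of each member: the two top-level subdirectories.
top_levels = sorted({m.split("/")[1] for m in members if "/" in m})
print(top_levels)
```

With a tarball the user picks one file and gets the whole structure intact, at the cost of having to untar it afterwards.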

JoanneBogart commented 3 years ago

I'm in favor of keeping the tarballs. The two options users would have - something very small or the whole thing - seem reasonable and easy to document. If we open it up so they can pick and choose some of them will get tangled up.

heather999 commented 3 years ago

Thinking about Yao's words - I'm realizing he probably meant we could force users to choose either the full coadd directory or the small coadd directory and maybe it's possible to prevent them from drilling down further into the directory structure. I'd have to think about that to see if the web interface can be made to prevent drilling down into the coadd-t3828-t3829 folder and then the user just gets the whole directory structure.

yymao commented 3 years ago

Yes, @heather999, that's what I was thinking (allowing people to only choose either the full coadd directory or the small coadd directory, but not individual files within). Users will need to untar eventually, and if using Globus I don't think a few GB make a difference -- untarring will probably take longer than transferring the extra data.

yymao commented 3 years ago

Two updates:

JoanneBogart commented 3 years ago

I've made changes to documentation (branch issues/75/portal-doc) for the new GCRCatalogs version and acknowledgements, but nothing yet concerning the new image datasets. This will entail changes, mostly small, to several files.

yymao commented 3 years ago

Thanks @JoanneBogart! If you'd like me to start an early review, can you make it a draft PR? But I can also wait until you are ready, of course!

heather999 commented 3 years ago

Ok - I've updated the dev portal to prevent browsing of the coadd data and removed the tarballs. This will allow users to transfer either the full set of coadds for the 2 tracts (202 GB) or just grab that small sample (4.8 GB).

If we're generally happy with this arrangement of the coadd data, it might be time to ask Katrin to copy these 2 directories over to ANL.
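The "whole directory or nothing" rule can be sketched as a simple path check. The two directory names are the ones quoted in this thread; the function itself is hypothetical and is not the portal's actual code.

```python
from pathlib import PurePosixPath

# Directory names as quoted in this thread; the checking logic is hypothetical.
ALLOWED_TOP_LEVEL = {"coadd-t3828-t3829", "coadd-t3828-t3829-small"}


def is_allowed_selection(relative_path):
    """True only if the selection is exactly one of the two whole directories."""
    parts = PurePosixPath(relative_path).parts
    return len(parts) == 1 and parts[0] in ALLOWED_TOP_LEVEL


print(is_allowed_selection("coadd-t3828-t3829-small"))
print(is_allowed_selection("coadd-t3828-t3829/deepCoadd"))
```

Anything deeper than the top level is rejected, so a user always transfers a complete directory tree.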

yymao commented 3 years ago

@heather999 thanks for the update! This arrangement sounds good to me (which is obvious since I proposed it...)

I can make some UI/UX improvements regarding the deprecated catalogs, and also maybe make the coadded images "clickable" so that clicking them shows an instruction?

Should I work on the branch you are using?

JoanneBogart commented 3 years ago

@yymao I was expecting you to review after I had done the rest - I just thought it would be more efficient for you - but either way is fine. I can make a draft PR. Since Heather now has all data in place I will try to finish up the doc tomorrow.

JoanneBogart commented 3 years ago

@heather999 this is what I gather from remarks above:

Is that right?

heather999 commented 3 years ago

Sorry, it's been a crazy day. Going in order:
@yymao Yes, please go ahead with your plans and use the branch I've been working from, issue/75/DR6-v2.
@JoanneBogart Right, and that single patch is 2,2 from tracts 3828 and 3829.

heather999 commented 3 years ago

@katrinheitmann if you or someone you can designate has time - we have that coadd data available for copying to the ANL endpoint in /global/cfs/cdirs/lsst/gsharing/lsstdesc-public/dc2/run2.2i-dr6-v2/coadd-t3828-t3829 and /global/cfs/cdirs/lsst/gsharing/lsstdesc-public/dc2/run2.2i-dr6-v2/coadd-t3828-t3829-small

yymao commented 3 years ago

Thanks @heather999 -- I'll work on issue/75/DR6-v2 tomorrow.

yymao commented 3 years ago

Working on the dataset display -- Question: is there any reason we don't want people to browse the coadded image data sets? Now that we have the table in Appendix C, I can imagine some people may want to take a look at the data structure.

heather999 commented 3 years ago

Done!