heather999 closed this issue 3 years ago
@yymao It looks like all the new catalogs we need for v2 are already in GCRCatalogs (in particular, there doesn't need to be a dc2_object_run2.2i_dr6_v2_with_metacal because everyone should be using _with_addons instead). So what remains to be done is tagging and everything that follows from that. Is that right? And if so, is exactly what that entails written down somewhere?
Previously we decided the public version would be called desc_dc2_run2.2i_dr6* (and for the object catalog, it actually points to a different set of files -- the ones that are post-GCR-translation). So we do need to add those configs.
Thanks for the clarification, @yymao
Other things to be included in this update:
For the second of @yymao's bullets above, could that go on the acknowledgements page? Just add something about non-personal use in addition to citing in publications?
Agreed. The number counts should go into the release note. The statement can go to both the release note and the acknowledgements page on the portal.
I'm starting to work on the release note update. Question: should we mention the coadd image that we are now providing too?
If it's not too much work, it would be nice I think.
Started work to update the datasets.yaml file on the issues/75/DR6-v2 branch. Now deployed to the dev portal: http://lb.desc-web-dev.production.svc.spin.nersc.org/
Just note that the coadd area is still in progress. I'll put the tarball in that directory once it's fully assembled.
The tarballs (small and full) are available and visible. A couple of questions:
lsstdesc-public/dc2/run2.2..., but the Globus transfer can do that as well. We don't need both. I could modify the tarballs to use a flatter structure and assume the Globus transfer will create the lsstdesc-public/dc2/run2.2i-dr6-v2 area for the users. Then it will be up to them to extract each tarball in the same directory where it was initially stored. But I'm thinking of leaving the tarballs as is so they will extract into the full directory structure by default. Sound ok?
coadd-t3828-t3829-small.tar.gz and coadd-t3828-t3829.tar.gz are in the same directory on the portal, where -small is the example subset of 1 patch. Is that going to be confusing in the case where a user downloads the full directory and receives both tarballs? The tarballs actually extract into separate directories, coadd-t3828-t3829 and coadd-t3828-t3829-small. Alternatively, the small and full coadd data could be treated as completely separate datasets in the portal - though that might also be confusing. Ultimately, this will be explained in the README - so maybe that will help.
Ideas welcome!
Is the tarball file much smaller? Globus supports transferring a whole directory, so I wonder why tarballing is needed?
It's more a question of how to interact with the more complex directory structure in the portal. There is a slight savings of a few GB from tar.gz. The tarball offers the ability to just choose the file and download it, so one gets either everything or just that small 1-patch sample.
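For anyone wanting to check which directory a tarball would populate before extracting it, here is a minimal sketch using only Python's standard tarfile module. The two tarball names and their top-level directory names are the ones mentioned above; the member file names inside them are hypothetical placeholders, and the demo builds tiny stand-in tarballs rather than touching the real data.

```python
import io
import tarfile
import tempfile
from pathlib import Path, PurePosixPath


def top_level_entries(tar_path):
    """Return the sorted top-level names a tarball would create on extraction."""
    with tarfile.open(tar_path, "r:*") as tar:
        return sorted(
            {PurePosixPath(name).parts[0] for name in tar.getnames() if name}
        )


def make_demo_tarball(tar_path, member_names):
    """Write a tiny .tar.gz whose members are empty placeholder files."""
    with tarfile.open(tar_path, "w:gz") as tar:
        for name in member_names:
            info = tarfile.TarInfo(name=name)
            info.size = 0
            tar.addfile(info, io.BytesIO(b""))


def demo():
    """Build stand-in tarballs and report the top-level directory of each."""
    with tempfile.TemporaryDirectory() as tmp:
        full = Path(tmp) / "coadd-t3828-t3829.tar.gz"
        small = Path(tmp) / "coadd-t3828-t3829-small.tar.gz"
        # Hypothetical member paths: only the directory names come from the thread.
        make_demo_tarball(
            full,
            [
                "coadd-t3828-t3829/deepCoadd/placeholder_a.fits",
                "coadd-t3828-t3829/deepCoadd-results/placeholder_b.fits",
            ],
        )
        make_demo_tarball(
            small,
            ["coadd-t3828-t3829-small/deepCoadd/placeholder_a.fits"],
        )
        return top_level_entries(full), top_level_entries(small)
```

Calling top_level_entries on a downloaded tarball lists the directories it would create without extracting anything, which makes it easy to confirm that the full and small tarballs do not unpack on top of each other.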
There is a desire to retain the general structure that the Gen2 LSST Sci Pipelines creates - as the requesting groups seem to have knowledge of the LSST Sci Pipelines. This means we have the separate deepCoadd and deepCoadd-results subdirectories with separate directories for each band/tract. What would it mean for a user to see the deepCoadd and deepCoadd-results directories and be given the option to grab one or the other or both? Or would we offer views deeper than that so people pick and choose certain bands or one tract or even some set of patches? I don't think we want to make this too complicated for these requests for access to coadd data.
I'm in favor of keeping the tarballs. The two options users would have - something very small or the whole thing - seem reasonable and easy to document. If we open it up so they can pick and choose, some of them will get tangled up.
Thinking about Yao's words - I'm realizing he probably meant we could force users to choose either the full coadd directory or the small coadd directory and maybe it's possible to prevent them from drilling down further into the directory structure. I'd have to think about that to see if the web interface can be made to prevent drilling down into the coadd-t3828-t3829 folder and then the user just gets the whole directory structure.
Yes, @heather999, that's what I was thinking (allowing people to choose only either the full coadd directory or the small coadd directory, but not individual files within). Users will need to untar eventually, and if using Globus I don't think a few GB make a difference -- untarring will probably take longer.
Two updates:
I've made changes to documentation (branch issues/75/portal-doc) for the new GCRCatalogs version and acknowledgements, but nothing yet concerning the new image datasets. This will entail changes, mostly small, to several files.
Thanks @JoanneBogart! If you'd like me to start to do an early review, can you make it a draft PR? But I can also wait until you are ready, of course!
Ok - I've updated the dev portal to prevent browsing of the coadd data and removed the tarballs. This will allow users to transfer either the full set of coadds for the 2 tracts (202 GB) or just grab that small sample (4.8 GB).
If we're generally happy with this arrangement of the coadd data, it might be time to ask Katrin to copy these 2 directories over to ANL.
@heather999 thanks for the update! This arrangement sounds good to me (which is obvious since I proposed it...)
I can make some UI/UX improvements regarding the deprecated catalogs, and also maybe make the coadded images "clickable" but, when clicked, show instructions?
Should I work on the branch you are using?
@yymao I was expecting you to review after I had done the rest - I just thought it would be more efficient for you - but either way is fine. I can make a draft PR. Since Heather now has all data in place I will try to finish up the doc tomorrow.
@heather999 this is what I gather from remarks above:
Is that right?
Sorry, been a crazy day... going in order. @yymao Yes, please go ahead with your plans and use the branch I've been working off of, issue/75/DR6-v2. @JoanneBogart Right, and that single patch is 2,2 from tracts 3828 and 3829.
@katrinheitmann if you or someone you can designate has time - we have that coadd data available for copying to the ANL endpoint in /global/cfs/cdirs/lsst/gsharing/lsstdesc-public/dc2/run2.2i-dr6-v2/coadd-t3828-t3829 and /global/cfs/cdirs/lsst/gsharing/lsstdesc-public/dc2/run2.2i-dr6-v2/coadd-t3828-t3829-small
Thanks @heather999 -- I'll work on issue/75/DR6-v2 tomorrow.
Working on the dataset display -- Question: is there any reason we don't want people to browse the coadded image datasets? Now that we have the table in Appendix C, I can imagine some people may want to take a look at the data structure.
Done!
We have made a subsequent release of DR6-v2, which includes updated processing of one patch. This updated data should be made available through the data portal. Additionally, we will add coadd data for tracts 3828 and 3829.
To Do List