desihub / tutorials

DESI tutorials
BSD 3-Clause "New" or "Revised" License
42 stars 16 forks source link

getting_started tutorials and public data #79

Closed sbailey closed 1 year ago

sbailey commented 1 year ago

In our draft EDR Overview paper, we point people to the getting_started tutorials.

It looks like intro_to_DESI_EDR_files.ipynb is based entirely on soon-to-be-public data, good. I suggest that we save in github the fully rendered notebook including output cells and plots so that people can read along and see the results even if they don't have a NERSC account to actually run the notebook.

Fujilupe_AnalyzeZcat.ipynb, on the other hand, uses proprietary DESI data. Could it teach the same concept using only fuji data, and thus be useful as a public-facing tutorial as well? If it fundamentally needs to include guadalupe data, then let's update the introduction to make it clear that this uses non-public data.

I also see that Fujilupe_AnalyzeZcat.ipynb starts with /global/cfs/cdirs/desi/science/gqp/zcatalog_summary/zall-pix-fujilupe.fits which seems conceptually the same as fuji/zcatalog/zall-pix-fuji.fits plus guadalupe/zcatalog/zall-pix-guadalupe.fits, plus perhaps running the combination through desispec.zcatalog.find_primary_spectra. It would be preferable for tutorials to be based directly on specprod releases whenever possible.

@angelaberti @stephjuneau @Ragadeepika-Pucha @akremin @forero

stephjuneau commented 1 year ago

Further comments on getting_started tutorials (after the data telecon on 4/25/2023):

Did I miss anything? (Cc @sbailey @weaverba137)

forero commented 1 year ago

I've just pushed directly into main the fully rendered intro_to_DESI_EDR_files.ipynb

forero commented 1 year ago

I've just added https://github.com/desihub/tutorials/blob/pre-edr/getting_started/EDR_AnalyzeZcat.ipynb to replace Fujilupe_AnalyzeZcat.ipynb. The new notebooks only uses fuji/zcatalog/zall-pix-fuji.fits as a starting point.

stephjuneau commented 1 year ago

Hi @forero thanks for this! Can we move the Fujilupe_* one elsewhere for collaboration members and add a note that it requires NERSC access? That one could be augmented with a precursor notebook that Angela wrote but I wanted check on it again to avoid everyone re-writing a joint catalog which ends up taking extra disk space.

forero commented 1 year ago

Sure @stephjuneau, I will move the the Fujilupe_ notebook under onskydata/. Where can I find the precursor notebook that Angela wrote?

angelaberti commented 1 year ago

@forero the precursor notebook (Fujilupe_CombineZcat.ipynb) is in /global/cfs/cdirs/desi/users/aberti/public. I'm creating an EDR-only version of the Fujilupe_AnalyzeZcat.ipynb tutorial for the getting_started folder. The fuji-only version (from Jaime) is EDR_AnalyzeZcat.ipynb

weaverba137 commented 1 year ago

One item that we don't want to forget about: we need some instructions about how to set up a DESI notebook kernel. In principle this could be done with install_desi_kernel.sh, but I think it may be better to generate pre-made kernels for 22.5 and 23.1. Otherwise the public might start using unsupported kernels.

angelaberti commented 1 year ago

@weaverba137 by 'DESI notebook kernel' you mean something distinct from the 23.1 instructions within the notebooks written specifically for EDR (e.g. here)?

weaverba137 commented 1 year ago

No, I don't mean a separate kernel, sorry if that was confusing. There are a couple of things to consider here:

  1. I don't want someone following those instructions and then installing a kernel besides 22.5 or 23.1. Maybe I'm being overly cautious with that, but we shouldn't expect earlier releases to work, and cutting-edge versions such as main define environment variables that do not point into the public data release.
  2. A DESI kernel is activated by activate_desi_jupyter.sh, which does not have any provision for setting DESI_ROOT other than the default value of /global/cfs/cdirs/desi. We could update those instructions to set DESI_ROOT before desi_environment.sh is executed, but then install_jupyter_kernel.sh will still install a kernel that does not know to switch DESI_ROOT to /global/cfs/cdirs/desi/public/edr.

In some sense, both items are still an unresolved issue with desimodules, rather than with the tutorials themselves, but the tutorials will need to change to reflect the resolution of that issue.

forero commented 1 year ago

Noted, @weaverba137 . I will prepare some instructions on how a non-DESI NERSC user can use a DESI notebook kernel.

forero commented 1 year ago

@weaverba137

Currently, the tutorials in the getting_started section utilize hardcoded paths, as shown below:

specprod = "fuji"    # Internal name for the EDR
specprod_dir = "/global/cfs/cdirs/desi/public/edr/spectro/redux/fuji/"

As such, we are not making use of the DESI_ROOT environment variable within the Jupyter notebooks that are intended to link to the EDR.

In the global README for the tutorials repository, we mention:

NERSC users can install DESI Jupyter kernels from perlmutter-p1.nersc.gov (this is a one-time requirement) using the following:

source /global/common/software/desi/desi_environment.sh 23.1
$DESIMODULES/install_jupyter_kernel.sh 23.1
$DESIMODULES/install_jupyter_kernel.sh main

Given this, is it reasonable to assume that any NERSC user should be able to follow these instructions and successfully use the getting_started notebooks?

weaverba137 commented 1 year ago

No, because even though you are not using DESI_ROOT explicitly, many of the modules loaded as part of the kernel initialization process do use it, and the resulting environment variables will point outside of the public area.

Please do not include main! It definitely defines paths that point outside of the public area.

So overall, I think we're not there yet. This has to be resolved first in desimodules, as I already mentioned.

forero commented 1 year ago

I understand. (I removed the mention to main).

weaverba137 commented 1 year ago

Ah, but you could add a reference to 22.5.

forero commented 1 year ago

Just wanted to give you a heads-up that I've added a markdown file in the getting_started folder to mention the scripts and notebooks available in the edrpaper repo. Also, I've created a new issue at https://github.com/desihub/edrpaper/issues/1 as a reminder to switch the repo to public view.

forero commented 1 year ago

Hey @stephjuneau! I was wondering if it would be possible to include the notebook you used to plot the petals and healpixels in this repository?

forero commented 1 year ago

I am going to close this issue since all pending tasks are currently tracked in their individual open tickets.