astro-datalab / notebooks-latest

Default set of Data Lab notebooks, by DL team and contributed by users
BSD 3-Clause "New" or "Revised" License
60 stars 49 forks source link

DESI EDR Notebooks #175

Closed Ragadeepika-Pucha closed 1 year ago

Ragadeepika-Pucha commented 1 year ago

The notebooks currently work in db01 -- however, the Intro notebook still needs a DESI Kernel to work. Both the notebooks need to be tested after the staging is complete.

Ragadeepika-Pucha commented 1 year ago

@rnikutta @jacquesalice @stephjuneau - Thank you for the comments! I updated the notebooks based on all the suggestions.

jacquesalice commented 1 year ago

@Ragadeepika-Pucha can you please also resolve the "conflicts" that the new Emission Line Galaxies notebook has with the old one?

Ragadeepika-Pucha commented 1 year ago

@jacquesalice - I am not sure how to resolve the "conflicts" - I did a pull from the master - but it still says - "everything up to-date and nothing to commit". The instructions below show it needs to be merged to resolve it.

rnikutta commented 1 year ago

Hi @Ragadeepika-Pucha and @jacquesalice I'll resolve the conflict; it's from Igor's/mine fixes to the the deprecated np.warnings.filterwarnings().

rnikutta commented 1 year ago

@Ragadeepika-Pucha , @jacquesalice , @stephjuneau

A small snafu. I think the 03_ScienceExamples/EmLineGalaxies/01_EmLineGalaxies_SpectraStack.ipynb notebook uses 'id' in INCLUDE, but that was renamed, right? See in function get_sdss_spectra():

## Retrieve the spectra
    res = client.retrieve_by_specid(specid_list = [specobjid], 
                                    include = ['id', 'redshift', 'flux', 'wavelength', 'model', 'ivar'],
                                    dataset_list = ['SDSS-DR16', 'BOSS-DR16'])

and then the call gives:

wavelength, flux, model, ivar = get_sdss_spectra(specobjid[index], rest_frame = True)

BadInclude: [BADINCL] The INCLUDE list (flux,id,ivar,model,redshift,wavelength) contains invalid data field names for Data Sets (BOSS-DR16,SDSS-DR16). Unknown fields are: id.
Ragadeepika-Pucha commented 1 year ago

@rnikutta - I think SPARCL has changed since then. I will make the changes and commit them to this PR itself.

Ragadeepika-Pucha commented 1 year ago

@jacquesalice @rnikutta @stephjuneau - I updated both the emission-line notebooks to work with the latest SPARCL version - but, for the stacking notebook, when I am trying to get 100 spectra of SDSS sources, only >90% of the spectra are getting retrieved. Not all the spectra are getting retrieved from SDSS+BOSS. I am not able to think of any reason why that is.

Ragadeepika-Pucha commented 1 year ago

I updated the second emission-line galaxies notebook - the multiwavelength cutouts one - to work with the latest SPARCL version. However, the Spectral stacking notebook is still producing errors.

rnikutta commented 1 year ago

@Ragadeepika-Pucha Looks like the EmLines 01 NB (stacking) has the same issue in function get_sdss_spectra(), i.e. INCLUDE has 'id' in it.

rnikutta commented 1 year ago

@Ragadeepika-Pucha I'll log off for tonight. Maybe you can make the final fix tomorrow morning then? Thanks!

Ragadeepika-Pucha commented 1 year ago

Yes - but I fixed that. The issue now is that not all the spectra are getting retrieved for the stacking function. I am getting the following error:

image

The input is a table with 100 sources, but SPARCL is retrieving only 95-98 (based on my tests) spectra.

rnikutta commented 1 year ago

Unless I'm doing something wrong, on desi_edr_raga branch, the EmLine 01 stacking NB still shows 'id' in 'include=' in the get_sdss_spectra() function. (It is fixed to 'sparcl_id' in the 02 NB though. Also in function stack_spectra().

Regarding the 2nd error (after I changed both 'id' to 'sparcl_id'): it fails at index 337 (when starting with zero). Maybe for the given sparcl_id list, some spectra were missing or not retrieved?

stephjuneau commented 1 year ago

Hi @Ragadeepika-Pucha & @rnikutta Thanks for all your work last night!

During the last ingest, not all SDSS and BOSS records passed the criteria so we have an incomplete dataset for those (listed here). I have ideas for possible fixes but investigating and implementing those was postponed until after the DESI EDR to make sure that's ready. I suggest the following:

  1. Merge this PR for the DESI notebooks and the updated "id" field;
  2. Make a new branch/PR with a possible change to the Stacking notebooks between: (1) retrieve more than 100 (say 150) and then stack a subset of 100; or alternatively (2) check how many were retrieved and only loop over that number and print a message or "N=xx" on the plots to say how many are in each stack. In either scenario, we can point to not all spectra being ingested into SPARCL if they have data inconsistencies.

P.S. I suspect some of these issues were introduced when we switched from ingesting SDSS/BOSS from the individual spectral files to ingesting from the plate files and the bookkeeping might not be the same. We'll resolve this before the next SPARCL ingest (date TBD) but we should make a ticket to checking again the SDSS Stacking notebook when that is done (Cc @adambolton & @jacquesalice for information).

stephjuneau commented 1 year ago

I'm going to resolve conflicts and merge this one and I have found a fix for the EmLine SDSS Stacking notebook but I'd rather create its own branch+PR if that's OK? @jacquesalice or @rnikutta, do you agree?

rnikutta commented 1 year ago

Ok thanks @stephjuneau , I won't do the resolution then.

stephjuneau commented 1 year ago

@rnikutta I resolved the conflicts but Merging is still blocked by your requested changes although I think @Ragadeepika-Pucha did make the changes.

rnikutta commented 1 year ago

Ok I merged bypassing the outstanding review request. Can you please verify that the merge went through on Github etc.?

stephjuneau commented 1 year ago

@rnikutta The GitHub version looks good! Thanks so much!!