LSSTScienceCollaborations / StackClub

Learning the LSST software Stack, by writing jupyter notebook tutorials.
https://stackclub.readthedocs.io/
MIT License
45 stars 17 forks source link

Process CCD Notebook #34

Open kadrlica opened 6 years ago

kadrlica commented 6 years ago

This is mostly my attempt to run processEImage.py on the twinkles data. It has ended up not being as easy as I hoped.

drphilmarshall commented 6 years ago

"It has ended up not being as easy as I hoped."

First contender for Stack Club tagline? :-)

Do you want to submit a PR and use it to point out some sticking points? Might be good for all of us watching to appreciate the challenges. Thanks for taking a crack at this!

kadrlica commented 6 years ago

Current version of the notebook can be found here but not sure if it's ready for a PR. The notebook is not fully functional, and the github PR interface shows changes to the json, which I don't find especially enlightening (though maybe I'm missing something?).

Some of the sticking points:

drphilmarshall commented 6 years ago

Thanks Alex.

A couple of minutes runtime is awkwardly long for a live run through in front of an audience - but for a notebook that you work through at home I think it's fine (as long as the text tells you that you'll need to wait!). The other thing we could do is have on hand the results of a pre-baked run - which could be helpful for comparing results, but would also enable the user to skip the time-consuming part and get to the later section where the results of the task are displayed and explained.

Regarding PR threads for notebooks: I agree it's not ideal that we can only make line comments on the raw notebook code, but the PR thread does at least provide a "view" button where you can see the rendered notebook (so it saves you pasting links to the file in the branch). And then of course it enables the usual code review conversation in place, as well as tracking all the contributions made. I'd say that notebook was ready for PR no problem - you should just flag it as not yet ready to merge, in the initial comment. (Personally I like the workflow where you make the PR as soon as the dev branch is made in the base repo - so that that branch has an explanation for its existence.)

kadrlica commented 6 years ago

As far as I know, the twinkles and HSC data sets are the best curated. My personal thinking was that it was better to use simulated LSST images than real data from HSC. If we use the HSC dataset we should be able to follow the v15.0 tutorial line for line, but I'm not sure we want to do that.

The python3 compatibility is a one-line change to the shebang. I've already submitted an issue on this here. We also might be able to get away with something smaller than medium but I haven't checked.

SimonKrughoff commented 6 years ago

1) I think running on medium should be fine. The cluster is fairly large, so I don't think we'll run out of CPUs. I think we will learn a lot about how k8s packs jobs onto nodes and how to configure things correctly. 2) Diffing notebooks is a known issue. There is a package called nbdime that looks promising, but is not integrated into the github UI. 3) The python2 problem should just be fixed. I'll try to get to that tomorrow and then push it to the fork.

SimonKrughoff commented 6 years ago

I never got to 3 above, but it is on the branch that Heather Kelly is about to merge.

kadrlica commented 6 years ago

ProcessEImage is now broken with the following error:

---------------------------------------------------------------------------
ParentsMismatch                           Traceback (most recent call last)
/opt/lsst/software/stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/daf_persistence/16.0/python/lsst/daf/persistence/butler.py in _setAndVerifyParentsLists(self, repoDataList)
    925                     try:
--> 926                         repoData.cfg.extendParents(parents)
    927                     except ParentsMismatch as e:

/opt/lsst/software/stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/daf_persistence/16.0/python/lsst/daf/persistence/repositoryCfg.py in extendParents(self, newParents)
    181                                   "existing parents list in this RepositoryCfg: {}").format(
--> 182                                   newParents, self._parents))
    183 

ParentsMismatch: The beginning of the passed-in parents list: [RepositoryCfg(root='../../../../../DMStackClub/ImageProcessing/DATA', mapper=<class 'lsst.obs.lsstSim.lsstSimMapper.LsstSimMapper'>, mapperArgs={}, parents=[], policy=None), RepositoryCfg(root='../..', mapper=<class 'lsst.obs.lsstSim.lsstSimMapper.LsstSimMapper'>, mapperArgs={}, parents=[], policy=None)] does not match the existing parents list in this RepositoryCfg: [RepositoryCfg(root='/home/kadrlica/notebooks/DMStackClub/ImageProcessing/DATA', mapper=<class 'lsst.obs.lsstSim.lsstSimMapper.LsstSimMapper'>, mapperArgs={}, parents=[], policy=None)]
kadrlica commented 6 years ago

I've decided to give up on ProcessEimage and move to ProcessCcd with the HSC data.

kadrlica commented 6 years ago

Simon suggests that some modifications of configuration parameters could be useful:

He also suggests looking at Jim Bosch's Leon notebook:

kadrlica commented 6 years ago

He also suggests Jonathan's documentation on configuration: