LSSTDESC / SSim_DC1

Configuration, production, validation specifications and tools for the DC1 Data Set.

Run L2 pipeline on a subset of DC1 data using v14_0 (or later) #47

Open jchiang87 opened 6 years ago

jchiang87 commented 6 years ago

It would be useful to run the L2 pipeline on some of the DC1 imsim-dithered data using a more modern version of the Stack, e.g., v14_0. This will enable us to

heather999 commented 6 years ago

@tony-johnson do you have any words of wisdom about the L2 pipeline as it was run at NERSC for DC1? Pointers to existing docs in Confluence or other GitHub issues would be welcome. In the meantime, I'll see what I can dig up. This looks interesting: https://github.com/LSSTDESC/Pegasus_workflows

cwwalter commented 6 years ago

There are also options and pieces missing that either weren't in v13 or weren't turned on we should try to turn on now.

The deblendedness variable was one. @SimonKrughoff can remind us what you have to turn on to get that.

@laurenam also mentioned there were some things she couldn't look at in her QA scripts because the code didn't exist in v13. We should find out what those are. I think we should consider rerunning DC1 (at least imSim dithered) with the updated version once we know how to do it.

jchiang87 commented 6 years ago

@heather999 @jamesp-epcc Here are some links to get you started:

It would probably be worth a zoom meeting to walk you through the SLAC workflow stuff if you haven't seen it before.

heather999 commented 6 years ago

Thanks, @jchiang87! A zoom meeting would be great. Who do we need to attend? Are we enough? I figure you and James P have more difficult schedules to work around, so feel free to name some times.

jamesp-epcc commented 6 years ago

A zoom meeting does sound like a good idea. I could do late afternoon/evening (UK time) today or tomorrow, but if that's too soon we can work something out next week.

heather999 commented 6 years ago

Today, tomorrow, or Monday would work. Tuesday through Thursday I'm at Harvard, but I can work around that.

jchiang87 commented 6 years ago

@heather999 @jamesp-epcc The DC1 undithered eimage data are here at NERSC:

/global/projecta/projectdirs/lsst/production/DC1/DC1-imsim/full_focalplane_undithered

I would pick 5 or so visits in sequence and copy the data from the central raft (R_2_2) to a local directory for ingesting via ingestSimImages.py, then run through the Twinkles cookbook as we discussed on zoom today. You should be able to use the same reference catalog file that worked for the QA plot testing.
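The staging step above can be sketched as a small helper that builds the copy commands for one raft across a run of visits. This is a hypothetical sketch: the per-visit directory layout and eimage filename pattern under `full_focalplane_undithered` are assumptions, not taken from the thread.

```python
# Sketch (hypothetical layout): build the cp commands that stage the
# central raft's (R_2_2) eimage files for a run of visits, ready for
# ingestSimImages.py. The filename pattern is an assumption.
import os

SRC = ("/global/projecta/projectdirs/lsst/production/DC1/"
       "DC1-imsim/full_focalplane_undithered")

def copy_commands(visits, dest, raft="R_2_2", src=SRC):
    """Return cp commands that stage one raft's eimages for each visit."""
    cmds = []
    for visit in visits:
        # Assumed naming convention for imSim eimage files.
        src_glob = os.path.join(src, str(visit),
                                "lsst_e_%s_%s_*.fits.gz" % (visit, raft))
        cmds.append("cp %s %s/" % (src_glob, dest))
    return cmds

# Example: five consecutive visits staged into a local ingest area.
for cmd in copy_commands(range(1000, 1005), "$SCRATCH/dc1_ingest"):
    print(cmd)
```

The visit numbers here are placeholders; in practice you would list the actual visit directories present under the production area first.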

jamesp-epcc commented 6 years ago

I have had a go at running the scripts on the L2demo data. I had trouble getting the ingest to work initially: it was throwing an error in a call to `fcntl.flock`, but running in `$SCRATCH` instead of `$HOME` seems to fix this, so it must be related to the type of file system. (I also had to pass in `eimages/lsst.fits.gz` rather than `eimages/lsst_.py` as in the run_L2.sh script.)

In the next stage (`processEimage.py`) I have run into a more serious problem. I get the error below:

```
Traceback (most recent call last):
  File "/global/common/cori/contrib/lsst/lsstDM/w_2017_26_py3/stack/miniconda3-4.2.12-7c8e67/Linux64/obs_lsstSim/13.0-13-g177be6f+6/bin/processEimage.py", line 23, in <module>
    from lsst.obs.lsstSim.processEimage import ProcessEimageTask
  File "/global/common/cori/contrib/lsst/lsstDM/w_2017_26_py3/stack/miniconda3-4.2.12-7c8e67/Linux64/obs_lsstSim/13.0-13-g177be6f+6/python/lsst/obs/lsstSim/__init__.py", line 27, in <module>
    from .lsstSimMapper import *
  File "/global/common/cori/contrib/lsst/lsstDM/w_2017_26_py3/stack/miniconda3-4.2.12-7c8e67/Linux64/obs_lsstSim/13.0-13-g177be6f+6/python/lsst/obs/lsstSim/lsstSimMapper.py", line 1, in <module>
    from builtins import map
ImportError: No module named builtins
```

Some Googling suggests that this might be fixed by installing the "future" package via pip, but I think that's something we should probably do centrally rather than me trying to get it to work with a local installation. There's also an earlier message, `Could not import lsstcppimport; please ensure the base package has been built (not just setup).`, but I can't tell if that's an error or just a warning.
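The `$HOME` vs `$SCRATCH` behaviour is consistent with `fcntl.flock` not being supported on the home file system: advisory locking can fail with `OSError` on some networked mounts. A minimal, self-contained probe (not the actual ingest code, just the same kind of locking call) makes the diagnosis reproducible:

```python
# Probe whether advisory file locking (the call the ingest was failing on)
# works in a given directory. On file systems without flock support,
# fcntl.flock raises OSError; on a local /tmp or scratch area it succeeds.
import fcntl
import tempfile

def can_flock(directory):
    """Return True if fcntl.flock works on a file created in `directory`."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as f:
            fcntl.flock(f.fileno(), fcntl.LOCK_EX)
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
        return True
    except OSError:
        return False

# Usage: compare the two candidate working areas before running the ingest.
print(can_flock("/tmp"))
```

Running this once in `$HOME` and once in `$SCRATCH` would confirm whether file locking is really the culprit.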

jamesp-epcc commented 6 years ago

I've got a bit further with this now. It turns out the "future" package was already installed, but the `processEimage.py` script started with `#!/usr/bin/env python2`, so it was running in the default Python 2.7 instead of the Python 3 that has all the LSST packages. I fixed this by making a local copy and editing it. Now `processEimage.py` runs and produces a lot of output files, but it isn't fully working yet: I'm getting the error `Unable to find andConfig.py in astrometry_net_data directory`. The Twinkles cookbook has a section at the start on how to generate this file, but I'm not sure how to apply that process to our data. It appears to start from a reference catalog file, which I don't think I currently have.

SimonKrughoff commented 6 years ago

@cwwalter I just noticed I was at-ed here. Do you still need input from me?

cwwalter commented 6 years ago

There is a deblendedness variable we would like that wasn't available in v13. I remember you told me this is a different algorithm that needs to be turned on and wouldn't happen by default, so I was wondering what we have to do to turn it on.

BTW, generically I wonder: how do we know we are turning on everything we need?

SimonKrughoff commented 6 years ago

I'm drawing a blank on the deblendedness front.

> BTW, generically I wonder: how do we know we are turning on everything we need?

The only way we can do this is to give catalogs to the working groups and see if they complain. I don't know another way. Obviously, we should show the config to interested parties to get feedback beforehand, but experience shows that we don't really know what we want until we get the data in hand.

cwwalter commented 6 years ago

> The only way we can do this is to give catalogs to the working groups and see if they complain. I don't know another way. Obviously, we should show the config to interested parties to get feedback beforehand, but experience shows that we don't really know what we want until we get the data in hand.

People like @laurenam may also know what she needs turned on for her QA scripts.

BTW @laurenam, do you remember anything about the deblendedness variable?

laurenam commented 6 years ago

I think `base_Blendedness` is what you're after: https://github.com/lsst/meas_base/blob/master/python/lsst/meas/base/sfm.py#L133. This doc says v14 has it on by default: https://github.com/lsst/pipelines_lsst_io/blob/master/releases/note-source/v14_0.rst#blendedness-calculation-is-run-by-default
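For anyone running a pre-v14 stack, a config override is the usual way to force a measurement plugin on. The fragment below is a hedged sketch only: the exact attribute path depends on which task is being configured and on the obs package, so treat the names here as assumptions to be checked against the task's actual config tree.

```python
# Hypothetical pex_config override file (e.g. passed on the command line
# with --configfile) to ensure the blendedness plugin is run; the exact
# path to the measurement sub-config varies by task and stack version.
config.measurement.plugins.names |= ["base_Blendedness"]
```

With v14 and later this should be unnecessary, per the release note linked above.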

cwwalter commented 6 years ago

Thanks @laurenam!