RQ: Twinkles 1 Scope Discussion

drphilmarshall commented 8 years ago

@wmwv @cdfassnacht @saurabhwjha and @DarkEnergyScienceCollaboration/twinkles

In the SRM we hazarded a guess at what an appropriate division between Twinkles 1 and 2 should be, and came up with something like "short, single filter" vs "long, multi-filter". Let's revisit this as we write down our science analysis plans for Twinkles 1.

Some things to discuss:

• The workflow team are going great guns, and looking for directions to extend in. Choices seem to be: 1) scaling up to more visits, 2) getting the SN and lens systems right and in place, 3) going multi-filter, 4) implementing difference imaging and DIASource extraction. I suggested that 4) could wait, 2) was urgent, 1) was an easy one to try anyway, and 3) was well worth experimenting with now to see how hard it was. Thoughts?
• If we are going to be able to measure lens time delays (i.e., use time delay measurability as a metric of light curve extraction success), then we need both few-day cadence (which means multi-filter, if all our visit MJDs are coming from OpSim) and at least 3 seasons. How many visits (roughly) are in 3 seasons of DDF (including the WFD visits)? Would this be sufficient for us to write an interesting SN paper? I think it would be for an SL analysis paper. What do you reckon?
• I got the impression that going multi-filter may not actually be too difficult - which made me wonder whether the more appropriate division between Twinkles 1 and Twinkles 2 might be "easy" vs "hard" instead of "short, single filter" vs "long, multi-filter". In #63 Jim points out that it doesn't make sense to simulate effects that are not modeled by DM: if we were to ignore these effects (big dithers, heterogeneous focal plane, clouds, flat fielding, edge roll-off, more?) in Twinkles 1, then the approximation we'd be making is that we can account for those effects perfectly. This seems to be a good approach to me. So Twinkles 2 could be 10 years instead of 3, and include all the observational effects left out of Twinkles 1.
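
To make the visit-count question above concrete, here is a minimal sketch of counting the visits that fall in the first three seasons, given a list of visit MJDs. It assumes that a new season starts wherever the gap between consecutive visits exceeds `gap_days` (60 days here, an arbitrary choice). The function names and the toy MJDs are hypothetical and are not taken from any OpSim run.

```python
def split_into_seasons(mjds, gap_days=60.0):
    """Group visit MJDs into seasons, starting a new season at each long gap."""
    seasons = []
    for mjd in sorted(mjds):
        if seasons and mjd - seasons[-1][-1] <= gap_days:
            seasons[-1].append(mjd)   # same season as the previous visit
        else:
            seasons.append([mjd])     # long gap (or first visit): new season
    return seasons


def visits_in_first_seasons(mjds, n_seasons=3, gap_days=60.0):
    """Count the visits that fall in the first n_seasons seasons."""
    seasons = split_into_seasons(mjds, gap_days)
    return sum(len(season) for season in seasons[:n_seasons])


# Toy input (invented): four yearly seasons of five visits, three days apart.
toy_mjds = [59580.0 + 365.0 * year + 3.0 * k
            for year in range(4) for k in range(5)]
print(visits_in_first_seasons(toy_mjds))
```

With real inputs, the same count could be made per filter to see how much of the few-day sampling depends on going multi-filter.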

saurabhwjha commented 8 years ago

If you go to multi-filter for Twinkles 1, then for SN yes I think it would be very useful to go to DDF (or higher: nightly) cadence as well. From this you could simulate WFD cadence with sampling, but you wouldn’t be able to go the other way.
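
As a rough illustration of "simulating WFD cadence with sampling", the sketch below picks, for each epoch of a sparser target cadence, the nearest visit from a densely sampled (e.g. nightly) set. The function name, the half-day tolerance, and the toy epochs are assumptions made for the example, not anything from OpSim or the Twinkles code.

```python
import bisect


def subsample_to_cadence(dense_mjds, target_mjds, tolerance_days=0.5):
    """Return indices into dense_mjds (sorted ascending) of the visit nearest
    to each target epoch, skipping targets with no visit within tolerance."""
    picked = []
    for t in target_mjds:
        i = bisect.bisect_left(dense_mjds, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(dense_mjds)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(dense_mjds[j] - t))
        if abs(dense_mjds[best] - t) <= tolerance_days and best not in picked:
            picked.append(best)
    return picked


# Toy input (invented): 30 nightly visits, and four sparser target epochs,
# the last of which lies outside the densely sampled range.
nightly = [59580.0 + n for n in range(30)]
sparse_targets = [59580.2, 59584.9, 59590.4, 59650.0]
idx = subsample_to_cadence(nightly, sparse_targets)
print([nightly[i] for i in idx])
```

The reverse direction is not possible, which is the point made above: a sparse cadence cannot be upsampled to a dense one without inventing data.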

rbiswas4 commented 8 years ago

@drphilmarshall Is there a rationale for going single filter in Twinkles 1?

@jbkalmbach and I went through the sprinkler setup that he has coded up to try to understand how SN fits in, and then went through OM10. I think that, in order to start this correctly:

• It would be good to have hard priors on the time delay distributions. Fig 8 seems to suggest a hard prior of abs(time delay) < 2 or 3 years should be good for SN. Would that sound reasonable? Also, I was intrigued by the differences between SNe and QSOs, and expect that this is a selection effect based on numbers and redshift distributions.
• Also, it would be useful to understand the plan for placing lenses from OM10 in a small patch of sky. I understand that there is a plan to oversample lenses or transients or both in the small patch of sky, in a parametrizable way. I think having a rough idea of the plan may be useful for doing 2, but I have not followed the discussion on these closely enough. Is there a particular document I should look at?
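
For what it is worth, the hard prior suggested above amounts to a simple cut on the lens catalog. The sketch below is only illustrative: it assumes each lens system carries a list of image time delays in days, and the system names, delay values, and function name are all made up rather than taken from OM10.

```python
DAYS_PER_YEAR = 365.25


def passes_delay_prior(delays_days, max_years=3.0):
    """True if every image time delay satisfies abs(delay) < max_years."""
    limit = max_years * DAYS_PER_YEAR
    return all(abs(dt) < limit for dt in delays_days)


# Toy systems (invented): time delays in days relative to the first image.
systems = {
    "lens_a": [0.0, 12.5],
    "lens_b": [0.0, -40.0, 1500.0],
    "lens_c": [0.0, 700.0, -900.0],
}
kept_3yr = sorted(name for name, d in systems.items() if passes_delay_prior(d, 3.0))
kept_2yr = sorted(name for name, d in systems.items() if passes_delay_prior(d, 2.0))
print(kept_3yr, kept_2yr)
```

Comparing the two selections shows directly how many systems the choice between a 2 and a 3 year limit would cost.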

wmwv commented 8 years ago

To me the appeal of single-filter for Twinkles 1 is easier bookkeeping both at the image level and at the source association level. Neither of these tools is well-developed within the LSST DM framework so we'll be writing our own. I would suggest that we keep Run 1 as simple as interesting to increase the chances of having something interesting by the March meeting. I think single-filter is still interesting.

But I don't feel strongly about this.

rbiswas4 commented 8 years ago

I see. Source association from the images seems likely to be one of the harder steps, and I don't have much of an idea of whether the multi-filter component makes it harder than the usual multi-source (single filter) issue.

If you only plan single filter visits, do you want to start with simulations in only the OpSim visits to the chosen filter (thereby reducing the cadence of visits to the object), or do you need all visits mapped to a single band (i.e., even though OpSim may have visited the object in a band different from the chosen one, get all the information as though the visit was in the chosen one)?
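
The two single-filter options described above can be sketched as two operations on a visit list. This is a minimal illustration, assuming each visit is a dict with hypothetical keys `mjd`, `filter` and `five_sigma_depth`; the values are invented and this is not the OpSim schema.

```python
def select_filter_visits(visits, band):
    """Option 1: keep only the visits actually taken in the chosen band.
    This reduces the cadence of visits to the object."""
    return [v for v in visits if v["filter"] == band]


def map_all_visits_to_band(visits, band):
    """Option 2: keep every visit, but relabel it as if taken in the chosen
    band. The original band is recorded and other fields are left unchanged."""
    return [dict(v, filter=band, original_filter=v["filter"]) for v in visits]


# Toy visits (invented values).
visits = [
    {"mjd": 59580.0, "filter": "r", "five_sigma_depth": 24.3},
    {"mjd": 59583.0, "filter": "i", "five_sigma_depth": 23.9},
    {"mjd": 59586.0, "filter": "r", "five_sigma_depth": 24.4},
    {"mjd": 59589.0, "filter": "z", "five_sigma_depth": 23.2},
]
r_only = select_filter_visits(visits, "r")
all_as_r = map_all_visits_to_band(visits, "r")
print(len(r_only), len(all_as_r))
```

Keeping `original_filter` in the second option means the relabelling can be undone later if the run is extended to true multi-filter.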

We could of course propagate the simulation truths on the object id forward to cheat after our actual association to do a pseudo association step so that we can build and test the following steps in the pipeline in multi-filter space.

drphilmarshall commented 8 years ago

Mapping all observations into a single filter is the approximation we made in the time delay challenge - it would allow us to make interesting comparisons with those results. Does the same approximation make sense for supernova analysis? Maybe for Run 1 it does - but then, when I think about what we should aim for before we show stuff in March, I can't help feeling multi-filter is essential for our DESC street cred.

I'm surprised that multi-filter book-keeping is not well developed in DM. Can the butler really not cope with different types of wine? I guess this might be a lot of what we learn by giving it a try. In particular, Michael, what have you found to be the shortcoming? Jim seemed pretty optimistic about it when we talked, I thought.

cwwalter commented 8 years ago

I can't help feeling multi-filter is essential for our DESC street cred.

and now I can't help visualizing some sort of "West Side Story" SL vs WL dance number.

wmwv commented 8 years ago

Strongly lensed Jets?

wmwv commented 8 years ago

The butler is completely capable of generating and handling the data, both imaging and output catalogs.

I'm concerned about comparisons across the output catalogs. There's no DM-level association code that has been advertised as being recommended.

Single-filter is a bit easier because you can just assume that for equal sensitivity you probably should have detected everything (stuff around the noise level will pop in/out of the catalogs). If you compare z-band and u-band, you'll have significantly different sets of objects -- I would be fine with just using the r-band catalog for object initialization for the association to get started.

wmwv commented 8 years ago

But I had also been basing my original thinking on a faked OpSim catalog. If we're using a real OpSim catalog, it may introduce annoying confusion to either only do r-band (not enough sampling to be interesting) or to always observe in r, even if the OpSim run specifies something else.

drphilmarshall commented 8 years ago

Ah, OK - I have been assuming that a CoaddSource was somehow an intrinsically multi-filter entity, in that once it had been detected, it would have forced photometry performed on it in all visit images regardless of filter. Then, constructing a multi-filter light curve would just involve a simple query by CoaddSourceId.

However, LDM-151 (page 20) says otherwise! But, I see that "The next stage in the pipeline will decompose the CoaddSources into a set of individual astronomical sources which is consistent across all bands, a process known as deblending." The output from this operation would be a set of Objects, and it's these that the ForcedPhotometry will be run on. So this sounds as though Objects will indeed be somehow multi-filter entities, and I guess this is where my assumption about ForcedPhotometry had come from.

So now, what does the current deblender do? Does it make Objects that are composed of CoaddSources in all filters? And can ForcedPhotometry then be straightforwardly run on such a multi-filter object to make multi-filter light curves in the SN-required format?

I guess this means that the emulated pipeline in our cookbook is missing a step - the deblending that turns CoaddSources into Objects. Do I have all this right, Simon? Would it be straightforward to add in the current deblender?

wmwv commented 8 years ago

On Dec 7, 2015, at 14:16, Phil Marshall notifications@github.com wrote:

Ah, OK - I have been assuming that a CoaddSource was somehow an intrinsically multi-filter entity, in that once it had been detected, it would have forced photometry performed on it in all visit images regardless of filter. Then, constructing a multi-filter light curve would just involve a simple query by CoaddSourceId.

Yes, but all this machinery doesn't exist yet.

Michael

rbiswas4 commented 8 years ago

Does the same approximation make sense for supernova analysis?

One of the things I was trying to figure out is how important the standard candle nature of SN is for results downstream. OM10 seems to suggest that this is a useful feature, but I did not look closely enough to see how important. I think multiband would be useful to bring down error bars on the distance modulus/intrinsic brightness, assuming you plan to propagate to that stage.

My main worry about single filter is that we will not get a good idea of the impact of cadence. That said, given that this is a deep field, it may not even be worth worrying about.

On the other hand, what @wmwv is suggesting (mapping all visits to r band, while assuming the noise level of the band actually visited) is obviously easy.

Maybe for Run 1 it does - but then, when I think about what we should aim for before we show stuff in March, I can't help feeling multi-filter is essential for our DESC street cred.

I think it is good that we have identified a single step where we think multi-filter is likely to be a bottleneck. Maybe there are others that will be harder than we envisage at present. So, what I would suggest is the following:

• Try multi-filter throughout.
• Have a black box that does association, which we try to develop as best we can, but it is OK if we only get it to work in single filter. But then we have a cheat mode (which we should have anyway in every step, to enable analysis of how well it worked), where we use truth information from the sims to do the association across different bands much more trivially. This way, we can keep most of the analysis multi-band, and know how well the association analysis does too.

Maybe there are difficulties here that I am missing.

SimonKrughoff commented 8 years ago

@wmwv @drphilmarshall I believe the coadd sources are multi-band and deblended. I think making multiband lightcurves of deblended objects should be doable now. I can try to do some experiments to confirm this.

drphilmarshall commented 8 years ago

Alright! Game on, Simon :-) Thanks very much - looking forward to seeing what comes out.

sethdigel commented 8 years ago

We could of course propagate the simulation truths on the object id forward to cheat after our actual association to do a pseudo association step so that we can build and test the following steps in the pipeline in multi-filter space.

Getting back to this point made by @rbiswas4, having a 'cheat mode' capability sounds very important. Even if the association methods between images in different filters currently work well, being able to map the detected sources to the truth will be useful for any number of assessments of analyses of Twinkles simulations. That said, is it straightforward to propagate the simulation truth information to the detected objects? This sounds like a challenge in itself.

drphilmarshall commented 8 years ago

Good question. Rahul, has there been any progress on this? Related to "cheat mode" is "blind mode" - it could be useful to enable holdback of some fraction of the data, to enable blind or cross-validation analysis.
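
One simple way a "blind mode" holdback could work is a reproducible random split of object IDs, as in the sketch below. The function name, the 20% fraction and the fixed seed are illustrative assumptions, not part of any existing Twinkles tool.

```python
import random


def split_holdback(object_ids, holdback_fraction=0.2, seed=42):
    """Split object IDs into (open, held_back) sets, reproducibly.

    Sorting before shuffling makes the split independent of input order,
    and the fixed seed makes it repeatable between runs."""
    ids = sorted(object_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_hold = int(round(holdback_fraction * len(ids)))
    return sorted(ids[n_hold:]), sorted(ids[:n_hold])


# Toy IDs (invented).
open_ids, held_ids = split_holdback(range(100, 110))
print(len(open_ids), len(held_ids))
```

Splitting by object rather than by visit keeps each held-back light curve complete, which seems the more natural unit for a blind or cross-validation analysis.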

More generally, and related to the scope of Twinkles 1, I added a slide to the set for today's meeting on "Needed Capabilities" - please add thoughts of your own to my short list!

rbiswas4 commented 8 years ago

The context in which I made the previous comment was source association, after which one could run forced photometry etc. The PhoSim catalogs come with a centroidfile (via an option in the override that Simon showed me) that contains the sourceID of each source in the catalog. So, we could group the images that contain these sources by sourceID, and then do the next step. Obviously, enabling more such modes, for example to find out which photon is from which object, is also amazingly useful, but much harder, and I was not thinking about those uses when I made that comment.
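
A minimal sketch of that cheat-mode grouping, assuming each detection has already been tagged with the true sourceID, might look like this. The record layout (`truth_id`, `mjd`, `filter`, `flux`) and the values are hypothetical; nothing here reflects the actual centroidfile format.

```python
from collections import defaultdict


def group_by_truth_id(detections):
    """Cheat-mode association: group detections from all visits and filters
    by their true source ID, ordering each group by time."""
    groups = defaultdict(list)
    for det in detections:
        groups[det["truth_id"]].append(det)
    for dets in groups.values():
        dets.sort(key=lambda d: d["mjd"])
    return dict(groups)


# Toy detections (invented values) from three visits in two filters.
detections = [
    {"truth_id": 7, "mjd": 59583.0, "filter": "i", "flux": 11.2},
    {"truth_id": 3, "mjd": 59580.0, "filter": "r", "flux": 4.1},
    {"truth_id": 7, "mjd": 59580.0, "filter": "r", "flux": 10.5},
    {"truth_id": 7, "mjd": 59586.0, "filter": "r", "flux": 9.8},
]
light_curves = group_by_truth_id(detections)
print({tid: [d["filter"] for d in dets] for tid, dets in light_curves.items()})
```

Each group is then a multi-filter light curve that the following pipeline steps can be built and tested against, independently of how well the real association works.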

sethdigel commented 8 years ago

Thanks, Rahul. Yes, I was thinking about 'cheat mode' only in the context of knowing the true identity of each detected source.

So is it true that the same kind of source association step that will associate sources detected in different Run 1 exposures/filters could also be used to match the centroidfile sources to sources detected in the different images? I suppose that the centroidfile sources would have zero position uncertainty - possibly that would have to be changed to something finite for matching up with detected sources. There's probably some unavoidable ambiguity (a true source could be matched with detections in different images that do not get associated with each other), and I suppose that for time-varying sources we'd want to enforce that they do not get matched up with something detected in the simulated images when there's no chance that they could be detected.
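
To illustrate the matching described above, here is a brute-force sketch that pairs each true source with its nearest detection within a finite match radius, and skips true sources flagged as undetectable in a given image. The field names, the 1 arcsec radius, the `detectable` flag and the flat-sky approximation (reasonable only for a small patch) are all assumptions made for the example.

```python
import math


def match_truth_to_detections(truth, detections, radius_arcsec=1.0):
    """Map truth_id -> det_id of the nearest detection within radius_arcsec.

    Coordinates are in degrees; separations use a flat-sky approximation.
    Truth sources with detectable=False are never matched."""
    matches = {}
    for t in truth:
        if not t.get("detectable", True):
            continue
        cos_dec = math.cos(math.radians(t["dec"]))
        best_id, best_sep = None, radius_arcsec
        for d in detections:
            dra = (d["ra"] - t["ra"]) * cos_dec * 3600.0
            ddec = (d["dec"] - t["dec"]) * 3600.0
            sep = math.hypot(dra, ddec)
            if sep <= best_sep:
                best_id, best_sep = d["det_id"], sep
        if best_id is not None:
            matches[t["truth_id"]] = best_id
    return matches


# Toy catalogs (invented): source 2 is a transient that is too faint in this image.
truth = [
    {"truth_id": 1, "ra": 53.0000, "dec": -27.0000, "detectable": True},
    {"truth_id": 2, "ra": 53.0100, "dec": -27.0100, "detectable": False},
    {"truth_id": 3, "ra": 53.0200, "dec": -27.0200, "detectable": True},
]
detections = [
    {"det_id": "d1", "ra": 53.0001, "dec": -27.0000},
    {"det_id": "d2", "ra": 53.0100, "dec": -27.0100},
]
matches = match_truth_to_detections(truth, detections)
print(matches)
```

This sketch does not resolve the ambiguity noted above, where one true source is matched to detections in different images that were never associated with each other; it only makes the per-image matching explicit.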

LSSTDESC / Twinkles

RQ: Twinkles 1 Scope Discussion #64