LSSTDESC / DC2-production

Configuration, production, validation specifications and tools for the DC2 Data Set.
BSD 3-Clause "New" or "Revised" License

Generate 2.1i truth catalogs (for objects not in cosmoDC2) #346

Closed yymao closed 4 years ago

yymao commented 5 years ago

We'll need to provide access to the 2.1i truth catalogs, but I am not sure what the process is (or if we can just follow what we did for 1.x). @danielsf, do you know?

danielsf commented 5 years ago

I'm not convinced we have yet decided what the Run2.1i truth catalogs should look like.

Ever since the great SED hack of September 2018, we have been in a position where nearly all of the truth information about galaxies is already in cosmoDC2. The only thing that comes to my mind that is not in cosmoDC2 is the amount of Milky Way dust towards any given galaxy. It should be straightforward to provide a lookup table for the amount of dust extinction in each LSST band for each galaxy.
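A minimal sketch of what such a lookup table could look like, under stated assumptions: the per-band extinction is modeled as A_b = R_b × E(B−V), the `R_BAND` coefficients are placeholders chosen for illustration (not the values used in DC2), and the function names and the `galaxy_id -> E(B-V)` input mapping are hypothetical.

```python
# Placeholder coefficients R_b = A_b / E(B-V); illustrative only, NOT DC2 values.
R_BAND = {"u": 4.8, "g": 3.7, "r": 2.7, "i": 2.1, "z": 1.6, "y": 1.3}


def build_mw_extinction_table(ebv_by_galaxy):
    """Map galaxy_id -> {band: A_band} given a mapping galaxy_id -> E(B-V)."""
    return {
        galaxy_id: {band: r_b * ebv for band, r_b in R_BAND.items()}
        for galaxy_id, ebv in ebv_by_galaxy.items()
    }


def observed_mag(mag_no_mw_dust, galaxy_id, band, table):
    """Add Milky Way extinction to a magnitude that does not yet include it."""
    return mag_no_mw_dust + table[galaxy_id][band]


# Hypothetical galaxy ids and E(B-V) values.
mw_dust = build_mw_extinction_table({101: 0.02, 102: 0.05})
print(observed_mag(24.0, 102, "r", mw_dust))
```

Keyed on `galaxy_id`, a table like this can sit next to cosmoDC2 without copying any of its columns.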

There are about 20 million stars in DC2. There is, of course, no truth information for them in cosmoDC2, so we will have to generate some kind of external truth information for those.

There is also the question of transients and variables. There are 31 million AGN in cosmoDC2. The presence of these AGN will alter the quiescent magnitudes of their host galaxies. That might be an argument for duplicating information and creating a truth table of magnitudes for all galaxies. We can also create a lookup table associating galaxy_id with AGN properties.

We will also have to create a truth table of light curves for the variables and transients.

My first question is: how much awkwardness are we willing to tolerate? Is a scheme where we use cosmoDC2 as truth with supplementary data products for Milky Way dust, stars, and variables/transients acceptable or do we want all of the truth information in one monolithic catalog, even if it means duplicating information from cosmoDC2?

After that comes the question of what columns need to be included in the truth table. For 1.x we required magnitudes with and without dust extinction for every source, but that was because we were in the weird state where ImSim included dust and PhoSim did not. I assume that we now only need 1) observed magnitudes 2) magnitudes without Milky Way dust (but with internal dust included)

Is there anything else beyond RA, Dec, magnitude (and MJD for light curves) that we want included in a separate truth table?

I am not sure what kind of truth information needs to be included for the strongly lensed systems added by the sprinkler (specifically those that use FITS postage stamps to produce realistic image profiles).

yymao commented 5 years ago

It sounds like, at least for the static part, we can use GCR to create a truth "catalog" without actually duplicating the underlying cosmoDC2 data. The reader can take care of appending rows (for stars) and columns (for MW extinctions).
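To illustrate the idea of layering rows and columns on top of a base catalog without copying it, here is a toy sketch. It does not use the GCR API; the class `CompositeTruth`, the column names, and all values are made up for the example.

```python
class CompositeTruth:
    """Toy stand-in for a reader that layers extra rows/columns on a base catalog.

    This is not the GCR API; it only illustrates serving a combined view
    while leaving the underlying galaxy data untouched.
    """

    def __init__(self, galaxies, extra_columns, stars):
        self._galaxies = galaxies            # list of dicts, read-only base catalog
        self._extra_columns = extra_columns  # {galaxy_id: {column: value}}
        self._stars = stars                  # list of dicts appended as extra rows

    def __iter__(self):
        # Galaxy rows: base quantities plus per-galaxy extra columns (e.g. MW dust).
        for row in self._galaxies:
            merged = dict(row, is_star=False)
            merged.update(self._extra_columns.get(row["galaxy_id"], {}))
            yield merged
        # Star rows: appended after the galaxies.
        for row in self._stars:
            yield dict(row, is_star=True)


galaxies = [{"galaxy_id": 1, "ra": 55.1, "dec": -30.2, "mag_r": 23.4}]
mw_dust = {1: {"A_r": 0.05}}
stars = [{"star_id": 9001, "ra": 55.3, "dec": -30.0, "mag_r": 18.2}]

truth = list(CompositeTruth(galaxies, mw_dust, stars))
```

Each yielded row is a fresh dict, so the base `galaxies` list is never modified.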

And we would still need to generate a truth table of light curves for the variables and transients. But in this table, we don't need to duplicate galaxy information other than galaxy_id, I think?

cc @rmandelb to see if she has thoughts from the analysis perspective.

yymao commented 5 years ago

As discussed in #78, we still need to generate truth catalogs for objects that are not in cosmoDC2 (e.g., stars, sprinkled objects), so we'll keep this issue open.

danielsf commented 5 years ago

I apologize for asking the same question a dozen times, but: how do we want to deal with the truth information for stars?

In CatSim, all stars have variability on some level (even if it is less than a millimag). In the reference catalog, we treated this by reporting the mean magnitude of the star and an rms in each band. Can we do the same thing in the truth catalog? Users could then use the rms column to investigate whether or not a source is significantly variable.
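As a rough sketch of the mean-plus-rms summary, assuming the averaging is done in magnitude space (the thread does not say whether CatSim averages magnitudes or fluxes) and using an arbitrary 5 mmag threshold; the function names and sample light curves are hypothetical:

```python
import math
import statistics


def summarize_light_curve(mags):
    """Return (mean_mag, rms), where rms is the scatter about the mean magnitude."""
    mean_mag = statistics.fmean(mags)
    rms = math.sqrt(statistics.fmean([(m - mean_mag) ** 2 for m in mags]))
    return mean_mag, rms


def is_significantly_variable(rms, threshold_mag=0.005):
    """Flag a star whose rms exceeds a user-chosen threshold (5 mmag, arbitrary)."""
    return rms > threshold_mag


# Made-up single-band light curves for two stars.
quiet_star = [18.2000, 18.2002, 18.1998, 18.2001, 18.1999]
pulsator = [15.10, 15.42, 15.71, 15.38, 15.09]

for name, mags in [("quiet", quiet_star), ("pulsator", pulsator)]:
    mean_mag, rms = summarize_light_curve(mags)
    print(name, round(mean_mag, 4), round(rms, 4), is_significantly_variable(rms))
```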

This then leads to the question of how do we want to deal with the variable part of the stellar truth table? Assuming we adopt the scheme above, it seems like we would want the variable truth catalog to just contain a light curve of all of the delta_magnitudes as a function of time (so: departures from the mean magnitude reported in the static truth table).

Does this seem like a reasonable API? It means that the stellar variability truth table would be slightly different than, say, the SNe truth table (since the mean magnitude of a supernova over the course of the survey is zero and there would be no need to compare the variability truth table to a static truth table).

Are we okay with that difference in API?

drphilmarshall commented 5 years ago

Seems sensible to me that all brightness quantities in the variable truth catalog would be delta mags. How are AGN treated? I would think that variable stars and AGN should be recorded the same way - and as offsets from the mean magnitude in the static truth catalog. I guess mainly I'd advise being consistent across all object types: delta mags for everything. Would it be painful to rename the SN columns, so that every brightness in the variable truth catalog was an offset from a mean?

danielsf commented 5 years ago

Nothing has been generated, yet, so there is no question of renaming anything.

Regarding AGN: since we are using cosmoDC2 directly as the static truth table for galaxies, I feel like the AGN light curves should be delta_mag away from the galaxy's mean magnitude, rather than from the AGN's mean magnitude, so that AGN light curves are constructed the same way as stellar light curves (i.e. as cosmoDC2_mag + delta_mag rather than as cosmoDC2_mag + AGN_mean_mag + delta_mag)

Any objections?

yymao commented 5 years ago

Maybe we need some input from transient people (cc @reneehlozek)? In particular, the change in API you proposed means that, to plot a light curve, the user must access both the light curve table and the summary table (to get the mean magnitude). Not sure if that is too much hassle.
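For concreteness, the two-table access pattern might look roughly like the sketch below. The table layouts, column names (`mean_mag_r`, `delta_mag`), and ids are assumptions made for illustration, not the schema that was eventually produced.

```python
# Static (summary) truth table: one row per object.
static_truth = {
    "star_42": {"mean_mag_r": 17.80, "rms_r": 0.12},
}

# Variable truth table: many rows per object, storing offsets from the mean only.
variable_truth = [
    {"id": "star_42", "mjd": 59580.1, "band": "r", "delta_mag": -0.10},
    {"id": "star_42", "mjd": 59583.2, "band": "r", "delta_mag": 0.05},
    {"id": "star_42", "mjd": 59586.1, "band": "r", "delta_mag": 0.15},
    {"id": "star_7", "mjd": 59580.1, "band": "r", "delta_mag": 0.01},
]


def full_light_curve(obj_id, band, static, variable):
    """Join the two tables: mean magnitude (static) + delta_mag (variable)."""
    mean_mag = static[obj_id][f"mean_mag_{band}"]
    return [
        (row["mjd"], mean_mag + row["delta_mag"])
        for row in variable
        if row["id"] == obj_id and row["band"] == band
    ]


curve = full_light_curve("star_42", "r", static_truth, variable_truth)
```

The extra step is a single keyed lookup per object, so the cost is mostly one of convenience rather than computation, at least for tables that fit in memory.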

drphilmarshall commented 5 years ago

@danielsf If I understand you correctly, the cosmoDC2 mag is the magnitude of just the stars in the galaxy and not the AGN. That means that to emulate an Object magnitude (where the AGN and galaxy have not been deblended) you would start by doing cosmoDC2_mag + AGN_mean_mag (after overloading the + operator to combine magnitudes correctly) - that is, we need to track the AGN_mean_mag in any case. So, I think it will make most sense to users to have all the quantities in the variable truth catalog be deltas relative to the component's (star, AGN, etc.) mean values. This would have the nice feature that all objects in the variable truth catalog would have light (etc.) curves with baseline zero. So I think my vote is for AGN to have delta mag relative to AGN_mean_mag, not its host galaxy mean mag.

@yymao To your point: I guess I am saying that it's more important to have a good logical set-up than it is to make life easy for the users. However, I don't know whether the hassle you are pointing out involves severe loss of computational efficiency as well as having to do an extra annoying table join.
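A minimal sketch of what "overloading the + operator to combine magnitudes correctly" could mean: magnitudes are converted to fluxes, the fluxes are summed, and the sum is converted back. The `Magnitude` class is hypothetical and the numbers are chosen only to make the result easy to check.

```python
import math


class Magnitude:
    """A magnitude whose `+` adds the underlying fluxes, not the numbers."""

    def __init__(self, value):
        self.value = float(value)

    def flux(self):
        # Flux in units of the zero-point flux.
        return 10.0 ** (-0.4 * self.value)

    def __add__(self, other):
        if not isinstance(other, Magnitude):
            return NotImplemented
        return Magnitude(-2.5 * math.log10(self.flux() + other.flux()))

    def __repr__(self):
        return f"Magnitude({self.value:.3f})"


galaxy = Magnitude(22.0)  # host galaxy starlight (cosmoDC2_mag)
agn = Magnitude(22.0)     # AGN_mean_mag, set equal to the host for an easy check
print(galaxy + agn)       # Magnitude(21.247)

# A delta_mag is an ordinary additive offset on the AGN component alone,
# applied before the AGN is combined with its host.
agn_now = Magnitude(agn.value + 0.3)
blended_now = galaxy + agn_now
```

Two equal sources come out 2.5 log10(2) ≈ 0.753 mag brighter than either one, which is the expected behavior for a flux sum.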

danielsf commented 5 years ago

(Phil has understood me correctly)

yymao commented 5 years ago

@drphilmarshall Is the AGN_mean_mag that you mentioned the mean magnitude of the light curve itself? Asking differently, if the light curve is stored in original magnitude, can one recover AGN_mean_mag from the light curve?

drphilmarshall commented 5 years ago

@yymao I would expect the AGN_mean_mag to be the mean (long-term time-averaged) magnitude of the AGN, i.e. a number like 23.3. I would expect an emulation of a de-blended Object light curve to be the AGN_mean_mag plus the delta_AGN_mean_mag from the variable truth catalog (plus noise). I would expect an emulation of a DIAObject light curve to be just the delta_AGN_mean_mag plus noise (although a more accurate version would be to compute the reference image AGN magnitude and subtract that from the de-blended Object light curve).

If instead the true light curve was stored as the AGN total brightness then you could recover the AGN_mean_mag from the average of the whole curve, but only if "long-term" was defined to be "the length of the light curve in the truth catalog". In general, the 10-year mean magnitude of an AGN will not be quite the same as its "long-term time averaged" magnitude - I would interpret the latter quantity as long-term = infinite. (It's the baseline magnitude that you add DRW fluctuations to.) I expect the galaxy/AGN model uses the long-term=infinite assumption when assigning mean magnitudes to AGN - so no, a 10-year light curve average would not (quite) recover the long-term mean AGN magnitude.
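The gap between a 10-year average and the long-term mean can be illustrated with a small simulation. This sketch models the damped random walk as an Ornstein-Uhlenbeck process with a long-term mean of zero in delta_mag; the timescale (500 days), asymptotic scatter (0.2 mag), daily sampling, and function name are arbitrary choices for illustration, not DC2 parameters.

```python
import math
import random


def simulate_drw(n_days, tau_days, sigma_mag, seed):
    """Daily-sampled damped random walk of delta_mag about a long-term mean of 0.

    Uses the exact Ornstein-Uhlenbeck update; sigma_mag is the asymptotic
    standard deviation of the process.
    """
    rng = random.Random(seed)
    decay = math.exp(-1.0 / tau_days)
    kick = sigma_mag * math.sqrt(1.0 - decay * decay)
    x = rng.gauss(0.0, sigma_mag)  # start from the stationary distribution
    curve = []
    for _ in range(n_days):
        curve.append(x)
        x = x * decay + rng.gauss(0.0, kick)
    return curve


ten_years = 3650
window_means = []
for seed in range(100):
    curve = simulate_drw(ten_years, tau_days=500.0, sigma_mag=0.2, seed=seed)
    window_means.append(sum(curve) / len(curve))

# Typical offset between the 10-year mean and the true long-term mean (zero).
scatter = math.sqrt(sum(m * m for m in window_means) / len(window_means))
print(f"rms offset of 10-year mean from long-term mean: {scatter:.3f} mag")
```

With a correlation time that is a non-negligible fraction of the survey length, the window mean scatters noticeably around the long-term mean, which is why the two quantities cannot be treated as interchangeable.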

yymao commented 5 years ago

Thanks for the explanation @drphilmarshall. I think this means that for different types of variable objects, mag_mean would mean different things? And hence delta_mag would also mean different things for different types of variable objects. It might be ok but it might potentially cause confusion too?

drphilmarshall commented 5 years ago

I'm not sure it would, but it could be worth checking down the list of all variable and transient objects we have. For extragalactic SNe, the mean mag is effectively zero because the precursor star is negligibly faint; for other stars and AGN, I think using the long (infinite)-term time-averaged magnitude as the mean_mag will be intuitive. But, we should poll the SN group for agreement.

rmandelb commented 5 years ago

All: I am just catching up on this thread now (I was on vacation last week and 100% off e-mail, GitHub, etc. at the time, so missed the ping to me). It could be that discussion has moved elsewhere, in which case apologies for the noise, but it struck me that a number of distinct basic design questions were being asked re: stars, variable objects, and basic design of the catalog (duplication of information from cosmoDC2 etc.) and of course this thread will not naturally get relevant project teams to weigh in.

Could we imagine in the near term writing down a well-defined list of questions and inviting a few project teams to a DATF telecon or a DC2 telecon to try to hash out a design? I'm happy to suggest some project teams (and would ask Renee to suggest some as well) if people are comfortable with this approach.

wmwv commented 5 years ago

@rbiswas4 and @wmwv talked about this and agreed that the SN group doesn't really care about what aggregate lightcurve quantities are reported for SNe. They just don't really make sense and we wouldn't expect them to.

wmwv commented 5 years ago

For SNe, infinite time mean flux is 0 (mean mag is infinity). So delta flux with respect to that mean flux is just the flux of the lightcurve.
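In code, the point reads roughly as follows. This is a sketch with made-up fluxes in arbitrary units; `flux_to_mag` is a hypothetical helper and the zero point is left at 0 for simplicity.

```python
import math


def flux_to_mag(flux, zero_point=0.0):
    """Convert a flux to a magnitude; non-positive flux has no finite magnitude."""
    if flux <= 0.0:
        return math.inf
    return zero_point - 2.5 * math.log10(flux)


sn_mean_flux = 0.0                               # infinite-time mean flux of a SN
sn_light_curve_flux = [0.0, 1.2, 3.5, 2.0, 0.4]  # made-up fluxes, arbitrary units

# Offsets from a zero baseline are just the light curve itself.
delta_flux = [f - sn_mean_flux for f in sn_light_curve_flux]

print(flux_to_mag(sn_mean_flux))  # inf
```

This is why a delta_mag column is awkward for SNe: there is no finite mean magnitude to offset from, whereas a delta_flux column is well defined for every object type.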

yymao commented 4 years ago

Truth catalogs for Run 2.x are available in PostgreSQL and also in GCRCatalogs (https://github.com/LSSTDESC/gcr-catalogs/pull/455) now.