Run 2.0i: galaxy saturation due to internal galaxy extinction

rmandelb commented 5 years ago

This weekend some test images that were produced in preparation for Run 2.0i revealed some problems - see y-band image below:

[Figure: v12365_r22_s10_y_knots1_gal2]

This issue is about the galaxy that I’ve circled here, which as you can see is saturated. The magnitudes in the instance catalogs indicate that none of the galaxy components should saturate; there is no AGN or overlapping star either, and no issue with the knots model. Eventually we traced this problem to a combination of two factors:

1. This galaxy has Rv=0.49 (note that for Rv<1, the internal dust extinction law brightens the SED at some wavelengths).
2. The galaxy has an extremely large value of Av=48, which greatly enhances effect (1).

Furthermore, Scott @danielsf has found that with Rv<0.1, the internal dust law can correspond to brightening by a very substantial factor for some wavelengths, while for Rv in the range [0.1,1], it’s a problem only if Av is large and positive. Av<0 can also become a problem, depending on the Rv value.
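To see how a small Rv can turn extinction into brightening, here is a minimal sketch. It assumes the internal dust law has the Cardelli, Clayton & Mathis (1989) form, A(λ)/Av = a(x) + b(x)/Rv with x = 1/λ in inverse microns; the thread does not say which parameterization is used, so treat that as an assumption. Only the infrared branch (0.3 ≤ x ≤ 1.1) is coded, and the function name is hypothetical.

```python
import numpy as np

def ccm89_ir_extinction(wavelength_um, av, rv):
    """A(lambda) in magnitudes from the CCM89 infrared branch.

    Assumption: the internal dust law follows CCM89. Valid only for
    0.3 <= 1/lambda <= 1.1 inverse microns (roughly 0.91 to 3.3 microns).
    """
    x = 1.0 / np.asarray(wavelength_um, dtype=float)
    if np.any((x < 0.3) | (x > 1.1)):
        raise ValueError("wavelength outside the CCM89 infrared branch")
    a = 0.574 * x**1.61
    b = -0.527 * x**1.61
    return av * (a + b / rv)

# Values quoted for the saturated galaxy, evaluated at 1 micron (near y band).
a_lambda = ccm89_ir_extinction(1.0, av=48.0, rv=0.49)
flux_factor = 10.0 ** (-0.4 * a_lambda)
print(f"A(1 micron) = {a_lambda:.2f} mag, flux multiplied by {flux_factor:.2e}")
```

In this branch a(x) + b(x)/Rv changes sign at Rv = 0.527/0.574 ≈ 0.92, so under the stated assumption any Rv below that gives a negative A(λ) for positive Av. The reddening step then brightens the SED at these wavelengths, and the size of the effect scales linearly with Av.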

Unfortunately, the range of Rv in the catalogs is quite broad, with the majority of objects in that [0.1,1] range and a non-negligible fraction below that; see this histogram of Rv for disks for one instance catalog:

Based on some discussion with @egawiser and @danielsf, we agreed to make the following changes when post-processing the instance catalogs (using the same script used for issue #265):

- Clip bulge and disk Av values at a minimum of zero (i.e., set negative values of Av to 0).
- Set bulge or disk Rv values below 0.1 to 0.1, and clip Av to a maximum value of 1 for those galaxies (i.e., set Av values > 1 to 1 for galaxies in this Rv range).
- For bulge and disk Rv values in the range [0.1,1], leave Rv as-is and clip Av to a maximum value of 1.

We do not modify galaxies with Av>=0 and (Rv>1 OR (Rv in [0.1,1] and Av<=1)).
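As a rough illustration of these rules (not the post-processing script itself), the clipping can be sketched with NumPy as follows. The function and argument names are hypothetical; the thresholds are the ones listed above, applied per bulge or disk component.

```python
import numpy as np

def clip_dust_parameters(av, rv, rv_floor=0.1, rv_upper=1.0, av_cap=1.0):
    """Apply the three clipping rules to arrays of component Av and Rv."""
    av = np.asarray(av, dtype=float)
    rv = np.asarray(rv, dtype=float)

    # Rule 1: negative Av is set to zero.
    new_av = np.maximum(av, 0.0)

    # Rules 2 and 3: wherever Rv <= 1 (including Rv < 0.1), cap Av at 1.
    needs_cap = rv <= rv_upper
    new_av = np.where(needs_cap, np.minimum(new_av, av_cap), new_av)

    # Rule 2: Rv below 0.1 is raised to 0.1.
    new_rv = np.maximum(rv, rv_floor)

    return new_av, new_rv

# The saturated galaxy from the image (Av=48, Rv=0.49) plus three other cases.
av_in = [48.0, -0.005, 5.0, 2.0]
rv_in = [0.49, 3.1, 2.0, 0.05]
av_out, rv_out = clip_dust_parameters(av_in, rv_in)
print(av_out)
print(rv_out)
```

The first entry is the problem case (Av goes from 48 to 1, Rv is unchanged), the second is a typical small negative Av, the third has Rv>1 and is left alone, and the fourth has both Rv raised and Av capped.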

In practice, about 20% of disks and 10% of bulges are modified by these cuts. For Av, most objects are near the cut values (e.g., negative Av values tend to be around -0.005). For the problem case in the image above, the fact that our clipping process changes Av=48 to Av=1 will avoid this saturation issue.

While this change clearly fixes some of the problems that can arise due to (Av, Rv) combinations that result in unphysical amplification of the galaxy fluxes, it was not possible on short notice to investigate the result in terms of color changes in the overall galaxy population. A subset of the bulges/disks we modified are below the detection limit, but still, the impact on the observed galaxy colors more broadly should be checked.

I believe @danielsf had some suggestions for more tests we could do today. I hope he will explain them here. Input from others is welcome! I would be interested in hearing from @aphearin about the impact on observed galaxy colors as well.

erykoff commented 5 years ago

I find the order of operations very confusing, and was wondering if it's written down somewhere. As far as I know, as an end-user of the CosmoSim catalogs (well, mostly of the ProtoDC2 catalogs), the fluxes that are reported include the host reddening. And this is (after some early hiccups) what the validations are based on, and in this space, nothing wacky like this is happening.
For whatever reason, the imsim inputs in the instance catalogs are the numbers without host reddening (I guess because of the bulge/disk decomposition?) and then some other software applies this host reddening before making the images. Is this correct? If this is the case, have the fluxes that are actually simulated in the frames been validated against those that are in the CosmoSim catalog that went through DC2 validation? If this is not the case, what is the order of operations? Either way, is it not the case that we do have a mismatch between the validated fluxes and the simulated fluxes?

danielsf commented 5 years ago

The reason fluxes in the InstanceCatalog are not the same as fluxes in the extragalactic catalog is that PhoSim requires an SED and a normalizing magnitude in order to produce an image. The extragalactic catalog does not come with SEDs, so we use the extragalactic catalog's magnitudes to do a color-color space fit to our library of SEDs. This gets pretty close, but it is not going to give the exact same magnitudes as in the extragalactic catalog. Furthermore, PhoSim expects an unreddened SED to which it applies reddening based on the Av and Rv values you supply.

rmandelb commented 5 years ago

@erykoff -

> I find the order of operations very confusing

That's because it IS confusing.

Your questions are precisely the reason that I am somewhat uneasy about the potential for inconsistency between the images and the DESCQA tests of the extragalactic catalogs for the subset of the objects that are getting modified. Unfortunately I will have to ask @yymao @evevkovacs @aphearin to clarify/confirm any statements about what's happening in the DESCQA tests and @danielsf to clarify/confirm regarding the instance catalogs and how they are used, so this message can serve as a starting point for a response but does not fully address your questions.

Here is what I think is going on:

- The DESCQA tests are done on fluxes that include the host reddening. I believe that for all tests that are not explicitly about bulge or disk components, the reported fluxes are those for the galaxy overall. I would like Yao, Eve, or Andrew to confirm this statement.
- For the image simulations (whether PhoSim or ImSim), we have separate bulge and disk components that have their own separate SEDs and internal reddening laws. This is necessary to draw chromatic multi-component galaxies. Unlike the MW extinction, which is fixed to a perhaps excessively well-behaved Rv=3.1, the internal Rv values for these components can have a wide distribution.
- It is not clear to me (but Scott's comment above speaks to this point) to what extent the fluxes from the "total galaxy SED and internal reddening" have been compared against the fluxes from the combination of "bulge SED and bulge internal reddening" and "disk SED and disk internal reddening". In principle, as I said, I believe the first of these is relevant to what gets reported in the DESCQA tests we've been looking at, while the second of these is relevant to what will show up in ImSim and PhoSim images. And I assume they have been compared/tested but I do not believe I've seen plots of that. So that is one potential disconnect that is unrelated to the changes we are introducing now in the instance catalog.
- And yes, by modifying the instance catalogs to avoid bad behavior due to the wide range of (Rv, Av) combinations for individual components of galaxies, we could potentially be further modifying some flux/color distributions compared to what went into successful DESCQA tests.

erykoff commented 5 years ago

Thanks, this is starting to make more sense. Though I don't know if I fully agree with the statement that "So that is one potential disconnect that is unrelated to the changes we are introducing now in the instance catalog." My fundamental question is that if we have validated the "total galaxy SED + internal reddening" vs "bulge+reddening + disk+reddening" comparisons, then how are these modifications to the instance catalogs even necessary? And if we haven't validated these comparisons, then that is a potential major disconnect between DESCQA and what we are simulating.

yymao commented 5 years ago

To confirm/supply more info for @rmandelb's comment above:

1. Yes, DESCQA tests are done on total fluxes that include host reddening. We do have one test (size-luminosity) that uses disk/bulge information, but other than that the fluxes include both disk and bulge, and always include host reddening.
2. As @danielsf pointed out, due to SED fitting, the fluxes in the instance catalog differ from those in the extragalactic catalog. We have a DESCQA test that shows the difference. See here for an example for Run 1.1 / protoDC2 2.1.2.
3. I believe we can replace offending Rv, Av and also update the fluxes at the reader level for extragalactic catalogs, if needed. But right now the fix is applied at the instance catalog level, without updating the fluxes (only fixing Rv, Av).

aphearin commented 5 years ago

> the DESCQA tests are done on fluxes that include the host reddening.

@rmandelb - yes, this is true. All color validations are done using magnitudes that have been reddened by dust in the host galaxy. Moreover, when we have eliminated galaxies from the Galacticus library on the grounds of unphysical reddening, we have only done so using the value of Av and Rv on the total flux, not on the disk/bulge decomposition.

I have started to study the impact of the imputation described in the bullet points above in the first post to this thread. The first thing to check is just to see which and how many galaxies are impacted. Here is my first plot. Many more to come, but it's a busy day so progress will be intermittent.

[Figure: av_cut_impact]

On the y-axis I show the fraction of objects that fail the cut indicated in the figure title. In this particular plot, a galaxy fails the mask if either of its disk or bulge components fails the mask. The gray band guides the eye to the 10% fraction.
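For readers who want to reproduce this kind of tally, here is a rough sketch of the calculation, not the code used for the figure. It bins on stellar mass only (a redshift split would repeat the same step per redshift slice), and the function name, argument names and toy inputs are all hypothetical.

```python
import numpy as np

def failing_fraction_by_bin(log_mstar, disk_fails, bulge_fails, bin_edges):
    """Fraction of galaxies per stellar-mass bin where either component fails."""
    log_mstar = np.asarray(log_mstar, dtype=float)
    # A galaxy fails if its disk OR its bulge fails the cut.
    fails = np.asarray(disk_fails, dtype=bool) | np.asarray(bulge_fails, dtype=bool)

    total, _ = np.histogram(log_mstar, bins=bin_edges)
    failed, _ = np.histogram(log_mstar[fails], bins=bin_edges)

    # Empty bins are reported as NaN rather than dividing by zero.
    fraction = np.full(total.shape, np.nan)
    np.divide(failed, total, out=fraction, where=total > 0)
    return fraction

# Toy inputs: five galaxies, one failing disk and one failing bulge.
log_mstar = [8.1, 8.5, 9.2, 9.8, 10.5]
disk_fails = [True, False, False, False, False]
bulge_fails = [False, False, True, False, False]
fraction = failing_fraction_by_bin(log_mstar, disk_fails, bulge_fails, [8, 9, 10, 11, 12])
print(fraction)
```

The boolean masks would come from whichever cut is being tested, for example Av < 0 evaluated separately on the disk and bulge values.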

evevkovacs commented 5 years ago

@rmandelb @danielsf Several comments:

1. I have now added separate checks for the disk and bulge Av and Rv to the readiness tests. Previously, only Av and Rv for the total (sum of disk and bulge) were checked, and Av for the total had no large outliers. However, if one uses the component fluxes, the values are much less stable. For example, the disk component in Galacticus can become very small or zero. I believe that this kind of edge case is responsible for the fluctuations described above, but haven't checked this in detail.
2. DESCQA tests have validated the "total galaxy SED + internal reddening" fluxes insofar as the validation criteria have tested the luminosity functions and color distributions. The tests have concentrated on the total fluxes for LSST and SDSS filters. The only validation tests that have looked at properties of the separate galaxy components are the size tests. These tests validate the size of the disk and bulge as a function of the total luminosity. There are no tests for "bulge+reddening or disk+reddening" fluxes. We have no validation criteria for these from the WGs.
3. The dust model used in the image simulations is different from the dust model used by Galacticus. To my knowledge, the two models have not been compared against each other. I think that the truth catalog should have enough information to make this comparison, if the reddened fluxes computed by CatSim were included in the truth information. @danielsf ?
4. There is now a disconnect between the extra-galactic catalog and the image simulations for the galaxies with modified Av and Rv. The galaxies with modified Av and Rv in the image simulations no longer have a self-consistent set of SEDs and associated LSST, SDSS, B- and V-band fluxes (i.e., flux values in the image simulations have not been adjusted so they are consistent with the Av and Rv values used).

erykoff commented 5 years ago

@yymao re: your point 2: that test shows the difference that the host reddening makes, which is quite large (1 mag from largest to smallest!). But I think what we need is a validation that what the image generation code is making (which is outside the DESCQA framework) matches what is going on internally in the sims (which DESCQA has access to). Is the crazy saturation here just the tip of the iceberg?

erykoff commented 5 years ago

@evevkovacs Re your point 2 "We have no validation criteria for these from the WGs", I think the validation criterion is "bulge+reddening + disk+reddening == total observed flux". That is, if the sum of the components doesn't equal the total, then where are we? What is there to validate? Re point 3: "the two models have not been compared to each other", I think that this is the crux of the matter. That means we aren't simulating what has actually been validated in DESCQA.
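A minimal sketch of that consistency check, assuming only that reddened bulge, disk and total magnitudes are available in the same band. The function names are hypothetical and no catalog column names are implied.

```python
import numpy as np

def combined_magnitude(mag_bulge, mag_disk):
    """Magnitude of the summed bulge and disk fluxes."""
    flux_bulge = 10.0 ** (-0.4 * np.asarray(mag_bulge, dtype=float))
    flux_disk = 10.0 ** (-0.4 * np.asarray(mag_disk, dtype=float))
    return -2.5 * np.log10(flux_bulge + flux_disk)

def component_sum_residual(mag_bulge, mag_disk, mag_total):
    """Summed-component magnitude minus the catalog total magnitude."""
    return combined_magnitude(mag_bulge, mag_disk) - np.asarray(mag_total, dtype=float)

# Two equal components are brighter than either one by 2.5*log10(2) mag.
residual = component_sum_residual([20.0], [20.0], [20.0 - 2.5 * np.log10(2.0)])
print(residual)
```

A residual near zero for every galaxy would mean the component fluxes add up to the total; a broad or offset residual distribution would quantify the mismatch raised here.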

evevkovacs commented 5 years ago

@rmandelb @yymao @erykoff @aphearin I think we need a whole slew of tests/checks to understand how best to deal with these issues. Some of these Andrew has already started. Here's a list:

- Checks to understand the outlier Av, Rv cases for disk and bulge components for the extra-galactic catalog
- Validation tests for disk and bulge component fluxes. Since CatSim is applying its own reddening model, I assume that has been validated elsewhere, so the information for these tests already exists in some form.
- Comparison tests for reddened fluxes in instance and extra-galactic catalogs, so we know how different they actually are.

rmandelb commented 5 years ago

Just to close the loop on one thing, @evevkovacs said

> The dust model used in the image simulations is different from the dust model used by Galacticus. To my knowledge, the two models have not been compared against each other. I think that the truth catalog should have enough information to make this comparison, if the reddened fluxes computed by CatSim were included in the truth information.

But I think that what @yymao sent a link to is exactly that comparison? (at the level of comparing fluxes, not comparing the dust model) Or have I misunderstood what Yao sent? I am trying to make sure I have the pieces together. Because if what Yao sent is what I thought, then I think we do not need an "extragalactic catalog vs. original instance catalog" comparison, just an "original vs. new instance catalog" comparison. I wonder if @danielsf might have thoughts on the cleanest way to do that? The instance catalogs were not replaced (the original ones still exist) and so some comparison based on a simulation utility reading in the old vs. new ones might be illuminating here. Perhaps we could focus on those that have i<~27 and hence we expect to be detected in the 10-year coadds.

As I mentioned above, we do already have an estimate of the bulges and disks that are affected (~10% and 20% of the entries in the instance catalogs, respectively). I believe @aphearin 's plot suggests that the really faint galaxy populations at higher redshift might be disproportionately affected, which would be good news since some of them are below the detection threshold. Edited: note also that "affected" might mean "Av changed from -0.001 to 0" which is not a noticeable change in observable properties either.

aphearin commented 5 years ago

Here is the analogous plot for the Rv cut

[Figure: rv_cut_impact]

aphearin commented 5 years ago

These two plots together make me somewhat relieved about the net effect of the imputation described above. Comments welcome, ditto suggestions for further initial tests.

evevkovacs commented 5 years ago

@rmandelb The plots Yao pointed to are called diff_04_extragalactic_mag_true_x_lsst_no_host_extinction_truth_mag_true_x.png, with x=u,g,r,i,z,... I read this as a comparison of the unreddened fluxes in the extra-galactic catalog and the unreddened fluxes in the truth catalog. The latter are obtained by the CatSim re-fit to the SEDs in the extra-galactic catalog. @yymao should correct me if I have not understood the definition of the quantity "truth_mag_true_x".

yymao commented 5 years ago

@rmandelb yes and no, the comparison I mentioned earlier shows the difference in flux between the "truth catalog" (which is generated by CatSim) and the extragalactic catalog.

Nominally, the fluxes we would compare should include host reddening. However, for this particular Run 1.1 truth catalog, its fluxes do not include host reddening, and hence the difference you see in that particular test comes from the SED fitting. However, the plan is to include host reddening in future truth catalogs, and then this test would be what @rmandelb described (comparing two dust models at the level of comparing fluxes).

rmandelb commented 5 years ago

Thanks @yymao and @evevkovacs for clarifying what was in those plots. So it sounds like it is fair to say that the impact of different galacticus vs. catsim SED libraries on the unreddened magnitudes has been checked, but the impact of the combination of "different SED and different dust libraries" on the magnitudes with internal extinction has not been explicitly checked? And that is a somewhat independent question from the question of what those (Av,Rv) remappings are doing to the observed fluxes and colors.

yymao commented 5 years ago

Yes, @rmandelb, I think your statement above is accurate.

I may add that originally we did plan to check "different SED and different dust libraries" by comparing the truth catalog and the extragalactic catalog, but then it was decided there was not enough time to generate a Run 2.0 truth catalog to complete this check.

In principle, the fluxes in the Run 2.0 truth catalog (which will include reddening) would be the closest to observed fluxes, and hence we can run several DESCQA tests on the truth catalog to gain a better idea about the observed fluxes and colors.

rmandelb commented 5 years ago

Right - the question is can we make a Run 2.0 truth catalog for some very limited area quickly, even if we can't make a full truth catalog? Can you please remind me what are the inputs to the truth catalog generation process? (instance catalogs?) And what are the constraints on how/when we can do this, our ability to do it quickly for some limited area? (Is the code in place and it's "just" a matter of running some jobs?)

yymao commented 5 years ago

I'll let @danielsf answer this question as he knows the situation much better than I do.

rmandelb commented 5 years ago

@aphearin - thank you for the plots. Can you please correct me if I am wrong, but I think your plots show what fraction of objects (as a function of stellar mass and redshift) meet the criteria to be clipped in Av (at zero) or Rv (at 0.1)? If that's correct, then we might also need one more for the final type of clipping: the case where the galaxy is in the Rv range [0.1,1] and the Av has to be clipped at a maximum value of 1?

So far the trends look encouraging: it's a small number of objects (consistent with what Francois found yesterday), and preferentially low stellar mass / high redshift ones that would not be visible (which is more info beyond what Francois was able to test). If we see the same trend for that last type of clipping, that would be really great. And I think that's probably just about as far as we can go without doing some tests of the impact of this clipping on the observed magnitudes, which can't be done with the extragalactic catalogs.

danielsf commented 5 years ago

The truth catalog generator does not rely on InstanceCatalogs. It will read from the extragalactic catalog directly. I would have to add code to it to apply Rachel's new dust values, but that should be the only work involved in creating a truth catalog for a limited cosmoDC2 area.

rmandelb commented 5 years ago

Thanks @danielsf - I think you can copy Francois's code from https://github.com/LSSTDESC/sims_GCRCatSimInterface/blob/a6e60cf8a33fbecd177a81296fc1a3e1ff1516b7/bin.src/instance_crawler.py#L91 (and onwards) if you want to get the clipping exactly consistent with what he did. So you would take some area, make one "truth catalog" without the clipping, another with clipping, and compare them?

What is the timeline on which it would be feasible to generate the truth catalogs?

danielsf commented 5 years ago

I'm honestly not sure about the timeframe. I haven't had the chance to run the truth catalog code on cosmoDC2, yet.

rmandelb commented 5 years ago

I see. I guess what I'm getting at is whether we'll be able to do this test in the next ~2 days, in which case (if it's horribly bad) we might want to consider even going back to the drawing board on this change to the instance catalogs and restarting Run 2.0i... or whether we need to make do with other tests, and plan on the truth catalog test as something we do after the fact? (on the time scale of weeks rather than a few days)

If it's a time scale of weeks, then I would like to brainstorm other tests that go beyond Andrew's assessments of "fraction of galaxies impacted" (which are a great start, but a little too far from the observables to be definitive). But if it's a time scale of days, then clearly we should rely on this truth catalog test as our best observable-level test.

danielsf commented 5 years ago

I actually don't think the full truth catalog infrastructure is needed. It is over-engineered for this purpose (on account of trying to take care of variables and transients). I can probably put together a crude test in a few hours.

aphearin commented 5 years ago

This plot was requested by @rmandelb to round out the initial coarse checks on how many galaxies are impacted by the CatSim imputations. Here I show the following calculation: of those galaxies satisfying 0.1 < Rv < 1, what fraction have Av > 1?

[Figure: good_rv_av_cut_impact]

rmandelb commented 5 years ago

Thanks @aphearin . This one does seem to preferentially affect some of the higher stellar mass populations, though I suppose we don't know from this whether it's in a component that is actually prominent (e.g., if the higher stellar mass objects have a prominent bulge, and it's the disk that's getting modified, the change in observable properties might be small).

aphearin commented 5 years ago

@rmandelb - I did some more digging based on your comment that we really only need to worry about cases where the component of the object is impacted and that component has a significant contribution to the galaxy's total light. I've repeated the analysis leading to the potentially worrisome plot above, but now taking into account the bulge-to-total ratio, B/T.

In this first plot below, I just rinse-and-repeat the same calculation, but now separately for disk and bulge components. So, to be explicit, what is plotted on the vertical axis is the answer to the following question, "For galaxies at fixed stellar mass and redshift with 0.1 < Rv < 1, what fraction of the disk (bulge) components of such galaxies are impacted by the Av > 1 cut?"

[Figure: av_cut_impact_disk_bulge_twopanel]

The reason the above plot should be encouraging is that high-mass galaxies are bulge-dominated, and the problems at the high-mass end shown in the potentially worrisome plot above appear to be largely limited to the (sub-dominant) disk component.

Motivated by that, the plot below shows another variation on this calculation in which I take this effect into account. The left-hand panel is a reproduction of the potentially worrisome plot (slight differences due only to different binning). In the right-hand panel, I only count the galaxy as a problem if the offending component comprises at least 1/4 of the galaxy's total stellar mass. For example, if a galaxy with B/T =0.99 has an offending disk, that galaxy does not make it into the tally, but if the massive galaxy has B/T = 0.74 with an offending disk, the galaxy gets tagged as a problem galaxy and contributes to the tally.

[Figure: av_cut_practical_influence_disk_bulge_twopanel]
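The tagging rule for the right-hand panel can be sketched as below. This is an illustration of the rule as described, not the analysis code; the function name is hypothetical, and B/T is taken to be the bulge fraction of the total stellar mass so that the disk fraction is 1 - B/T.

```python
import numpy as np

def is_problem_galaxy(bulge_to_total, bulge_fails, disk_fails, min_fraction=0.25):
    """Flag galaxies whose offending component holds at least min_fraction of the mass."""
    bt = np.asarray(bulge_to_total, dtype=float)
    bulge_matters = np.asarray(bulge_fails, dtype=bool) & (bt >= min_fraction)
    disk_matters = np.asarray(disk_fails, dtype=bool) & ((1.0 - bt) >= min_fraction)
    return bulge_matters | disk_matters

# The two examples from the text: an offending disk with B/T = 0.99 and B/T = 0.74.
flags = is_problem_galaxy([0.99, 0.74], [False, False], [True, True])
print(flags)
```

With these inputs the first galaxy is not tagged (its disk is 1% of the mass) and the second is (its disk is 26% of the mass), matching the worked example above.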

rmandelb commented 5 years ago

Nice! That is extremely helpful. I think it may not be important to repeat this test with the split into harmless vs. everything for the other two criteria (i.e., the Av>0 clipping and the Rv>0.1 clipping) since those already showed that most of the affected galaxies fell into the "very low stellar mass and high redshift" category that will not be in an LSST gold-like sample.

I think you already know this, but just to make sure others do not get confused, it's not actually an Av>1 cut, it's a clip of the Av values at 1 (i.e., those galaxies are still present but Av gets set to 1 so as to avoid the saturation issues shown in the first comment).

danielsf commented 5 years ago

I know I still owe y'all a plot of "how does this actually affect galaxy colors." I got distracted yesterday by data wrangling for the ImSim run, and now Cori is down, which is preventing me from doing anything.

evevkovacs commented 5 years ago

@rmandelb You were asking about the change in the Av values. Here is a plot showing the distribution of the change in Av for all galaxies with disk_ or bulge_Av < 0. (There are none of the latter for the healpixels I checked.)

[Figure: av_shift]

The percentages in the legend refer to the percentage of the total number of galaxies affected by the change. For the record, this plot was made for one of the Nside=8 healpixels (equivalent to 16 Nside=32 healpixels).

rmandelb commented 5 years ago

@danielsf - understood, thanks.

@evevkovacs - thanks - our impression from just one healpixel was that the Delta Av should indeed be very small for disks (and 0 for bulges) but it's good to see that this carries over more widely. So I think the remaining question is the impact of the Rv change (since it does something non-trivial to colors in principle, so Delta Rv is not a useful quantity on its own).

evevkovacs commented 5 years ago

@rmandelb For completeness, here is the Delta Rv plot. The change in Rv for the affected bulge components is 0.1 or less, because none of the bulges have Rv < 0. For the disk components with Rv < 0.1, since the Rv values span a large range, the distribution has a very long tail, and the Delta Rv is not as useful (see your comment above).

[Figure: rv_shift]

evevkovacs commented 5 years ago

@rmandelb Here is the last plot for Delta Av. This shows the change in Av for galaxies whose Rv lies between 0.1 and 1.0. The change in Av for the bulge component is small, but for the disk component it can be much larger. The fraction of affected galaxies is quite small.

[Figure: av_shift2]

I also checked the cases where Rv < 0.1 and Av > 1. There were only a handful of disk components in this category and they all had totally extincted dust-corrected fluxes, so these look like failures of the dust model.

danielsf commented 5 years ago

Here is a comparison between the magnitudes and colors of objects reported by cosmoDC2 directly and those produced by our fit SED+dust parameters. This analysis includes only objects from a single healpixel which ran afoul of Rachel's cuts on Av and Rv above (though I could certainly run this analysis on a random sample of galaxies from that healpixel; analyzing the full cosmoDC2 catalog in this way is not feasible as reading in the SEDs and integrating them over the LSST bandpasses is very time consuming).

Some of these plots are going to look a little scary. We got into this mess because:

- Insomuch as we did any validation of the SED fitting, we only validated against the unextincted cosmoDC2 magnitudes, since those are what we used to fit the SEDs.
- Run 1.1p did not include dust extinction in the images, so that validation work had nothing to say about our dust model.
- The protoDC2 runs in general just used randomly assigned values of Av and Rv to fill in for galaxies with unphysical looking Av and Rv values; we did not look at the effects that the cosmoDC2 dust parameters had on the image magnitudes until now.

This plot shows the distribution of galaxies in g-r, r-i color space in cosmoDC2, in our SED fitting scheme without Rachel's dust fix, and in our SED fitting scheme with Rachel's dust fix applied.

[Figure: color_plot_g_r_i]

zooming in on the region actually populated by cosmoDC2 gives

[Figure: color_plot_zoom_g_r_i]

So, you can see that our SED fitting mechanism does introduce some absurd outliers in color-color space, but the majority of sources seem to have a similar color distribution after being fit to SEDs as they do in cosmoDC2.

This plot shows the magnitude in cosmoDC2 versus the magnitude in our SED fitting scheme without applying Rachel's dust fix (this is meant mostly to show the effects of fitting SEDs to the cosmoDC2 galaxies in general).

[Figure: mag_plot_g_r_i]

This here is a 1-D histogram of SED magnitudes minus cosmoDC2 magnitudes (again, without Rachel's dust fix)

[Figure: mag_1d_g_r_i]

and again zooming in

[Figure: mag_1d_zoom_g_r_i]

The take away for me is that, while fitting the cosmoDC2 galaxies to SEDs and then taking the cosmoDC2 dust parameters at face value does introduce some extreme cases of mismatch between cosmoDC2 magnitudes and image simulation magnitudes, the vast majority of galaxies show agreement to within a few tenths of a magnitude between the two systems. I don't know if that is something we want to worry about fixing or not.

aphearin commented 5 years ago

@danielsf - could you show your plots after making a cut on stellar mass? Due to the power law slope of the SMF, your plots are dominated by very low-mass galaxies whose colors are not nearly as critical as galaxies with M*>10**10Msun.

danielsf commented 5 years ago

Here are my previous plots including only galaxies with stellar masses greater than 10**10 M_sun

color-color distribution in:
- cosmoDC2
- SED with dust fix
- SED without dust fix

[Figure: color_plot_g_r_i_mass_cut]

zooming in

[Figure: color_plot_zoom_g_r_i_mass_cut]

magnitude comparison of cosmoDC2 and SED without dust fix

[Figure: mag_plot_g_r_i_mass_cut]

1-D distribution of SED magnitude (without dust fix) minus cosmoDC2 magnitude

[Figure: mag_1d_g_r_i_mass_cut]

[Figure: mag_1d_zoom_g_r_i_mass_cut]

These actually look worse, in my opinion.

It is possible that we should be fitting Av and Rv so as to best reproduce the cosmoDC2 color distribution, rather than taking Av and Rv as reported by cosmoDC2 at face value.

erykoff commented 5 years ago

Well I for one am very worried about this. In particular for red galaxies in clusters, @aphearin and I (well, tbh, mostly @aphearin) worked very hard to ensure that the red sequence galaxy colors had the correct intrinsic scatter (on the order of 0.03 - 0.05 mag depending on the color/redshift). And if these plots are representative then that would be completely destroyed.
Though I think we want to be looking at the colors as well as the magnitudes to see how much the red sequence as well as photo-zs are affected. But introducing a scatter of a couple tenths of a mag would ruin cluster finding. It's also possible, I guess, that the red galaxies live in a Rv/Av space that is not affected by this (astrophysically, they should be basically dust free, but I'm fairly certain that Andrew told me that is not a requirement in the galacticus SED matching step). However, the effects on the general photo-zs would be quite serious and these sorts of shifts will invalidate many of the descqa validation tests.

rmandelb commented 5 years ago

@danielsf - sorry to jump way back to the beginning, but can I ask you to confirm something please? You wrote

> This analysis includes only objects from a single healpixel which ran afoul of Rachel's cuts on Av and Rv above

Does this mean that all of your plots are for just a small subset of the galaxies, and don't reflect what is happening to the colors/magnitudes for most objects? I am concerned this may be giving a bit of a skewed picture of the population, because the objects that fall afoul of those cuts may be unusual in some ways. It could be that e.g. Eli's concerns about the red sequence might not be a problem at all, if in fact 90% of red galaxies on the red sequence in clusters are not even on these plots.

I understand the value in focusing first on the objects for which we are modifying the dust parameters to avoid unphysical effects (saturation etc.), but I do believe that when comparing diagnostics of the entire population we cannot judge based on these objects alone.

danielsf commented 5 years ago

@rmandelb Yes, I only focused on the galaxies with offending values of Av, Rv at first. I am putting together a set of post-CatSim magnitudes randomly sampled from the catalog that I am going to pass off to Andrew Hearin for analysis tomorrow morning.

It is possible, though, that we will need to fix the way that we generate InstanceCatalogs. As I tried to imply originally: the InstanceCatalog code fits an SED to the unreddened rest frame colors of the galaxy and then takes the Av and Rv values from cosmoDC2 at face value. It is possible that we should instead be fitting the SED to the unreddened rest frame colors and then finding the combination of Av, Rv that produces the best match to the observed colors reported in cosmoDC2. This will be a more computationally intensive process, but we can probably precache the values in a look-up table that the InstanceCatalog generation code will then refer back to.
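To make the proposed second step concrete, here is a minimal sketch of a brute-force grid search for the (Av, Rv) pair that best reproduces a galaxy's observed colors. It is not the InstanceCatalog code. The function name `fit_av_rv`, the per-band coefficient arrays `a_band` and `b_band`, and the grids are all hypothetical, and the sketch assumes the dust law can be written per band as A_band = Av * (a_band + b_band / Rv).

```python
import numpy as np

def fit_av_rv(unreddened_mags, observed_mags, a_band, b_band, av_grid, rv_grid):
    """Grid-search the (Av, Rv) pair that best reproduces the observed colors.

    Assumes a per-band law of the form A_band = Av * (a_band + b_band / Rv).
    All magnitude and coefficient arrays have one entry per band.
    """
    unreddened_mags = np.asarray(unreddened_mags, dtype=float)
    a_band = np.asarray(a_band, dtype=float)
    b_band = np.asarray(b_band, dtype=float)
    av_grid = np.asarray(av_grid, dtype=float)
    rv_grid = np.asarray(rv_grid, dtype=float)

    # Adjacent-band colors m_i - m_{i+1} of the target galaxy.
    observed_colors = -np.diff(np.asarray(observed_mags, dtype=float))

    av, rv = np.meshgrid(av_grid, rv_grid, indexing="ij")
    # Extinction in each band at every grid point: shape (n_av, n_rv, n_band).
    a_lambda = av[..., None] * (a_band + b_band / rv[..., None])
    model_colors = -np.diff(unreddened_mags + a_lambda, axis=-1)

    # Sum of squared color residuals at every grid point.
    chi2 = np.sum((model_colors - observed_colors) ** 2, axis=-1)
    i, j = np.unravel_index(np.argmin(chi2), chi2.shape)
    return av_grid[i], rv_grid[j]
```

Because the search depends only on the unreddened SED magnitudes and the target colors, the results could in principle be cached in a look-up table, as suggested above. How that table would be keyed is left open here.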

rmandelb commented 5 years ago

@danielsf - thanks for putting together the randomly sampled post-CatSim magnitudes.

I agree with you that the process you describe is likely what we should've done in the first place. The question at this point is whether what's already been done produces results that are scientifically acceptable given what the image sims will be used for, or whether that new process must be implemented. I look forward to seeing what @aphearin finds with the randomly sampled post-CatSim magnitudes that you're putting together, which will hopefully shed light on this question.

erykoff commented 5 years ago

At the risk of being difficult (and maybe this is something that was discussed in the past), may I ask why we do the SED re-fit at all, and not use the SEDs as produced in CosmoDC2? They are consistent with the broadband fluxes (since they were used to generate the broadband fluxes), and are therefore validated. On the other hand, maybe we'll get lucky and it's just this crazy population that has such large scatter, but as far as I know this is not a discrete set of galaxies, just the ones at the end of a tail, so there might not be anything "unique" about them. I look forward to the new tests.

evevkovacs commented 5 years ago

@erykoff The SEDs in cosmoDC2 are provided in the form of fluxes in 30 narrow-band top-hat filters that span a wavelength range from 100 nm - 2 microns. These are too coarse for the image simulations. CatSim refits these to a template library to provide the level of detail required.

danielsf commented 5 years ago

@erykoff We are refitting CosmoDC2's narrow band filters to SEDs because that is what PhoSim expects.

When PhoSim simulates a source, it reads in an SED that is parameterized as F_lambda as a function of lambda. It treats the SED as a distribution of photons and randomly draws from that distribution, simulating the propagation of each photon through the system as a function of wavelength. This is necessary to get things like differential chromatic refraction in the emergent way that PhoSim does (i.e. as a result of physical first principles, rather than as a phenomenological model). PhoSim has no API for reading in the narrow band magnitudes that cosmoDC2 reports.
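As an illustration of "treating the SED as a distribution of photons", here is a minimal sketch of inverse-CDF sampling from a tabulated F_lambda. This is not PhoSim's implementation. The function name and arguments are hypothetical, and the sketch assumes a non-negative SED tabulated on an increasing wavelength grid.

```python
import numpy as np

def draw_photon_wavelengths(wavelength_nm, f_lambda, n_photons, rng=None):
    """Draw photon wavelengths from a tabulated SED by inverse-CDF sampling."""
    rng = np.random.default_rng() if rng is None else rng
    wavelength_nm = np.asarray(wavelength_nm, dtype=float)
    f_lambda = np.asarray(f_lambda, dtype=float)

    # Photon number per unit wavelength scales as F_lambda * lambda,
    # because each photon carries energy h*c/lambda.
    photon_density = f_lambda * wavelength_nm

    # Cumulative trapezoidal integral, normalized to run from 0 to 1.
    steps = 0.5 * (photon_density[1:] + photon_density[:-1]) * np.diff(wavelength_nm)
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    cdf /= cdf[-1]

    # Invert the CDF by linear interpolation at uniform random deviates.
    return np.interp(rng.random(n_photons), cdf, wavelength_nm)
```

Each drawn wavelength would then be propagated through the atmosphere and optics individually, which is why a finely sampled SED is needed rather than a handful of band magnitudes.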

We do this in ImSim because we want ImSim to take the same inputs as PhoSim. We probably could hack ImSim to directly use the magnitudes and colors that cosmoDC2 reports, but that would mean going back to an era in which ImSim and PhoSim have different inputs.

danielsf commented 5 years ago

Because confusion has been arising elsewhere: when I say that we fit the cosmoDC2 colors to an SED, that fit is limited to the ~1000 galaxy SEDs that get shipped in a library with the lsst_sims stack. For each galaxy, we find the SED that is the least squares nearest neighbor to that galaxy in color color color... (for N-1 colors, where N is the number of narrow band magnitudes in cosmoDC2) space.
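For anyone who wants to see the matching step spelled out, here is a minimal sketch of a least-squares nearest-neighbor search in color space. It is not the actual lsst_sims code. The function name and array layout are hypothetical, and it assumes the galaxy and library magnitudes are in the same N bands in the same order.

```python
import numpy as np

def nearest_sed_index(galaxy_mags, library_mags):
    """Index of the library SED closest to each galaxy in (N-1)-color space.

    galaxy_mags:  (n_gal, N) array of magnitudes.
    library_mags: (n_sed, N) array of magnitudes in the same N bands.
    """
    # Adjacent-band colors m_i - m_{i+1}; N magnitudes give N-1 colors.
    galaxy_colors = -np.diff(np.asarray(galaxy_mags, dtype=float), axis=1)
    library_colors = -np.diff(np.asarray(library_mags, dtype=float), axis=1)

    # Squared Euclidean distance between every galaxy and every library SED.
    delta = galaxy_colors[:, None, :] - library_colors[None, :, :]
    dist2 = np.sum(delta ** 2, axis=2)
    return np.argmin(dist2, axis=1)
```

Working in colors rather than magnitudes makes the match independent of overall normalization. For large catalogs the dense distance matrix would need to be chunked or replaced by a tree-based search.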

rmandelb commented 5 years ago

All: while we wait for Andrew's plots, which he is making for a random subsample of galaxies based on a catalog that Scott sent him, I wanted to send some thoughts on what modifications are tolerable when comparing the color distributions against those in the extragalactic catalogs (for which a great deal of tuning of color distributions has been done to ensure they have the behavior we want). This is a synthesis of comments from others plus my own thoughts on what's needed to enable science with DC images, focusing on the imaging-based science cases that were articulated in the DC2 plan. @katrinheitmann has helpfully extracted that list of science cases and put it here. So here are my thoughts:

No PZ working group science cases are listed there. That is because the key PZ science cases for DC2 will use the full extragalactic catalog, rather than the image simulation. However, several of the WL, CL, and LSS science cases listed there rely on photometric redshifts from the image catalogs, so reliance on photometric redshifts and hence colors is implicit in several of those projects (either for garden variety galaxies used for lensing and clustering measurements, or for cluster galaxies that go into a red-sequence cluster finder).
We should remember that there will be a truth catalog that can be used to identify e.g. outlier populations for which the input ImSim and PhoSim photometry is very different from the extragalactic catalogs. So if the impact of the different SED and dust models in the image sims vs. extragalactic catalog is to preserve the colors for most galaxies while producing a small outlier population, this effect could be mitigated by identifying and chucking the outlier population using the truth catalog.
Similarly, there are some remapping techniques that could be used if the color distributions are modified in minor ways, e.g. if the red sequence or blue cloud get slightly shifted or broadened. @sschmidt23 mentioned remapping on the #desc-pz slack channel: https://lsstc.slack.com/archives/C2M8A1UNT/p1537300654000100 (as a way to deal with some potentially unphysically blue galaxies in the extragalactic catalog, but I think this would be trivial to extend to the case where we need to slightly remap the distributions that went into the image sims).
I imagine that another potential mitigation would be to use the CatSim SED and dust model to produce or modify templates that go into the template-fitting photo-z codes or to make a training sample for ML-based photo-z codes, to eliminate any template uncertainty due to the use of those templates. Again, I would think this is mostly a valid approach in the case that the color distributions have been slightly modified, but not greatly modified?

So the zeroth order question is this: are the color-magnitude diagrams going into the image simulations similar to those in the extragalactic catalogs, once we factor in the use of different SEDs and a different dust model? The next order question is the color-dependent clustering (are environmental effects such as the cluster red sequence preserved?). If they are broadly similar to those in the extragalactic catalogs, such that we can use the truth catalogs to e.g. remap slightly, or to exclude small-ish outlier populations, then the science cases that depend on the images should not be significantly affected.
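The outlier-rejection idea in the list above could be sketched roughly as follows. The function name, the `max_shift` threshold of 0.05 mag, and the assumption that the two magnitude arrays are already row-matched by galaxy ID are all illustrative choices, not anything agreed in this thread.

```python
import numpy as np

def flag_color_outliers(extragalactic_mags, truth_mags, max_shift=0.05):
    """Flag galaxies whose image-sim input colors differ from the extragalactic catalog.

    Both inputs are (n_gal, n_band) magnitude arrays, row-matched by galaxy.
    Returns a boolean array that is True where any adjacent-band color
    differs by more than max_shift magnitudes.
    """
    ref_colors = -np.diff(np.asarray(extragalactic_mags, dtype=float), axis=1)
    sim_colors = -np.diff(np.asarray(truth_mags, dtype=float), axis=1)
    return np.any(np.abs(sim_colors - ref_colors) > max_shift, axis=1)
```

The fraction of flagged galaxies would then indicate whether "small outlier population" is a fair description, or whether the mismatch is too widespread for this mitigation to help.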

I would like some feedback from @erykoff @egawiser @sschmidt23 @danielsf and anybody else who is interested on the above statements. I would imagine there are some more clever things we could do by relying on the combination of the extragalactic catalog and the truth catalog in addition to the image simulation outputs, but I haven't been clever enough to think of them yet. :)

danielsf commented 5 years ago

Without seeing Andrew's plots, I would feel a lot more comfortable if we changed the SED + Av, Rv scheme in InstanceCatalog generation to fit for an Av, Rv that better reproduces the extincted magnitudes as reported by cosmoDC2. Now that I understand what is going on, it seems to me that there is no way to readily map Av, Rv in cosmoDC2 to the Av, Rv that the Cardelli, Clayton, and Mathis interpolation scheme expects (that scheme is ultimately what CatSim, ImSim, and PhoSim use to apply dust reddening and extinction). I don't believe we ever properly validated the SED template + dust colors produced by CatSim, especially given the precision users seem to want out of the image simulations (that should have been my responsibility; I apologize for the oversight).

I have heard that the image simulation effort has been having infrastructure problems, so changing dust schemes at this point will not sacrifice a lot of pixel data. If I am wrong about that, then I am open to the idea of trying one of the other solutions.
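For reference, here is a minimal sketch of the Cardelli, Clayton, and Mathis parameterization, A(lambda)/Av = a(x) + b(x)/Rv with x = 1/lambda in inverse microns. It uses the infrared and optical/NIR coefficients as published in the 1989 paper. Whether CatSim uses exactly these coefficients is an assumption, and the function names are hypothetical.

```python
import numpy as np

def ccm89_a_b(wavelength_um):
    """CCM89 a(x), b(x) for 0.3 <= x <= 3.3, where x = 1/wavelength in 1/micron."""
    x = 1.0 / np.atleast_1d(np.asarray(wavelength_um, dtype=float))
    if np.any((x < 0.3) | (x > 3.3)):
        raise ValueError("only the infrared and optical/NIR branches are implemented")
    a = np.empty_like(x)
    b = np.empty_like(x)

    # Infrared branch: 0.3 <= x < 1.1.
    ir = x < 1.1
    a[ir] = 0.574 * x[ir] ** 1.61
    b[ir] = -0.527 * x[ir] ** 1.61

    # Optical/NIR branch: 1.1 <= x <= 3.3, polynomial in y = x - 1.82.
    # np.polyval expects the highest-order coefficient first.
    y = x[~ir] - 1.82
    a[~ir] = np.polyval([0.32999, -0.77530, 0.01979, 0.72085,
                         -0.02427, -0.50447, 0.17699, 1.0], y)
    b[~ir] = np.polyval([-2.09002, 5.30260, -0.62251, -5.38434,
                         1.07233, 2.28305, 1.41338, 0.0], y)
    return a, b

def extinction_mag(wavelength_um, av, rv):
    """Extinction in magnitudes; a negative value means the SED is brightened."""
    a, b = ccm89_a_b(wavelength_um)
    return av * (a + b / rv)
```

In the infrared branch b is negative, so a + b/Rv changes sign once Rv drops below 0.527/0.574, or about 0.92. For the galaxy at the top of this issue (Av=48, Rv=0.49), `extinction_mag(1.0, 48.0, 0.49)` comes out to about -24 mag at 1 micron, i.e. a large brightening near the y band.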

aphearin commented 5 years ago

I've been looking at many variations in different mass/luminosity/redshift ranges, and this is a fairly representative plot for a bread-and-butter sample. Results in other redshift and mass ranges show comparable levels of discrepancy.

gr_ri_pre_post_catsim gr_ri_color_diffs

Alternate masses and redshifts

Here is the same plot for a lower-mass sample in the same redshift range:

gr_ri_pre_post_catsim_lowmass

Third verse, same as the first, except here showing higher mass at higher redshift:

gr_ri_pre_post_catsim_highz_highmass

rmandelb commented 5 years ago

@aphearin - thanks for these illuminating plots!

For others who may be following along, note that these are observer-frame (not rest-frame) colors. So you don't expect to necessarily see a neat red sequence and blue cloud when including a broad redshift range, and you also don't expect an apparent red sequence to appear at the same color for different redshift ranges.

My reaction is that this most likely goes beyond the scenarios I outlined as potentially correctable in this comment earlier today, i.e., it's not just a simple shift or small broadening or introduction of a small outlier population. Also, it seems that the problem is not restricted to just the ~10% of the population for which we are modifying Av/Rv. This is a more global mismatch across the population of galaxies in the sims, possibly indicating that the more complex process Scott outlined here is needed, or even some more basic infrastructure modification.

Out of curiosity, I reached out to @fjaviersanchez and he confirmed that the nice-looking Run 1.2i tests of photometry were focused on a reference catalog test, i.e., they would reflect a match between what CatSim said it was putting into the images and what went into the images, so they are not relevant to the extragalactic catalog vs. image simulation comparison that we're doing here... unfortunately.

Further comments, questions, and reactions are welcome.

If somebody has a more optimistic outlook on these results, I would be curious to hear it.

egawiser commented 5 years ago

I agree with @rmandelb that the plots directly above from @aphearin show significant, undesirable color shifts between the vetted cosmoDC2 inputs and the colors found in CatSim. Assuming that these are truly a random sample of all galaxies and not just the ones whose Av,Rv values are being modified in post-processing, then seeing color shifts for >>10% of galaxies implies that this problem is (mostly) independent of, and predates, the Av,Rv modification scheme we came up with. Darn!
I did not realize until @danielsf's most recent post that the Av,Rv parametrizations used in cosmoDC2 and the CCM dust law are apparently different - that's a recipe for trouble, and trouble seems to have found us.
My first thought was "why can't we feed CatSim the reddened SED from cosmoDC2 and then tell it to simulate that galaxy with no dust (Av=0)?" This would be imperfect because the library of galaxy SEDs from lsst_sims that are being fit are all unreddened (I assume), but since they should have a wide range of colors and UV-to-optical ratios, it might already work better than the status quo.

LSSTDESC / DC2-production

Run 2.0i: galaxy saturation due to internal galaxy extinction #266
