chromatic "RealGalaxy" objects from hydro simulations

GalSim-developers / GalSim

The modular galaxy image simulation toolkit. Documentation:

http://galsim-developers.github.io/GalSim/

Other

227 stars 107 forks source link

chromatic "RealGalaxy" objects from hydro simulations #669

Open rmandelb opened 9 years ago

rmandelb commented 9 years ago

Today I heard a talk by Gregory Snyder ( @gsnyder206 ) in which he showed something super cool: He's been making synthetic galaxy images from the Illustris simulation to simulate data in various passbands. Examples are http://www.stsci.edu/~gsnyder/hudftool.html

They are working to put these into GalaxyZoo to get morphology classifications, too. So this immediately made me think it would be very interesting to have these in GalSim (apparently @msimet had the same idea when she heard this talk too). The basic idea is:

Okay, they aren't real galaxies. But they are realistically complicated, which is very interesting for studies of model bias and so on.
Unlike our current RealGalaxy catalog from COSMOS, they are chromatic.
Even when we have chromatic RealGalaxy catalogs from AEGIS, those will be based on just two bands and thus involve a bunch of assumptions. So in some sense these fully chromatic ones from Illustris may be "better" (though within the limitations imposed by hydro simulations not making perfectly realistic galaxies).

A few questions: 1) How should we think about the resolution limit of the simulations? Should we be treating that like a PSF of sorts, perhaps a Gaussian of some width? If so, then GalSim can easily be made to take that into account when matching to the target PSF.

2) How is the information from the simulations available? In particular, how is the color and redshift information encoded? GalSim knows how to take images of the surface brightness profile, but for each galaxy there are presumably two other dimensions (wavelength, redshift) and I would like to understand how those are encoded in order to figure out how to get GalSim to use that information.

Anybody have any thoughts about this idea? Agree/disagree about it being worthwhile? @rmjarvis , @jmeyers314 , ...? I am excited about this and could put some time into it over the summer, though I won't complain about sharing the load if there are any volunteers who want to be involved.

rmjarvis commented 9 years ago

They definitely look suitably realistic to me to be worthwhile from the standpoint of WL testing. Do they have the right properties as a function of redshift the way the COSMOS catalog does? (We've found that column to be useful in our DES testing.)

gsnyder206 commented 9 years ago

Quick responses below; I also tagged Paul Torrey ( @ptorrey ) who has been a key player on this.

On Thu, Jun 4, 2015 at 1:07 PM, Rachel Mandelbaum notifications@github.com wrote:

Today I heard a talk by Gregory Snyder ( @gsnyder206 https://github.com/gsnyder206 ) in which he showed something super cool: He's been making synthetic galaxy images from the Illustris simulation to simulate data in various passbands. Examples are http://www.stsci.edu/~gsnyder/hudftool.html

A few questions: 1) How should we think about the resolution limit of the simulations? Should we be treating that like a PSF of sorts, perhaps a Gaussian of some width? If so, then GalSim can easily be made to take that into account when matching to the target PSF.

In my view, the safest approach is to further post-process the images by convolving them by a PSF a fair bit larger than the typical resolution element in the simulation. For our raw images, there is no one scale that can be considered the intrinsic smoothing scale, even in a single image, but 0.5-3 kpc is typical.

2) How is the information from the simulations available? In particular, how is the color and redshift information encoded? GalSim knows how to take images of the surface brightness profile, but for each galaxy there are presumably two other dimensions (wavelength, redshift) and I would like to understand how those are encoded in order to figure out how to get GalSim to use that information.

The primary distribution mechanism for our images is individual cutouts in ~36 filters, available here: http://www.illustris-project.org/galaxy_obs/ and also made available in the Illustris Public data release ( http://www.illustris-project.org/data/; Nelson et al. 2015) These are described by Torrey et al. (2015) (@ptorrey has taken the lead in creating and making these available) More details and access info is available here: http://www.illustris-project.org/data/docs/specifications/#sec4

So far, we have published only the z=0 cutout images (observed in the rest frame). When we add higher-z sets, the redshift and filter/wavelength labels will be saved in metadata for the cutouts.

I also have several "lightcone" images as in the link Rachel sent, 3 arcmin wide, and we are actively working to tag these galaxies with redshifts. If these are of interest, I can make them available in the future as needed. These won't contain nearly as many galaxies at a given redshift as the individual cutouts of the full box described above.

Anybody have any thoughts about this idea? Agree/disagree about it being

worthwhile? @rmjarvis https://github.com/rmjarvis , @jmeyers314 https://github.com/jmeyers314 , ...? I am excited about this and could put some time into it over the summer, though I won't complain about sharing the load if there are any volunteers who want to be involved.

— Reply to this email directly or view it on GitHub https://github.com/GalSim-developers/GalSim/issues/669.

ptorrey commented 9 years ago

Glad to see the synthetic images may be of interest to this group. Please feel free to reach out if additional information is needed on accessing/manipulating our synthetic images, or if there are ways we can make our data-set available that would help encourage their use.

Cheers, Paul

rmandelb commented 9 years ago

Some more responses and questions:

In my view, the safest approach is to further post-process the images by convolving them by a PSF a fair bit larger than the typical resolution element in the simulation.

We always convolve with some target PSF. The question is if there is some clearly defined smoothing kernel that we can remove before using these images. I realize you are unlikely to be familiar with how we use image-based light profiles, so here's a quick summary of how we deal with HST images:

We do an FFT to go into Fourier space.
We deconvolve the HST PSF, which corresponds to division in Fourier space. (If we were to go back to real space at this point we would find the image looks like garbage due to this deconvolution step, since we've amplified some noise modes at high k.)
We do whatever operations we want to the galaxy, like shearing.
We then convolve by the target PSF, which must be larger than the original PSF. At that point we can go back to real space and get a reasonable-looking image.

So I was proposing to treat the resolution / smoothing in the original images via a deconvolution, just like we treat the HST PSF when we take input images from HST. Are you saying the smoothing isn't well-defined in terms of a scale, and we should just ignore it and enforce some minimum size on the target PSF?

Thanks for sending the links that describe the simulated data. Personally, I tend to be more interested in something like the "light cone" images, in order to simulate a flux-limited sample that has galaxies at a range of redshifts. (I guess I should go do some calculations to figure out what is the effective flux limit given your 10^10 Msun stellar mass limit, given a reasonable range of SEDs, unless you happen to already know this?) Do you have an estimated timescale for those to be ready? If it's sometime during the summer, I might prefer to wait for those. But if not, then it might be worth my setting up code for the z=0 sample just to have some of that infrastructure in place and ready to go once the light cones are ready.

gsnyder206 commented 9 years ago

Thanks for the details.

So I was proposing to treat the resolution / smoothing in the original images via a deconvolution, just like we treat the HST PSF when we take input images from HST. Are you saying the smoothing isn't well-defined in terms of a scale, and we should just ignore it and enforce some minimum size on the target PSF?

Correct, this sounds to me like the safest approach. In the raw, "perfect" images, there isn't a single well-defined smoothing scale. We typically convolve these with a target PSF before using, but I'm guessing you'll want to choose your own such target.

Thanks for sending the links that describe the simulated data. Personally, I tend to be more interested in something like the "light cone" images, in order to simulate a flux-limited sample that has galaxies at a range of redshifts. (I guess I should go do some calculations to figure out what is the effective flux limit given your 10^10 Msun stellar mass limit, given a reasonable range of SEDs, unless you happen to already know this?) Do you have an estimated timescale for those to be ready? If it's sometime during the summer, I might prefer to wait for those. But if not, then it might be worth my setting up code for the z=0 sample just to have some of that infrastructure in place and ready to go once the light cones are ready.

In the lightcone images, I haven't enforced any stellar mass limit, so the raw images will have no flux limit, in principle, barring limitations imposed by the simulation (this will be at L << L* for most relevant surveys, I believe). That said, the galaxy populations and properties will not be a perfect match to reality. Note that especially the high-z galaxies are very sensitive to dust modeling, which I have oversimplified in my existing lightcone images (in the sense that high-z galaxies are too faint in optical filters, I think).

We might be able to have usable versions (with redshifts tagged) by the end of June.

— Reply to this email directly or view it on GitHub https://github.com/GalSim-developers/GalSim/issues/669#issuecomment-109796660 .

rmandelb commented 9 years ago

In the lightcone images, I haven't enforced any stellar mass limit, so the raw images will have no flux limit, in principle, barring limitations imposed by the simulation (this will be at L << L* for most relevant surveys, I believe). That said, the galaxy populations and properties will not be a perfect match to reality.

I wasn't expecting them to be perfect. :) Just realistically complicated and not TOO weird.

Note that especially the high-z galaxies are very sensitive to dust modeling, which I have oversimplified in my existing lightcone images (in the sense that high-z galaxies are too faint in optical filters, I think).

Thanks for letting us know.

We might be able to have usable versions (with redshifts tagged) by the end of June.

Okay, it's really not a huge rush. I am going on vacation from June 20-July 3 so I will just ping you once I'm back to find out the status, but even if it's not ready then, it's fine.

rmandelb commented 8 years ago

@gsnyder206 and @ptorrey - This fell off my radar for a while for various reasons, but I am still interested in this and now have a student who may want to work on this as well.

Since it has been quite a while since we spoke, I wonder if there is an update on the data products that are now available? In particular, when we last spoke there was a z=0 set of images already public, and you were going to work on making images from a light cone public. Has the latter already happened?

gsnyder206 commented 8 years ago

The lightcone images are not yet public, but there is now a set of z > 0 postage stamps (13 timesteps, 47 filters) available through http://www.illustris-project.org/data/ I'd be happy to help with details of obtaining and using these.

For the existing lightcone images (3 images of ~3x3 arcmin), I can make them available any time. See below for additional details of some examples I hosted on my website.

Let me know if you want to work out how to obtain the rest of the data. Depending on what you want to do, the total amount could be >~ 60 GB for the full set of stacked images, and up to ~1 TB if you want the maximal redshift resolution (i.e., 100 images at small redshift slices).

I am also actively working to make newer (bigger & better) versions of these, with more useful metadata, though this effort may take a few more months.

Cheers, -Greg

Mock lightcone image details: I've posted examples of the lightcone images at: http://www.stsci.edu/~gsnyder/share/Lightcones/Lightcone_FieldA_v1/ "bbdata.fits" contains the full images and other info. They are in common broadband HST and JWST filters. These have no astronomical noise nor PSF. There is additional documentation in the FITS headers, and please let me know if there's a parameter or detail you need to know; I've copied the unit conversion I use most often below. We estimated catalogs of the intrinsic redshifts, masses, etc as a function of position for at least one of the three image sets (not this one, unfortunately) -- let me know if you want me to post these, too -- I'll need to do some minor work to remind myself about this. You can also estimate redshifts of objects using the images and the HDUs called "zweightslice" or "weighted_z" which contain the brightness-weighted object redshift in each pixel. Similarly the stellar masses from "stellar_mass" HDUs. As examples, "bbslice_0004.fits" contains the full data for only z~1-1.5 (there are 10 total such slices available per field); and "broadband_0033.fits" contains the full data for only z~1.05 (there are 100 such slices per field).

The 4096x4096 pixel images are 0.0416178 arcsec/pixel, are fully extragalactic, include all simulated stellar populations at z < 18 (practically speaking, z ~< 12 I think), and contain no observatory effects. I used Bruzual & Charlot (2003) stellar pops, and assumed the baseline Charlot & Fall (2000) dust model. They are in surface brightness units: W/m/m^2/Sr, which I most often convert to Jy using something like: F [in Jy] = 1e26 * ((effective wavelength)^2)/c * (pixel size in steradians) * Original Image [in W/m/m^2/Sr] F [in Jy] = 1e14 * (effective wavelength in microns)^2 * (0.0416/206265)^2 / (3e8) * Original Image [in W/m/m^2/Sr] F [in Jy] = 1.356e-8 * (effective wavelength in microns)^2 * Original Image [in W/m/m^2/Sr]

On Mon, Feb 22, 2016 at 11:28 AM, Rachel Mandelbaum < notifications@github.com> wrote:

@gsnyder206 https://github.com/gsnyder206 and @ptorrey https://github.com/ptorrey - This fell off my radar for a while for various reasons, but I am still interested in this and now have a student who may want to work on this as well.

Since it has been quite a while since we spoke, I wonder if there is an update on the data products that are now available? In particular, when we last spoke there was a z=0 set of images already public, and you were going to work on making images from a light cone public. Has the latter already happened?

— Reply to this email directly or view it on GitHub https://github.com/GalSim-developers/GalSim/issues/669#issuecomment-187253878 .

rmandelb commented 8 years ago

Hi Greg - thanks for the quick reply! I have some questions:

The lightcone images are not yet public, but there is now a set of z > 0 postage stamps (13 timesteps, 47 filters) available through http://www.illustris-project.org/data/ I'd be happy to help with details of obtaining and using these.

Just to make sure I understand, are these the images called "Stellar Mocks: Multi-band Images and SEDs" on http://www.illustris-project.org/data/docs/specifications/#sec4 with the 13 timesteps including z=0, 0.5, 1, 1.5, etc.? For WL simulations, we are mostly interested in z<2 and the ability to construct a flux-limited sample according to the flux in some band. Do I understand correctly that these represent a stellar mass-limited sample at each of the snapshots, but that we could at least impose the flux limit based on the provided SEDs? This would not be as good as having images of galaxies that trace out some realistic dN/dz based on the light-cone, but it would certainly enable some studies.

So let me explain a little more about what would be the most ideal thing from my perspective, and then perhaps we can talk about how close we can get to that:

We'd like to have galaxies with some arbitrary z distribution, down to a reasonable flux limit.
But we'd like to have individual postage stamp images for each of them (not just one big image with all the galaxies).
And we'd like a catalog that describes the object properties like SED, location on the sky where they'd appear in the light cone, etc.

Of course, I can also do some work myself to put things in the form we need. But based on this information, what do you think is the best thing to do?

Also, at what level have the distributions of galaxy size, flux, ellipticity, morphology, etc. been compared with images from HST?

rmandelb commented 8 years ago

**Also: I just downloaded one of the light-cone images - very cool!

gsnyder206 commented 8 years ago

Hi Rachel,

A few responses below.

Just to make sure I understand, are these the images called "Stellar Mocks: Multi-band Images and SEDs" on http://www.illustris-project.org/data/docs/specifications/#sec4 with the 13 timesteps including z=0, 0.5, 1, 1.5, etc.? For WL simulations, we are mostly interested in z<2 and the ability to construct a flux-limited sample according to the flux in some band. Do I understand correctly that these represent a stellar mass-limited sample at each of the snapshots, but that we could at least impose the flux limit based on the provided SEDs? This would not be as good as having images of galaxies that trace out some realistic dN/dz based on the light-cone, but it would certainly enable some studies.

You are correct. The limitation here is that the higher-mass galaxies are better sampled (more well converged) than lower-mass galaxies, and so using a flux limit instead would introduce many low-z galaxies whose shape distribution we trust far less. Though, perhaps at the point of using simulated images for WL tests, these are already not "realistic" enough at any mass, in the sense of being well matched to data (still investigating).

So let me explain a little more about what would be the most ideal thing from my perspective, and then perhaps we can talk about how close we can get to that:

We'd like to have galaxies with some arbitrary z distribution, down to a reasonable flux limit.

But we'd like to have individual postage stamp images for each of them (not just one big image with all the galaxies).

And we'd like a catalog that describes the object properties like SED, location on the sky where they'd appear in the light cone, etc.

This is great to know. As you say, some of this can be reconstructed from our mass-limited images above. I think most of these requirements will be met (depending on the reasonableness of the flux limit and the redshifts of interest) by my current efforts to create new lightcone images. My new strategy is to start with individual postage-stamp images instead of lumping all galaxies together by simulation snapshot. This will satisfy your 2nd and 3rd items, which my current images mostly lack.

Of course, I can also do some work myself to put things in the form we need. But based on this information, what do you think is the best thing to do?

I think it would be interesting to see how close we can get with the new lightcone strategy. In the meantime, it might be good to compare notes about the specific mass and flux scales where you are interested, and where we think the simulations might be most/least useful. If our ranges are different enough we may have to temper our expectations for now.

In general I think the task of "enabling GalSim to interface cleanly with big hydro sim datasets" is going to be a long-term useful thing to do, even if the current such datasets are not "perfect", and I'd be very happy to help out with that effort.

Also, at what level have the distributions of galaxy size, flux, ellipticity, morphology, etc. been compared with images from HST?

Galaxy sizes match poorly (Illustris galaxies are too large), but better at the highest masses. Galaxy morphologies match better (at high mass and low-z anyway, working now on high-z). I'm not sure about ellipticities but I'll be looking into this over the Summer. Fluxes are roughly reasonable (a bit high), but depend strongly on assumptions about dust.

— Reply to this email directly or view it on GitHub https://github.com/GalSim-developers/GalSim/issues/669#issuecomment-188557613 .

rmandelb commented 8 years ago

Thanks for all the information.

For the light cone, it depends what dataset we're trying to simulate. For many current surveys, a magnitude limit of r=25 is fine. What is your gut reaction to this - too deep? More shallow than you thought we'd need? Just right?

And in the meantime, correct me if I'm wrong, but I think that to use the stellar mass-limited snapshots, we'd need to process it to make a catalog ourselves and cut out postage stamps?