GalSim-developers / GalSim

The modular galaxy image simulation toolkit. Documentation:
http://galsim-developers.github.io/GalSim/

get some real galaxy data into the repo #104

Closed rmandelb closed 12 years ago

rmandelb commented 12 years ago

For the 2nd milestone, we would like to make a start at using real galaxies for simulations. To do this, we have to decide how those data will be stored. Examples of decisions to be made:

We will not aim to get the tens of thousands of training galaxies into the repo by the end of the milestone, but we will try to consider the above issues and apply whatever we decide to a subsample of ~100 galaxies.

rmandelb commented 12 years ago

It occurs to me, @barnabytprowe , that we might need to make something analogous to your Gaussian, Moffat, etc. python base classes for SBInterpolatedImages representing COSMOS galaxies?

barnabytprowe commented 12 years ago

Yep, I agree. Currently the closest thing to it is OpticalPSF, a class that has an SBInterpolatedImage attribute with some intermediate gubbins for initializing things.

What should we call it?

RealGalaxy? RealGal? (but then again: http://www.imdb.com/title/tt0805564/... :) As long as it's clear, I guess...

rmandelb commented 12 years ago

Um, gubbins? Are those kind of like tchotchkes, or are they more like thingamajigs or doohickeys? :) In any case... yes, it would presumably be something like OpticalPSF, but with extra information like the ID of the corresponding galaxy in the training sample. I'll have to think about what else we'll want to carry around; probably some weight factors due to the size-dependent probability of including a galaxy in the training sample. We'll have to decide exactly what kind of information we're storing for the galaxies in order to decide how to make these things. If we're doing all the pseudo-deconvolutions in advance, then it could be initialized based on a pseudo-deconvolved image + other data. (Or, fully deconvolved, which is what Gary's been doing for our comparisons.)

As for the name... oh dear, I shouldn't have clicked that link, haha, how weird. My preference would be RealGalaxy.

I think we've prioritized this issue for next week, so I will try to come up with a plan for what information we keep regarding training galaxies, and the class that we'll define to store them, later this week / early next week.

barnabytprowe commented 12 years ago

RealGalaxy sounds good to me! (I figured I'd be safe with a link to a PG-13, whatever the subject matter :)

rmandelb commented 12 years ago

Yeah, it's safe, just... what a weird premise for a movie. People are strange.

rmandelb commented 12 years ago

Barney and I had a discussion that I want to record here before I forget, and I also want to ping @gbernstein , @rmjarvis , and @TallJimbo for opinions, as there are some long-term strategic decisions to be made about how we handle realistic galaxies:

Let's start with some numbers. The current SHERA training sample has 26k galaxies at I<22.5. Let's say we eventually want a training sample to I=25.5. This would be many hundreds of thousands, but let's say for the sake of the argument that we will randomly subsample to have 100k training galaxies, and the ones we're adding will have similar size images (this is conservative as many fainter ones will be smaller).

The 26k sample, assuming we want to store deconvolved images so we don't have to store a PSF, takes up 15G. However, these include a large amount of padding around the edges (padded with noise) so that when we convolve with an SDSS PSF we won't have funny noise properties due to an abrupt transition from the regular COSMOS noise field to zeros. In practice, if we only want to store the bits of the image containing the galaxies, we could use just the inner ~10% of the image (selected using SExtractor segmentation maps), and GalSim can then pad them when doing the simulations. This would take up 1.5G for the existing training sample, or, if we assume 100k training galaxies, 6G. While we don't want to put that into the repo itself, it is not at all crazy to put that on a server for people to download. (I have not thought about read-in times, by the way; I'm mostly thinking here about storage space. Obviously there will be significant overhead if many galaxies have to get read in. Also, I assume that the images aren't zipped; since they aren't padded with zeros, there's not much gain in zipping the individual FITS files.)
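For concreteness, the arithmetic behind those numbers (just restating the figures above in Python):

n_train_current = 26000       # SHERA training galaxies at I < 22.5
full_sample_gb = 15.0         # current size, noise padding included
trim_fraction = 0.10          # keep only the inner ~10% of each image

trimmed_gb = full_sample_gb * trim_fraction             # -> 1.5 GB
scaled_gb = trimmed_gb * (100000.0 / n_train_current)   # -> ~5.8, i.e. ~6 GB
per_gal_mb = 1000.0 * scaled_gb / 100000.0              # -> ~0.06 MB/galaxy
print(trimmed_gb, scaled_gb, per_gal_mb)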

I assume we want deconvolved images, and not pseudo-deconvolved, for two reasons: first, because if they are pseudo-deconvolved then we need to store the effective PSF, which will also take up space; second, because pseudo-deconvolution as I usually do it is probably fine for simulating data from any ground-based telescope, but not space-based telescopes. Note that in SHERA I only work with the pseudo-deconvolved images in Fourier space. Here I assume we'd be storing them in real space, though I guess that does mean that to convolve them we'll have to FFT them, which is expensive.

For this milestone, I assume we'll pad them as needed with zeros. That is, we'll read in the trimmed deconvolved images, make a Convolve object including the galaxy plus the desired PSF, and then when we draw the image we'll let it tell us how large the output image has to be. This is, I believe, equivalent (in terms of output) to padding the original image with zeros before doing the convolution, but might save time because the FFTs are of smaller arrays.

In the longer term I don't think we want zero-padding, because of the weird artifacts that result in the final simulated images. Instead we will want a more complicated procedure that could be part of the 3rd milestone, and we should think about it now because it could mean that our base class representing realistic galaxies should carry around some information about the training galaxies that can be used to pad with a realistic noise field.

In principle we can calculate the full noise covariance matrix based on the noise fields around the galaxies. Barney and I propose to compress it to the noise correlation function, and furthermore to assume isotropy of the noise (we'd calculate 1d correlations around any given point going in the +x, -x, +y, -y directions, and average them), only keeping however many numbers are needed until we get down to ~1% level correlations. I.e., we'd save an overall noise variance and a few numbers corresponding to the correlation coefficient between adjacent pixels, between pixels separated by 1, by 2, etc. So the data structure containing realistic galaxies would have a variable-length vector to read in for each of our training galaxies. (A minimal sketch of this estimator in code is below.)

The justification for ignoring anisotropies in the noise in the training data is that (a) for ground-based simulations, the original low-level anisotropies in the noise correlations from space are irrelevant; the relevant noise correlations are the ones that come later from shearing and convolving the noise; and (b) for space-based simulations, we aren't convolving enough to completely wash out the original anisotropic noise correlations, but we'll be randomly rotating the training galaxies, so those anisotropic noise correlations won't correlate with the applied shear or target PSF, and it's acceptable to ignore their existence. (The uncertainty in what galaxies look like due to that is probably less than the uncertainties due to using Tiny Tim PSFs or MultiDrizzle images.)
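To make this concrete, here's a minimal numpy sketch of the estimator we have in mind (the function name, the blank-sky input, and the exact cutoff handling are illustrative assumptions, not existing code):

import numpy as np

def isotropic_noise_correlation(noise_field, max_lag=10, threshold=0.01):
    # noise_field: a 2-d blank-sky cutout from the region around a galaxy
    noise = noise_field - noise_field.mean()
    var = noise.var()
    corr = [1.0]  # correlation coefficient at zero lag
    for lag in range(1, max_lag + 1):
        # 1-d correlations at this separation; by symmetry the +x/-x
        # (and +y/-y) directions give the same estimate, so two terms suffice
        cx = np.mean(noise[:, :-lag] * noise[:, lag:]) / var
        cy = np.mean(noise[:-lag, :] * noise[lag:, :]) / var
        c = 0.5 * (cx + cy)
        if abs(c) < threshold:
            break  # stop once correlations drop below the ~1% level
        corr.append(c)
    return var, np.array(corr)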

So... we could bin the training galaxies that get used for our simulations into categories in their noise correlation properties (note that no binning is needed in the overall noise variance, as the random noise fields we generate can always be multiplied by some number to get the right variance). Many of the galaxies will have similar noise correlation properties and so the matrix we need to simulate noise with the right properties will be exactly the same. We could precompute the matrices we need for each category of galaxies, and that might add ~30s at worst of overhead at the start of calculations, which is not significant in the long-term (and probably less than the overhead associated with reading in a gazillion galaxies).

The amount of padding to be added should be chosen based on the size of the target PSF. We did this for the SHERA training galaxies assuming SDSS imaging conditions, but we could apply the same rule given the target PSFs for our sims.

So, I will plan to make a python base class RealGalaxy analogous to OpticalPSF (i.e., carrying around an SBInterpolatedImage plus associated other data: the COSMOS ID, noise properties, etc.). I will put 100 galaxies into the repo, which according to the above calculations, should take up 6M. I propose to make a new directory data/ where they will live. Sound okay?

The script I use for deconvolution and trimming will go into the repo for posterity and reproducibility, though only people who have all the COSMOS training data on their computers will be able to run it.

For photon-shooting, we'll have to investigate whether these deconvolved images can be used at all, or if they include too much ringing.

gbernstein commented 12 years ago

Since you asked, here are some thoughts. First, 1.5G or even 15G is not a problem as far as shipping the data around or storing it. There might be some chance that the speed of reading these from disk is slower than the rest of the operations for making a given postage stamp, but I don't know.

Have a look at my writeup with Daniel Gruen about the need for zero-padding before the DFT if shearing is done by interpolating in Fourier space (which is what we are doing). If disk reading is a bottleneck, then we should store the HST postage stamps and PSF, and do the padding and FFTs and (observed/PSF) division on the fly. If the FFTs are a bottleneck, then it would make sense to store the Fourier versions of the deconvolved galaxies (with all needed zero padding) and make a new version of SBInterpolatedImage which reads straight from Fourier space. I'm going to assume that nearly all of our applications will be convolving with a new PSF and hence need the FT data, not real space.

The exception would be, as you noted, if we wanted to shoot photons through (pseudo-)deconvolved images. Then we need the real-space version, and the current version of SBInterpolatedImage would work.

I am skeptical that 25.5 mag galaxies from COSMOS can be used for our precision shear tests - they will be very noisy.

rmandelb commented 12 years ago

Hi Gary - thanks for reading my looong summary.

Good point on the DFT vs. reading from disk tradeoff. It seems to me that it wouldn't be too hard to test this out now, so I will write a test script to try it out for some typical galaxy postage stamp sizes (assuming a proper amount of zero-padding). I'll try to come to some resolution on this today so I can start coding this stuff up.

As for 25.5 mag galaxies from COSMOS, what I had in mind is that we'd try to simulate a realistic galaxy population (i.e., a range of S/N and size rather than fixed S/N as in GREAT08/GREAT10). This would in principle include objects that are questionable in terms of S/N, and that people would have to decide whether or not they want to attempt to measure. Of course, we might decide that we're simply spending too much computing time on those really faint objects, and eliminate them in the end. But it would be good to have that option.

rmandelb commented 12 years ago

Gary, a few quick questions (or anyone else is welcome to weigh in, but I thought Gary might have an opinion off the top of his head):

Right now it looks like the pad factor when reading in an Image and making an SBInterpolatedImage is simply the smallest number that is > 4 x image size and that can be written as 2^n or 3 x 2^n, which (if I've understood your document with Daniel Gruen) is the right amount of padding for the default quintic interpolant. However, we were finding in our recent tests that we need Lanczos with n>=5 to get the shearing to sufficient accuracy. In table 1 of your write-up, the numbers suggest that to get similar accuracy as for quintic interp with 4x oversampling when we use Lanczos n=5, we would need 6x oversampling.

So: 1) I think that the oversampling factor should ideally be a function of the chosen interpolant and the desired precision (i.e., reading off of your table 1). Do you agree? If so, I will open an issue to implement that after this milestone is done.

2) For now I propose to simply stick with the default oversampling and use Lanczos-5, assuming that we just want some images to play with and don't care if there is a slightly larger error in the shears that we apply. Sound reasonable?

3) our findings had suggested that we might be able to stick with the default quintic interpolation for PSF interpolation, needing higher accuracy only for shearing. How hard do you think it would be to have a mixed interpolator scheme that allows us to use different interpolators for those steps? Perhaps more work than we'd get in saved time due to the use of a quicker interpolation scheme for some of the calculations?
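(For reference, the padding rule I described at the top of this comment, in code; the function name is made up, and I've used ">=" rather than ">" for simplicity:)

def good_fft_size(n, pad_factor=4):
    # smallest integer >= pad_factor * n that is 2**k or 3 * 2**k
    target = pad_factor * n
    size = 2
    while size < target:
        size *= 2
    # the only 3 * 2**j candidate between size/2 and size is 3 * size / 4
    if 3 * size // 4 >= target:
        return 3 * size // 4
    return size

# e.g. good_fft_size(96) -> 384 (= 3 * 2**7); good_fft_size(100) -> 512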

rmandelb commented 12 years ago

More updates: I'm finding that if I want to read in a typical COSMOS galaxy and PSF that have been trimmed to not have padding, then appropriately pad and carry out pseudodeconvolution and shearing, it takes an average of 0.025s per galaxy. This is an average over thousands of calls, using the python timeit module. If I time only the file read-in, that's 0.02s/gal. The conclusion from this admittedly minimal test is that file i/o is more important timing-wise than FFTs and interpolation; BUT, 0.02s/gal is not horrific... that's ~3 days for 10M galaxies on a single processor. In contrast, if we were to store deconvolved images, they would have to be padded already and therefore much larger (a factor of ~16 at least), which is bad when we're I/O limited.
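For reference, the read-in part was timed along these lines ('gal_stamp.fits' is a stand-in filename):

import timeit

setup = "import pyfits; fname = 'gal_stamp.fits'"
n_calls = 1000
# time only the file read-in for one trimmed postage stamp
t_read = timeit.timeit("pyfits.getdata(fname)", setup=setup, number=n_calls)
print("file read-in: %.5f s/gal" % (t_read / n_calls))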

My conclusion for now is that we can store real-space galaxy and PSF rather than deconvolved images (either real/Fourier). This gives us maximal flexibility -- if we want to go via DFT we do the Fourier transforms and then work with the deconvolved images in Fourier space to do shearing; if we want to go via photon-shooting then we'd have to get a deconvolved image to shoot.

I would also note that these tests were kind of simple in nature, but probably not so much as to be horrifically misleading. And it would not be too hard to change our approach to store different kinds of images later on, should our work with the real images and PSFs demonstrate that this is necessary.

Thoughts, @barnabytprowe (or others)? I will start to code up RealGalaxy base class tomorrow.

barnabytprowe commented 12 years ago

On the basis of those results, this seems definitely most sensible for now.

TallJimbo commented 12 years ago

When you were running the timing tests, were you loading images that were saved as individual postage stamps in FITS? If I/O is our bottleneck, I think we could probably make things go a lot faster by putting many images in a single file, so we can read larger chunks from disk at once into memory.

I don't think that changes the tentative conclusions of your tests, but it's worth keeping in mind.
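Something like the following, e.g. one multi-extension FITS file per batch ('all_stamps.fits' is hypothetical):

import pyfits

# read every postage stamp from a single multi-extension FITS file in one
# pass, instead of opening one small file per galaxy
hdu_list = pyfits.open('all_stamps.fits')
stamps = [hdu.data.copy() for hdu in hdu_list if hdu.data is not None]
hdu_list.close()
print("read %d stamps in one file access" % len(stamps))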

rmandelb commented 12 years ago

Jim: yes, that's how I was doing it, good point. Still, I think this issue is kind of like the Image class -- we need a strawman implemented to play with until we have a better idea of what we need. I don't think any of the possible changes we're discussing will be too hard / annoying to do later, so I will proceed as planned.

rmandelb commented 12 years ago

@barnabytprowe : following up on our discussion from before - I just realized something really basic... I had been thinking of the RealGalaxy being parallel to OpticalPSF because it has an SBInterpolatedImage of the galaxy (plus other stuff) - but, I also need it to have an SBInterpolatedImage of the PSF. Is it just me or does that mean it can't be worked in as one of our base classes, all of which have a single SBProfile in them and which have member functions that act on that single SBProfile??

barnabytprowe commented 12 years ago

No, I believe you can simply add another! You'll not be able to call it the RealGalaxy.SBProfile attribute, as that's taken, and all the default methods will only work on whatever's stored in RealGalaxy.SBProfile, but you can add a new attribute and custom methods that access/act on it as you wish (at least this is what I think is the case!)


rmandelb commented 12 years ago

Ah, I think I see what you mean. Will try and see how it goes...


rmandelb commented 12 years ago

Okay, I made an attempt at this. Would you be willing to check out branch #104 and see what you think of the following?

1) galsim/real.py will have some new classes and functions for manipulations of real galaxies. Currently it just has the RealGalaxyCatalog class that we'll use to read in information about a catalog (eventually it will have some SHERA-like functions that act on RealGalaxy objects). For now, I have the galaxy information living in the catalog and getting read into the RealGalaxyCatalog. Then, for a given RealGalaxy, if we want its info (apparent magnitude, best-fit Sersic parameters, etc.), we know which galaxy index it is in the associated RealGalaxyCatalog, and can pull out whatever we want to know.

2) in galsim/base.py I added the new RealGalaxy base class, which is initialized from a RealGalaxyCatalog and some way of choosing which catalog entry to use (I plan to code up several, but for now, I only implemented straightforward selection by index).

Before moving on, I would appreciate some feedback to make sure I'm not barking up the wrong tree with this design. I haven't made catalogs, therefore I haven't tested the code and I'm sure it has bugs, etc.

gbernstein commented 12 years ago

On May 9, 2012, at 5:31 AM, Rachel Mandelbaum wrote:

> Right now it looks like the pad factor when reading in an Image and making an SBInterpolatedImage is simply the smallest number that is > 4 x image size and that can be written as 2^n or 3 x 2^n, which (if I've understood your document with Daniel Gruen) is the right amount of padding for the default quintic interpolant. However, we were finding in our recent tests that we need Lanczos with n>=5 to get the shearing to sufficient accuracy. In table 1 of your write-up, the numbers suggest that to get similar accuracy as for quintic interp with 4x oversampling when we use Lanczos n=5, we would need 6x oversampling.

The need for Lanczos5 was as the x-space interpolant, not the k-space interpolant. As far as I can tell, the quintic is a better choice in all respects than Lanczos for our k-space interpolant, so I would stick with that, and 4x padding.

And the Lanczos n>5 in x space was needed only for the purpose of getting close agreement with SHERA, which effectively defines the image to be sinc-interpolated between the samples. A lower-order xInterpolant would make an acceptable, internally consistent standard for Great3. Although in practice it won't make a lot of speed difference, since it should be rare to actually interpolate in x space as we'll always be convolving with some PSF for your application.

> So: 1) I think that the oversampling factor should ideally be a function of the chosen interpolant and the desired precision (i.e., reading off of your table 1). Do you agree? If so, I will open an issue to implement that after this milestone is done.

I would say you are expanding your scope significantly if you try to have the code automatically figure out these things instead of just using a default padding and interpolant that we believe will get an adequate answer. The "precision" that you would like to specify would be the accuracy of sheared moments, and this will depend not only on the padding / kInterpolant but also on the light profile of the object itself.

> 2) For now I propose to simply stick with the default oversampling and use Lanczos-5, assuming that we just want some images to play with and don't care if there is a slightly larger error in the shears that we apply. Sound reasonable?

See above - you can set the default xInterpolant to be Lanczos5. But leave the default kInterpolant at Quintic.

> 3) our findings had suggested that we might be able to stick with the default quintic interpolation for PSF interpolation, needing higher accuracy only for shearing. How hard do you think it would be to have a mixed interpolator scheme that allows us to use different interpolators for those steps? Perhaps more work than we'd get in saved time due to the use of a quicker interpolation scheme for some of the calculations?

I'm confused about the first statement of the question (not clear about x vs k interpolation). Are you really concerned about making Great3 faster for unsheared images? I would not choose to prioritize solving a speed problem that might exist in a use case (drawing lots of unsheared images) that is not very high on our list of use cases for Great3. Better to spend your time getting the thing working.

rmandelb commented 12 years ago

> The need for Lanczos5 was as the x-space interpolant, not the k-space interpolant.

My concern about needing Lanczos-5 was based not on the comparison with SHERA, but rather on our tests using Gaussians where we know the answer exactly. The numbers that I have from our e-mail exchange were that getting the sheared, PSF-matched Gaussian galaxy moments right requires Lanczos-5 (-0.06% error... I'm assuming we want better than 0.1%, which I thought was not achievable with quintic).

But, I realize now that I may have misunderstood - I thought this was the k-space interpolant that you were changing to get the shearing right, but you're saying it was the x-space one? In that case, we can just initialize the SBInterpolatedImage with a Lanczos-5, but not worry about SBProfile's defaults for k-space.

Sorry for the confusion.
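In code, that conclusion would amount to something like this (a sketch, using galsim.InterpolatedImage and galsim.Lanczos as illustrative stand-ins for whatever the SBInterpolatedImage wrapper ends up taking):

import galsim

gal_im = galsim.fits.read('gal_stamp.fits')  # any postage stamp (stand-in name)
gal = galsim.InterpolatedImage(gal_im, scale=0.03,  # HST-like scale, illustrative
                               x_interpolant=galsim.Lanczos(5))
# k_interpolant is left at its default (quintic)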



barnabytprowe commented 12 years ago

Hi Rachel... A quick comment about the code:

I notice from the line

# read in the galaxy, PSF images
hdu_list = pyfits.open(os.path.join(real_galaxy_catalog.imagedir,
                                    real_galaxy_catalog.gal_filename[use_index]))

that currently the RealGalaxy class will be doing some sort of file access (even if just the open command) every time an object is instantiated... Is this what we want? I'm thinking it must be possible to have the RealGalaxyCatalog object read in a bunch of objects from a single file, or from many files in one go, and then remember what it's read in (and even remove things if it runs out of memory, I guess) so they can be passed directly to the RealGalaxy init method.

I'm just wondering how much of an overhead a file I/O for every instantiation of a RealGalaxy is going to be... Probably not huge, but if we want to run many jobs in parallel it could put a drag on the filesystem. A sketch of what I mean is below.
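Something like this, minimally (the class and method names are made up for illustration, not code on the branch):

import os
import pyfits

class CachingRealGalaxyCatalog(object):
    # load each image file at most once and keep it in memory, so that
    # instantiating many RealGalaxy objects does not re-read files
    def __init__(self, imagedir, gal_filename):
        self.imagedir = imagedir
        self.gal_filename = gal_filename  # one filename per catalog entry
        self._cache = {}                  # filename -> image data

    def get_gal_image(self, index):
        fname = os.path.join(self.imagedir, self.gal_filename[index])
        if fname not in self._cache:
            self._cache[fname] = pyfits.getdata(fname)
        return self._cache[fname]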

rmandelb commented 12 years ago

@rmjarvis, do you want to put calculations with real galaxies into the demo? If so, you could check out #104 now, where there is what seems to be a working version of the read-in of a galaxy catalog and creation of a RealGalaxy object corresponding to an entry in the catalog. I was also going to make a helper function that takes a RealGalaxy, a target PSF and pixel scale, and an optional shear value, and does the deconvolution / shearing / PSF-matching / resampling for you, but I won't get to that for another ~2 hours, so I thought you might want to at least check out the way this stuff basically works.

There's an example catalog (100 gals) in examples/data/. I didn't put the images in the repo because they are 7M total (galaxies + PSFs for all 100), but you can take them from http://www.astro.princeton.edu/~rmandelb/real_galaxy_images.fits http://www.astro.princeton.edu/~rmandelb/real_galaxy_PSF_images.fits

So if you are sitting in examples/data/ and have the catalog and image files there, you could do the following in python:

rgc = galsim.RealGalaxyCatalog('real_galaxy_catalog_example.fits','.')

to get a catalog (first arg = path to catalog, second = directory where the image files live). Then do

rg = galsim.RealGalaxy(rgc, index = 0)

to make a RealGalaxy corresponding to the first object in that catalog. Its SBProfile attribute will be the galaxy SBInterpolatedImage, rg.PSF is the PSF SBInterpolatedImage, and there are various other bits of info in there.

Comments, questions, etc. are welcome.

rmandelb commented 12 years ago

Barney - you are quite right. This is not a design that will work when simulating 10M objects (I guess the max number for RealGalaxy actually is the number of training data objects, not the total number of simulated objects, but we might end up with 100k training galaxies...). I just thought we might play with 100 objects for a few weeks before deciding how we will most likely want to use RealGalaxy in the long term, and then we can decide how to optimize issues related to i/o, etc. This is just a first stab at something simple so we have something to play around with and figure out what we need.

rmandelb commented 12 years ago

Just to add one thing about the above, Mike - I still need to do some more thorough tests... I have only done basic things: e.g., after making rg using the above code, I checked that the results of:

galimg = rg.draw(dx = rg.pixel_scale)
psfimg = rg.PSF.draw(dx = rg.pixel_scale)

when written to file looked like the objects that are in the actual galaxy and PSF image files. Apologies in advance if there are more subtle lingering bugs.

Do you have any suggestions for unit tests for this stuff? I can't think of ones that are based on the real galaxies... though if I write a script that does the simulation of a lower-resolution sheared object, I could write a unit test that does that calculation based on Gaussians, and compare against the known ideal result.

barnabytprowe commented 12 years ago

Rachel: sounds good, I did just want to flag it up though (I knew from our conversations that we'd both been considering this issue).

rmandelb commented 12 years ago

Yeah, in the end I decided that it wouldn't be a huge time investment to do this version, so I won't be horribly sad if we scrap much/all of it and make a more informed choice later.


rmjarvis commented 12 years ago

> do you want to put calculations with real galaxies into the demo?

Yes. I was going to have Script 3 take your RealGalaxy catalog, convolve them with Kolmogorov PSFs and write the outputs back to a bunch of separate files. (To include the 3rd style of multi-object output, although the most trivial one.)

rmandelb commented 12 years ago

Well, no Kolmogorov now, but I'm glad you want to include the real galaxies. Please let me know if there's something you'd like me to change about how I've done this, since I'll have some more time to work on this during the afternoon.


barnabytprowe commented 12 years ago

Mike,

You've got such a lot on already; would you like me to take the lead on this?


rmjarvis commented 12 years ago

OK, if not Kolmogorov, then is there anything else new that I should do in the third script? I guess I could instead use the (real?) PSFs in your catalog?

Actually, that might be a better unit test. The RealGalaxies are the result of original images deconvolved by the appropriate PSF, right? You could test that rg * psf gives something consistent with the original.

rmandelb commented 12 years ago

Hi Mike - The galaxies are the observed images, and the PSFs are the real HST PSFs. So that is what we can use for the deconvolution before shearing and applying whatever real PSF we want.

I think we should just choose a target PSF from one of our existing PSF models. BUT - I really hope that by the end of the afternoon, real.py will include a function to do most of that automatically, given a RealGalaxy and one of our base classes to serve as a PSF plus optional shears and output pixel scale. So, I don't think you have to code all that up, just the framework for making the bunches of postage stamps, the choice of target PSF and pixel scale and shear.


rmandelb commented 12 years ago

I have good news and bad news-

Good news: I successfully defined the RealGalaxyCatalog and RealGalaxy in a functional way. I also made a method simReal that works on RealGalaxy objects and simulates data with a target PSF, shear, and pixel_scale (properly deconvolving etc.). I tested it using a script I wrote, examples/DemoReal.py -- to run this, you'll need the example catalog I put in examples/data/, and you'll have to download the image catalogs from my webpage in a message a few up from this one. This script simulates Moffat PSFs for a ground-based telescope with 0.2" seeing, for good and bad seeing. It selects a random galaxy from the catalog and simulates data with those PSFs, with and without a shear of g1=0.05. The output images are the original HST galaxy and PSF at their native resolution plus, at the target resolution, the 2 target PSFs and the 4 galaxy images (with and without shear, in good and bad seeing). You can see that things are roughly sensible: e.g., you can tell the difference between sheared or not, and you can see that in bad seeing the galaxy looks more like a PSF than in good seeing.

Bad news: I have no more time for (1) extensive testing of this, e.g. making sure that simReal behaves properly in more general cases or catches weird inputs, (2) making unit tests, or (3) writing a multi-object demo script that uses this.

If someone else wants to take this on, then I think that (3) would be pretty simple - you could take script 1 or 2 from MultiObjectDemo to set up grids and PSFs, etc., and then stick in some of the commands from my DemoReal.py into a loop. I would be happy to do (1) and (2) at some later point, if someone is okay with merging this in for demonstration purposes and opening an issue for them later.

Thoughts? Sorry to dump this in such an awkward place. I'll still be available for brief comments and minor bug fixes, but I have some other things demanding time that preclude an extended session of unit-test-writing and really thorough tests of the various new features.

rmjarvis commented 12 years ago

Perhaps since Kolmogorov was also delayed, we should just delay Script 3 one more week. We've been very productive the last couple of weeks, so the milestone clearly served its purpose of spurring activity. But maybe tomorrow we can just finish the decision about #101, and add a couple more features to Script 2 (like shearing the galaxy components). Then if you think you and Joerg can finish up by next Friday, we can have that be milestone 2b, covering the original plan for Script 3.

rmandelb commented 12 years ago

I expect I could finish up those bits of #104 next week. Joerg had said he wants to work on Kolmogorov next week, though I don't know if he meant to finish it then. In any case, I am fine with this plan. Barney?


barnabytprowe commented 12 years ago

Yes, I think that sounds very reasonable, and sorry for the delay in replying! I hope you weren't all on tenterhooks.

We've done a huge amount, and we all have enough externally-imposed deadlines we shouldn't thrash ourselves to meet a self-imposed one at the expense of all else. We couldn't have got much more done anyway (or, at least, I'm not sure I could have!)


barnabytprowe commented 12 years ago

(And I'd be very happy to help with any of that next week.)

Mike: I think finishing up #101/#103 (we may well get some opinions from Munich) and then implementing the shearing and shifting of galaxies tomorrow sounds like plenty. I'll be up early so that half your morning is not wasted if you want to get going and need my input / labour. Let's be in touch!

rmandelb commented 12 years ago

Okay, good. I'm happy to finish that stuff at a more leisurely pace next week.

And, actually, I thought of some conceptual concerns about the current implementation (already) - am thinking of making a pull request just to get a discussion going (not that I'll actually be merging it in before discussion and the work I described above).

rmjarvis commented 12 years ago

I don't know if a pull request makes sense. Probably, you just want to comment directly on the code (e.g. here) bringing up whatever your concerns are.

rmandelb commented 12 years ago

Actually, for the past few days I mostly haven't been able to look at code on github. When I click on that link (or, e.g., in the links you and Barney made to #101 and #103), I get a github page with a spinning github icon like it's thinking about showing me the code, and that just goes on indefinitely. This seems to happen 90% of the time, so I've been checking out branches to peek at the code, instead, and I can't comment on the code when that happens. Does this ever happen to you??

But anyway, it's more of a basic design question than an issue with specifics of the code. I'll just bring it up in a separate comment on #104 (initially I thought of a pull request just because #104 has gotten long and I don't know how many people are still paying attention :).


rmjarvis commented 12 years ago

Yes, it's happened to me too. I'd estimate around 30% of the time this week. Usually if I go back to the root of the code tree and navigate in again, it works. And occasionally clicking a directory link immediately fails (I forget what error it says), but in those cases clicking it again usually works. I guess there are some bugs in the github code. I don't remember either behavior happening before this week, so maybe some update they made is buggy. Which probably (hopefully) means that they'll notice and fix it soon.

rmandelb commented 12 years ago

Okay, here's the basic question I thought of this morning: Our base classes have a bunch of methods that make sense for Gaussian, Sersic, etc. and for objects like Convolve, Add. But some of those methods do NOT make sense for a RealGalaxy, so I'm wondering now how to handle that. For example:

Methods that do make sense as-is: getFlux, setFlux, *, *=, copy (could be used a lot, e.g., if we want to make a load of different simulations from one real galaxy), applyShift/createShifted, draw.

Given the concerns about the functions above, does this suggest that RealGalaxy needs special treatment as a base class? Another option I thought of was to not make it a GSObject, just a class to which we would explicitly add the methods that do make sense. Or is there a way to remove these methods from it but keep it as a base class?

Thoughts?

rmandelb commented 12 years ago

And on another note, since I'll be doing unit tests etc. next week, it would also be a good time for me to do any tweaking that you think might be necessary for the work on this branch to be useful in demo script 3. Barney and Mike, since you've done the demo scripting so far, would you be able to take a look at what's on here and let me know of any changes that you think are important in the short term, before this gets merged into master?

rmjarvis commented 12 years ago

It sounds to me that the thing that should be a GSObject is the thing that we would convolve with a PSF to get a final galaxy that we would draw on an image. So the syntax we would do to get this should be more or less:

gal = galsim.RealGalaxy(real_gal_catalog, index=index)
psf = galsim.Moffat(beta=3.5, fwhm=3)
final = galsim.Convolve(gal, psf)
im = final.draw()

This process should automatically do the deconvolution by the HST PSF along the way.

In fact, if you do:


gal = galsim.RealGalaxy(real_gal_catalog, index=index)
im = gal.draw()

this should give you an image of the deconvolved galaxy.

rmjarvis commented 12 years ago

(Sorry, I didn't finish my thought above.)

So the above RealGalaxy might not be the same thing as the RealGalaxy you currently have in the repo. You might want that to be called OriginalRealGalaxy or something like that, which would be a member of the actual RealGalaxy. But it sounds like OriginalRealGalaxy probably shouldn't be a GSObject.

rmandelb commented 12 years ago

So essentially, what you're suggesting is that the steps in simReal that include deconvolution and optional rotation should be moved into the constructor for RealGalaxy?

I'm not sure I'd need to make an OriginalRealGalaxy (not base class) and a RealGalaxy (base class)... If I do the deconvolution in the RealGalaxy constructor (which probably does make sense now that you mention it, since essentially everything we'll compute requires the deconvolution to have been done) then the member SBProfile could be the deconvolved one, and the original image and its PSF could be included. So, the base class methods that act on the SBProfile will properly work on the deconvolved one, but if someone has some special need to get the original image and PSF, they could use real_galaxy.original_image and real_galaxy.original_PSF or something of that sort.

I guess the disadvantage of doing it this way is that if you make a ton of copies of a RealGalaxy because you want to apply lots of different rotations/shifts/shears/whatever, you're also getting tons of redundant copies of the original galaxy. So, perhaps it does make sense to have an OriginalRealGalaxy after all -- if someone wants one, they could create one from the RealGalaxy, i.e.

my_original_galaxy = galsim.OriginalRealGalaxy(real_galaxy)

... which would take advantage of the fact that real_galaxy knows what catalog it came from and what is the right index in that catalog, but has the disadvantage of requiring another file read-in. Perhaps that's okay because we won't often need to look at the originals, except for testing purposes, so it doesn't matter that it's adding some expensive file i/o? (And in any case, file i/o issues with the current scheme are an issue to be explored in future, I think, once we see how we are using the RealGalaxy objects in practice. For now I'm inclined to go with this latter scheme.)

rmjarvis commented 12 years ago

You don't necessarily have to create the actual deconvolved image in the constructor. I probably wouldn't in fact.

I would store the original_gal and the psf, and have the SBProfile member be an SBConvolve of (original_gal and SBDeconvolve(psf)). Then the actual image would never have to be made. SBProfile would handle all the determination of how big that image should be and everything when you eventually use it as part of a draw routine.
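Roughly this, using the python-layer names Convolve/Deconvolve as stand-ins for SBConvolve/SBDeconvolve:

import galsim

def make_deconvolved_profile(gal_image, psf_image, pixel_scale):
    # wrap the originals as interpolated profiles; nothing is rendered here
    original_gal = galsim.InterpolatedImage(gal_image, scale=pixel_scale)
    original_psf = galsim.InterpolatedImage(psf_image, scale=pixel_scale)
    # the deconvolution happens lazily, only when the result is drawn
    return galsim.Convolve(original_gal, galsim.Deconvolve(original_psf))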

rmandelb commented 12 years ago

I was writing imprecisely; I didn't actually mean to make a deconvolved "image", just the appropriate SBWhatever object, i.e. following the steps in simReal. I don't make a deconvolved image there, either - I make a Convolve, which in this case would be an SBConvolve, but otherwise it's conceptually the same process. Just moving around some lines of code with minor changes...


barnabytprowe commented 12 years ago

> getting tons of redundant copies of the original galaxy.

I think I'm happy to live with lots of references to the original galaxy being potentially open at once, which I think is the picture I'm getting of how this would work from you guys' discussion.

As for having an OriginalRealGalaxy() class, I'm undecided. I think I like the idea that if someone has some special need to get the original image and PSF, they could use real_galaxy.original_image and real_galaxy.original_PSF to access them. As the labour-intensive part of a RealGalaxy generation (i.e. the deconvolution) is currently planned to only happen at the draw stage, you could access these attributes without having to do the full RealGalaxy generation if they were all you wanted.

As for having file I/O to get them... I think you could eventually simply have each attribute be a reference to its required image in a storage class such as the RealGalaxyCatalog. The image would be loaded from file the first time it's needed, and once only, into the RealGalaxyCatalog or another similar storage object. But for now, I/O to the file would be fine.

rmandelb commented 12 years ago

Well that wasn't quite what I meant by "getting tons of redundant copies of the original galaxy." What I meant was, let's say we decide to make a ring test, with lots of rotated versions of the galaxy all with the same shear. So, if we want N versions, the code would be roughly like this:

import numpy as np
import galsim

N = 8                 # number of ring positions (illustrative)
g1, g2 = 0.05, 0.0    # the shear applied to every rotated copy
real_galaxy = galsim.RealGalaxy(catalog, index = 0)
for i in range(N):
    rotation_angle = galsim.Angle(2*i*np.pi/N, galsim.radians)
    rotated_galaxy = real_galaxy.createRotated(rotation_angle)
    rotated_galaxy.applyShear(g1, g2)
    # ...some steps to make images and measure them or write them out...

and what I meant was that in each of the many rotated_galaxy objects, the member SBProfile would be different because it's been rotated, but the original_galaxy and original_PSF would be exactly the same. This seems kind of silly, and could even become a memory issue if we are making a grid of many many real galaxies and have to store information about 3 SBProfiles for each, instead of 1.

Perhaps this isn't worth worrying about now, I don't know. But that was what came into my mind -- there must be a better way of associating all those modified copies with a single instance of the original image, and that's why I concluded the last message the way I did.

rmandelb commented 12 years ago

(better way --> more elegant, I mean; obviously, keeping them in RealGalaxy works, it's just unwieldy IMO)

TallJimbo commented 12 years ago

This would be a good use case for making at least SBInterpolatedImage immutable and using shared_ptr in C++; then all of the above would work, but under the hood there wouldn't be any copies. We might also be able to get the same effect without making the class immutable by just making its innards shared. In any case, I think a C++-level solution in that class is the way to address this problem.