Allow more than one input catalog in config

rmjarvis commented 11 years ago

While writing the first YAML config file for the great3 scripts, I realized that there were some items that needed catalog entries, but where the entry values referred to each output file, not each object within an output file. So I think it would make sense to have two sets of catalogs.

There would be one catalog that would record values for each output file (e.g. noise variance, random seed for the noise, the shear for constant shear branches, PSF properties for constant PSF branches, etc.).

Then there would be another catalog for each output file that would record the values that change for each object (galaxy properties, shift, position, etc.).

My thought for how to implement this is to allow the input: catalog item in config be a list. If it is a list, then it would load all the catalogs given, and the user would have to specify which catalog they meant when specifying an InputCatalog item. I'm thinking with num : 0, num : 1, etc. The default would be num : 0, so if you only have a single input catalog, you don't need to specify it. (Hence, the change wouldn't break any existing config files.)

Comments welcome.

rmjarvis commented 11 years ago

Another thought about this:

For consistency, I think we could allow all the input items to be lists and use the same structure for specifying which item in the list you are referring to with a particular item. I don't have any particular use case in mind for the other input items, but it would actually make things somewhat easier to implement internally if I let them all work the same way.

That's also why I suggested using num to specify which catalog in the list, rather than cat_num or something like that. num can be used for all the things that are specified in the input field. so it cat be a bit more consistent.

barnabytprowe commented 11 years ago

My thought for how to implement this is to allow the input: catalog item in config be a list. If it is a list, then it would load all the catalogs given, and the user would have to specify which catalog they meant when specifying an InputCatalog item. I'm thinking with num : 0, num : 1, etc. The default would be num : 0, so if you only have a single input catalog, you don't need to specify it. (Hence, the change wouldn't break any existing config files.)

Sounds powerful and simple. And excellent that it's an extension that can be built on the existing code without breaking it.

For consistency, I think we could allow all the input items to be lists and use the same structure for specifying which item in the list you are referring to with a particular item. I don't have any particular use case in mind for the other input items, but it would actually make things somewhat easier to implement internally if I let them all work the same way.

Seems fine to me, especially if it makes the implementation simpler, and it can be made so the user need not worry if they don't need to use the feature (which it seems it can). One use case I can imagine - and actually verifying whether this would be possible would be a good check on my understanding of the proposal Mike - might be to select a galaxy model at random from a list of possibilities by having num be a randomly-selected integer. Could this be done, e.g. would it be possible to make:

num = {"type": "Random", "min"=: 0, "max"=nmodels - 1,}

? If so, great. For variety in sims it could be really handy to select models (or other input items) at random from a list of possibilities, and this would be an immediate use of the functionality.

rmjarvis commented 11 years ago

num = {"type": "Random", "min"=: 0, "max"=nmodels - 1,}

Yes. Definitely.

And as usual with random integers, if config knows the range in question (which it would here) you wouldn't need to specify min and max if you want 0..nmax-1. (e.g. index for InputCatalog or RealGalaxy)

Of course, you can already do this if you have your galaxy models in a List type by setting the index : { type : Random }, but this new feature would let you select randomly among different real galaxy catalogs for instance.

rmandelb commented 11 years ago

I think this is a sensible proposal, and very good that it doesn't break existing yaml files!

rmandelb commented 11 years ago

@rmjarvis , any objection to my merging #325 into here even though the PR is still open? Since I'm using this branch for GREAT3 tests, it would be helpful to have n>4.2 on here.

rmjarvis commented 11 years ago

any objection to my merging #325 into here?

No. Of course not. I regularly merge in branches that are expected to be merged to master before the one I'm working on. I hadn't don't #325 here yet, though. So be my guest.

rmandelb commented 11 years ago

Done, thanks.

rmjarvis commented 11 years ago

@barnabytprowe Barney, I'm working on getting the whitening stuff into config, and I have a procedural question for you. I'll also lay out the process I'm implementing, so please let me know if you think this all makes sense. (The question is near the end.)

The basic idea that I'm working with is that a RealGalaxy now stores a noise attribute that starts out with the original noise profile given by the RealGalaxyCatalog (a new feature Rachel has added). Then as we do things to the galaxy, I check if there is such an attribute and do the same thing to it to keep it up to date. Then after the image is drawn, we can applyWhiteningTo the postage stamp to uncorrelate the noise. Finally, when we add the noise we want, we have to subtract off the variance that was left after the whitening process.

It occurred to me that we have to pad with the correlated noise so that the drawn image will have that correlated noise over the whole image. I guess this requires that the padded image is larger than the postage stamp to make sure there is not another edge where the noise drops to zero. I think this is ok, although the details are a little bit complicated to make sure this necessarily happens.

This procedure results in a slight hiccup at the edge of the original image, since the padded correlated noise cannot correlate across the original edge. I think this is just a required artifact of the process, but if you have any ideas about this, please let me know.

Now, the procedural question. I noticed you have applyExpansion in BaseCorrelatedNoise rather than either applyDilation or applyMagnification. I remember that this had something to do with the flux scaling and the fact that that works differently for the noise than for a profile. But what should I apply to the noise when the galaxy is either dilated or magnified?

I think I know the rest. applyShear and applyRotation are just the same. applyShift does nothing. And scaleFlux(factor) requires scaleVariance(factor**2), right?

reikonakajima commented 11 years ago

I ran into some errors on two of my machines. On "imperial", I get a bunch of warnings related to "dereferencing pointers" (I show here only one of them):

g++ -o pysrc/.obj/PhotonArray.os -c -O2 -ftree-vectorize -Wall -Werror -fPIC -Iinclude/galsim -Iinclude -I/software/epd/include -I/vol/software/software/tools/tmv/tmv0.71/x86_64/include -I/software/epd/include/python2.7 -I/software/epd/lib/python2.7/site-packages/numpy/core/include pysrc/PhotonArray.cpp
cc1plus: warnings being treated as errors
In file included from /software/epd/include/boost/preprocessor/iteration/detail/iter/forward1.hpp:62,
                 from /software/epd/include/boost/python/detail/caller.hpp:134,
                 from /software/epd/include/boost/python/object/function_handle.hpp:8,
                 from /software/epd/include/boost/python/converter/arg_to_python.hpp:19,
                 from /software/epd/include/boost/python/call.hpp:15,
                 from /software/epd/include/boost/python/object_core.hpp:14,
                 from /software/epd/include/boost/python/args.hpp:25,
                 from /software/epd/include/boost/python.hpp:11,
                 from pysrc/PhotonArray.cpp:27:
/software/epd/include/boost/python/detail/destroy.hpp: In member function 'PyObject* boost::python::detail::caller_arity<3u>::impl<F, Policies, Sig>::operator()(PyObject*, PyObject*) [with F = void (galsim::PhotonArray::*)(const galsim::PhotonArray&, galsim::UniformDeviate), Policies = boost::python::default_call_policies, Sig = boost::mpl::vector4<void, galsim::PhotonArray&, const galsim::PhotonArray&, galsim::UniformDeviate>]':
/software/epd/include/boost/python/detail/destroy.hpp:33: error: dereferencing pointer 'p.808' does break strict-aliasing rules
/software/epd/include/boost/python/detail/destroy.hpp:90: note: initialized from here
/software/epd/include/boost/python/detail/destroy.hpp:33: error: dereferencing pointer 'p.808' does break strict-aliasing rules
/software/epd/include/boost/python/detail/destroy.hpp:90: note: initialized from here
scons: *** [pysrc/.obj/PhotonArray.os] Error 1

...etc.

On "mpe", I get two errors on the unit tests (not all diagnostic included below, let me know if you would like to see the rest of the error output):

======================================================================
ERROR: Check that using gridded outputs and the various getFoo methods gives consistent results
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/euclid/data02/euclid01/local/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/afs/ipp-garching.mpg.de/home/r/reiko/2code/GalSim/tests/test_lensing.py", line 478, in test_shear_get
    test_g1, test_g2 = my_ps.getShear((x.flatten(), y.flatten()), reduced = False)
  File "/afs/ipp-garching.mpg.de/home/r/reiko/2code/GalSim/galsim/lensing_ps.py", line 586, in getShear
    sbii_g1 = galsim.SBInterpolatedImage(self.im_g1, xInterp=interpolant2d)
RuntimeError: Trying to initialize InterpolatedImage with an empty image.

======================================================================
ERROR: Test the offset parameter to the draw and drawShoot function.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/euclid/data02/euclid01/local/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/afs/ipp-garching.mpg.de/home/r/reiko/2code/GalSim/tests/test_draw.py", line 679, in test_offset
    im.write('junk.fits')
  File "/afs/ipp-garching.mpg.de/home/r/reiko/2code/GalSim/galsim/fits.py", line 363, in write
    _write_file(file_name, hdu_list, clobber, file_compress, pyfits_compress)
  File "/afs/ipp-garching.mpg.de/home/r/reiko/2code/GalSim/galsim/fits.py", line 156, in __call__
    os.remove(file)
OSError: [Errno 2] No such file or directory: 'junk.fits'
-------------------- >> begin captured stdout << ---------------------

barnabytprowe commented 11 years ago

Hi Mike,

yes, I think that is just what I had in mind. here are some responses:

The basic idea that I'm working with is that a RealGalaxy now stores a noise attribute that starts out with the original noise profile given by the RealGalaxyCatalog (a new feature Rachel has added). Then as we do things to the galaxy, I check if there is such an attribute and do the same thing to it to keep it up to date. Then after the image is drawn, we can applyWhiteningTo the postage stamp to uncorrelate the noise. Finally, when we add the noise we want, we have to subtract off the variance that was left after the whitening process.

sounds exactly how I thought this would work

It occurred to me that we have to pad with the correlated noise so that the drawn image will have that correlated noise over the whole image. I guess this requires that the padded image is larger than the postage stamp to make sure there is not another edge where the noise drops to zero. I think this is ok, although the details are a little bit complicated to make sure this necessarily happens.

so #430 this is the issue where I originally planned to address this, as well as putting whitening into config, before I broke my bloody arm. you are actually tackling the work on this branch, so now that these issues are cross referenced i think we can just close #430 when the PR for #449 is complete.

anyway, as given away by the name of #430 , I thought that one easy mod to make sure this happens would be to make it possible to specify a minimum noise padding size when making the RealGalaxy (or InterpolatedImage if it makes more sense to put it there). you could just make this sqrt(2) x the final GREAT3 postage stamp size and we would always be safe. but there are other ways to do it

This procedure results in a slight hiccup at the edge of the original image, since the padded correlated noise cannot correlate across the original edge. I think this is just a required artifact of the process, but if you have any ideas about this, please let me know.

no, i had no way around this... it must be possible (perhaps borrowing from CMB inpainting techniques) but it seemed too complex for a first pass fix for correlated noise, especially since now we are only going to 23.5 rather than ~25 in depth for the main dataset. This takes the heat off the noise whitening a lot, at least for the main set of images (the deep subset are now likely to be the most affected). I thought that since it will only be a small fraction of the pixels affected, and that the effect has a natural fourfold symmetry in the ensemble of objects, we would very likely find the effect to be negligible

Now, the procedural question. I noticed you have applyExpansion in BaseCorrelatedNoise rather than either applyDilation or applyMagnification. I remember that this had something to do with the flux scaling and the fact that that works differently for the noise than for a profile. But what should I apply to the noise when the galaxy is either dilated or magnified?

oooh, this is a good question. yes, I think we wanted to use the word expansion rather than one of the other options because it was free-er of connotations. in this case I wanted to simply to be able to expand or contract the x-y coordinate size without affecting the variance scaling: this seemed like the most useful too. i've thought about this a bit, here is my line of reasoning:

surface brightness is conserved in lensing, so the variance of the noise in the lensed real-galaxies will be conserved in surface brightness normalization (even if the covariances change due to the coordinate transforms)
thus in surface brightness normalization I think what we need to do is use the magnification mu (a function of g1, g2 and kappa) in applyExpansion... because the coordinates do expand by the determinant of the Jacobian matrix in the course of the lensing, but the variance won't.
the difference between this and flux normalization is then just the pixel scale squared, as usual

(although i still struggle slightly to visualize all this in flux units though...)

I think I know the rest. applyShear and applyRotation are just the same. applyShift does nothing. And scaleFlux(factor) requires scaleVariance(factor**2), right?

Yes, I think that's right.

rmjarvis commented 11 years ago

Thanks Barney. I forgot about that issue. I think I'll actually put in a PR for what I have here so far and continue development on the whitening stuff in #430 instead. There's already a lot on this branch, and the whitening stuff isn't needed for running the control branch runs, so it's probably better to make a break at this point.

thus in surface brightness normalization I think what we need to do is use the magnification mu (a function of g1, g2 and kappa) in applyExpansion... because the coordinates do expand by the determinant of the Jacobian matrix in the course of the lensing, but the variance won't.

So, when I applyMagnification to a galaxy, we can just applyExpansion(sqrt(mu)) to the noise?

I guess I need to write up some unit tests for these operations that test whether the final image is appropriately whitened (and I get back the correct variance) after I do various things to it. Hopefully, it will be obvious when I've gotten the procedure wrong...

barnabytprowe commented 11 years ago

So, when I applyMagnification to a galaxy, we can just applyExpansion(sqrt(mu)) to the noise?

yes, I think so (I just checked the code and that is correct as far as I can see).

however, I definitely think it would be good to make a test or two for this as you suggest. to make sure it is obvious, it might be worth making a fake RealGalaxy using an InterpolatedImage that contains pure noise

GalSim-developers / GalSim

Allow more than one input catalog in config #449