Inputs to a multi-object capable GalSim

barnabytprowe commented 12 years ago

If we wish to create multiple object output in GalSim, we need to be able to take input configuration files / catalogues / something to specify either parameters for each object or a rule for selecting parameters.

TallJimbo commented 12 years ago

On configuration files, we could consider using Python's built-in ConfigParser module:

http://docs.python.org/library/configparser.html

I've never used it myself, but it seems to be designed to do a lot of what I think we'd want.

Another option would be to write configuration parameters in Python, but in separate files that are distinct from the library and script files (and would generally be read using Python's eval function rather than imported). We've recently started using this in LSST (albeit with a lot of custom infrastructure I don't think we need here), and I'm very happy with the results - you don't have to write a parser, and you aren't limited in what kinds of types and data structures you can use. You can even define callbacks and plugins directly in the configuration file, which can be a slippery slope, but it's very nice to be able to have a simple, one-line function available as a configuration value. In practice, I'd imagine these would be files full of (possibly nested) Python dictionaries and NumPy arrays, though we could also use some operator overloading to make the syntax look more appropriate for configuration data. I can provide some examples of this if desired.

TallJimbo commented 12 years ago

For catalog I/O, it would be worth trying to see if FITS binary tables (as read by PyFITS into structured NumPy arrays) would serve our needs. I think a binary data format has a lot of advantages over ASCII, and FITS tables are probably the only format that could compete with ASCII tables in terms of being familiar to users.

Most importantly, if we can make this work, we'll have to write very little additional code ourselves for handling catalogs, because the NumPy structured arrays are really very nice to work with.

rmjarvis commented 12 years ago

About the FITS binary table, I definitely think this should be an option, but I would want ASCII as an option as well. The way I wrote my shear pipeline, it can be set to output both a .fits file and a .dat file. That way a human can easily look at the results in the ASCII file, grep things, etc., but the computer can read the .fits file for increased efficiency. When not debugging, you can of course turn off the .dat file generation to save time. I would recommend having a similar capability in GalSim.

I don't think I have a preference about ConfigParser versus the eval technique. Both seem like they would do what we need with a reasonable syntax. One thing I wasn't sure about is whether ConfigParse can have values that are lists. For my (C++) configuration file reader, I set it up so you can write things like:

image_gain_key = GAIN  ARCONG
image_readnoise_key = RDNOISE  RON  ARCONRN

and the multiple items would be taken to be a list. Then the code checks for each listed value (as the FITS header keyword in this case) in order. I couldn't tell if ConfigParser has a similar capability. OTOH, I don't know that we need this capability for GalSim, so it may be a moot point.

rmandelb commented 12 years ago

I would like both ascii and FITS input options. if necessary, for the former, we could imagine writing code to convert some specific ascii input format into FITS binary tables, so that the code to handle the input catalogs will always expect the latter.

barnabytprowe commented 12 years ago

I've assigned myself to this task for the milestone, and it seems to me there are two kind of separate issues in one here.

One is defining how to read in a configuration file, the other is handling FITS / ASCII I/O from catalogues. As it's also relevant for #98, I'll start off by tackling the configuration file question here.

I'm fairly keen to try the eval() technique out - looking online it seems there are some potential issues about safety in network situations (i.e. you can potentially use __import__ from within eval to access os and sys to cause havoc) but why any of our users would wish to do this to their own machines using GalSim is outside the scope of my imagination!

So, first question: @TallJimbo do you happen to have an example handy for the LSST usage of eval() that we might be able to take a look at?

Secondly, I wanted to try and enumerate the things that will be contained in the configuration file, and think about sensible defaults. Here's my first attempt at listing (non-exhaustively of course) what I think we'll need...

PSF type(s) : could be a list for when we have multiple convolved components, e.g. ['Kolmogorov', 'OpticalPSF']. 'None' or 'Mixed' would be an instruction to take this information from the catalogue, object by object, allowing full flexibility for totally mixed catalogues (but is this something we ever want?).
Galaxy model(s) : similar to 1., although I can kind of see that here we might want the list to specify different galaxy types we'd like to sum, rather than convolve. A bit annoying, this difference.
Noise model(s) : again, 'None' or 'Mixed' could be used to interpret the noise model based on the row record for each galaxy in an object-by-object fashion.
Pixel scale(s) dx : sets the baseline physical units for undistorted pixels in a postage stamp (note that where we want to simulate geometric distortions this will currently have to be specified via Distortion object, since SBProfile.draw() methods only accept a single dx parameter rather than a dx, dy, and I think it would be cleaner if they stayed that way.)
Shear type : constant (with values) or variable (with some options, either catalogue given or a field model).
Output format : can only think of BigImage, Cube or Pstamps as options here for the moment.
Parameter distributions (c.f. #98) : these could perhaps be dictionary key-value pairs, with the key being the parameter and the distribution perhaps being some string that either refers to one of a defined set of distributions, or that allows the user to bring in their own user-defined distribution via eval() or some external python module.
Hmmmm, might need to come back to this list, I'm going to need to think some more... Contributions welcomed!

TallJimbo commented 12 years ago

For a big, complicated, full-featured example of the 'eval' approach, check out the "pex_config" package here:

http://dev.lsstcorp.org/cgit/LSST/DMS/pex_config.git/tree/

This does all kinds of validation and type-checking, and tracks the history of each parameter as its set in different ways (defaults, different levels of override files, command-line options, etc.).

You can find a much simpler example here, particularly in the "botconfig" and "config.py" files:

http://dev.lsstcorp.org/cgit/personal/jbosch/bot.git/tree/

This implementation doesn't do any kind of validation or history tracking, and it silently ignores most mistakes in the config file, but it's very concise.

I think we'll want something in-between; probably closer to the simpler one, unless we want to just take the LSST pex_config package as a whole and just modify it slightly to remove LSST dependencies.

barnabytprowe commented 12 years ago

Thank you Jim!

rmjarvis commented 12 years ago

Barney, I've been looking at the example config that you posted in the comments of #103, which I'll reproduce here for reference:

>>> config
noise.distribution = ['CCDNoise']
noise.gain = 1.0
noise.read_noise = 1.0
input_cat.type = 'ASCII'
input_cat.ascii_fields = ['x', 'y', 'nx', 'ny', 'Exponential.re', 'Exponential.e1', 'Exponential.e2', 'DeVaucouleurs.re', 'DeVaucouleurs.e1', 'DeVaucouleurs.e2', 'Moffat.FWHM', 'Moffat.truncationFWHM', 'Moffat.e1', 'Moffat.e2']
PSF.type = ['Moffat', 'Pixel']
PSF.Moffat.flux.distribution = 'Constant'
PSF.Moffat.flux.value = 1.0
PSF.Pixel.xw.distribution = 'Constant'
PSF.Pixel.xw.value = 1.0
PSF.Pixel.yw.distribution = 'Constant'
PSF.Pixel.yw.value = 1.0
user_config_file = None
galaxy.type = ['DeVaucouleurs', 'Exponential']
dx = 1.0
output.type = 'IMAGE'
logging_level = 20
shear.g2.distribution = 'Constant'
shear.g2.value = 0.0
shear.g1.distribution = 'Constant'
shear.g1.value = 0.0

I have some ideas about this. In particular how the hierarchy should work. This isn't completely thought through yet, so I'm just going to write it out and see how it goes. (i.e. I might end up disagreeing with myself by the end of this post and deleting the whole thing!)

I think it would be nice to have each high level thing (say noise or psf or galaxy) to have a "type". Then that type name defines what other attributes the high level thing has. Then at each lower level, something can either be just a value, or it can have a type attribute plus others. So here's an example:

noise.type = 'CCDNoise'  # required sky, gain, read_noise
noise.sky = 1.e3
noise.gain = 1.0
noise.read_noise = 1.0
psf.type = 'Moffat'  # requires flux, beta, and some kind of size
psf.flux = 1.0
psf.beta = 5.0
psf.fwhm.type = 'Gaussian'  # requires mean, sigma
psf.fwhm.mean = 2.3
psf.fwhm.sigma = 0.2
pixel.type = 'Square'  # requires size
pixel.size = 0.27
galaxy.type = 'Sum [ DeVaucouleurs , Exponential ]'  # requires item[0], item[1] with their attributes, plus an optional overall flux
galaxy.item[1].flux = 0.2
galaxy.item[1].half_light_radius = 4
galaxy.item[2].flux = 0.8
galaxy.item[2].half_light_radius = 6
galaxy.flux = 2000
shear = (0.0, 0.0)   # For constant, don't need the top-level type.

Like I said, it's not fully formed, but it's kind of close to what you already have. The advantage to this kind of structure I think is that we could extract out some of the lower level functionality that might apply to any of a variety of things. So e.g. once we know that psf.fwhm has a type, we know we'd have to generate values, but we could delegate the details to a generic GaussianGenerator function (whose name would be automatically created from eval(psf.fwhm.type + 'Generator'), which would be passed the psf.fwhm object for it to pull out the attributes that it needs. In this case .mean and .sigma.

For galaxy.type I showed an alternate way to specify some attributes by having more text after the type name. So I'm imagining that Sum really requires nitems and item (which would be an array of length nitems). So a longer version might be:

galaxy.type = Sum
galaxy.nitems = 2
galaxy.item[1].type = 'Devoucouleurs'
galaxy.item[1].flux = 0.2
galaxy.item[1].half_light_radius = 4
galaxy.item[2].type = 'Exponential'
galaxy.item[2].flux = 0.8
galaxy.item[2].half_light_radius = 6

Likewise a shorter notation for the psf.fwhm might be:

psf.fwhm.type = 'Gaussian 2.3 0.2'

And again, we could have a generic parser that would make the appropriate attributes from the string. Parse(psf.fwhm.type, psf.fwhm) which would strip off the first word and call a specific GaussianParse(end_of_type_string, psf.fwhm) which would know what kind of values the Gaussian type requires.

I'm not sure if all of this is completely coherent. (e.g. when to require the .type versus not.) But what do you think of the general idea?

I think getting all the non-constant implementations will take some time, so not likely to be done by Friday. Writing a generic python program that can take in the configuration file and parse it all correctly is probably a task best put off to the next milestone. For this one, we should probably stick to just defining what the high level items should be, but make each one a constant. So skipping the really interesting parts. :)

Sorry for the uber-long comment. I didn't end up deleting the whole thing, so that's good. But I did go back and edit things a couple times.

barnabytprowe commented 12 years ago

Hi Mike,

Yes, I think we are very much thinking along the same lines. And, foolhardily/ambitiously, this was exactly what I was trying to get implemented in time for Friday! Might have to reel it back. On this branch I've written a config file to approximate much of the settings in your config file (examples/inputs/MultiObjectDemo1_config), which loads in fairly well with

import galsim
config = galsim.config.load('examples/input/MultiObjectDemo1_config')

I've also written a default catalogue file, for general reference and that works with the default config file, and was planning to make a similar one for the MOD1 script (also in examples/inputs).

I was wondering, do you think it might be worth having a quick telephone call about this tomorrow? I think I've slickened the interface a little (anyone else is welcome to join too... @rmandelb , @TallJimbo / whoever's available!)

Unfortunately, however, I've run into something of a brick wall with the AttributeDict class late this evening, which I'm going to make a quick issue to ask for help from our experienced python experts about...

barnabytprowe commented 12 years ago

Brick wall scaled, lesson learned = don't code so late at night. Sorry guys!

Things are getting there, and I think the frontend might now work a lot better... Let me know if you want a chat Mike, this stuff is very much in your area of expertise and I'd welcome a sanity check!

rmjarvis commented 12 years ago

On May 9, 2012, at 4:45 AM, Barnaby Rowe wrote:

Brick wall scaled, lesson learned = don't code so late at night. Sorry guys! Who are you kidding? That lesson isn't learned. :)

Things are getting there, and I think the frontend might now work a lot better... Let me know if you want a chat Mike, this stuff is very much in your area of expertise and I'd welcome a sanity check!

I just looked at the stuff you have in #101. Specifically MultiObjectDemo1_config. Here are some comments:

I don't really like having to put "config." at the start of everything. From the user's perspective, this is a config file. Of course everything is about config. Can we make the parser consider that automatic?
The pixel isn't really part of the PSF. I think we want this to be a separate item at the top level.
I don't think we want the [] to implicitly mean sum (for galaxy). I'd rather have an explicit Sum type which is described by multiple items (as per my previous comment). Different things will need lists to mean different things, so I'd rather have the meaning of the list be explicit. e.g. you had [] for PSF to mean convolution, since you included the pixel. I don't think we want that, but we will want a convolution of atmosphere and optics, so then the type would be Convolution with items just like my idea for Sum.
Another point on using item[0], item[1] for the different components of the galaxy rather than the types of each component as you have is that we might want the two components to be the same. e.g. Both Sersics, but with different properties. So

galaxy.type = "Sum [ Sersic, Sersic ]"
galaxy.item[0].n = "GaussianDeviate 1.5 0.1"
galaxy.item[1].n = 3.5

I'm not sure about the dx,dy attributes, how we want those to work. I was thinking it would be clearer to have the type be Shifted, which would spawn required attributes of base, dx, dy. But maybe we want that to be implicit whenever dx,dy attributes are detected. (So implicitly, dx=dy=0.) We could do the same for g1,g2,theta. Probably not dilation, since probably that is always doable via the fwhm or whatever. However, if we go that way, we need to have a convention for the order of operations: shear -- then rotate -- then shift? They don't commute!
The current dx,dy specifications aren't quite general enough as it stands now. For example, it doesn't include the version that I coded up for MultiObjectDemo.py. There, I had a uniform distribution within a circle of specified radius. That distribution isn't possible from specifying dx and dy separately. So maybe have a .shift attribute that can have different types. Have to think a bit about what grammar makes sense here.
Rather than None indicating that something will come from an input catalog, I'd rather InputCatalog be a valid type for things. Which would then require a col attribute, which should either be a number (for ascii input catalog) or name (for fits input catalog). I think this would be a lot cleaner than re-parsing the names of the items in the input_cat.ascii_fields array. e.g.

galaxy.item[0].g1.type = "InputCatalog"
galaxy.item[0].g1.col = 1
...
input_cat.type = 'ASCII'
input_cat.file_name = 'blah.cat'
# No need for a input_catalog.ascii_fields

One advantage to this is that the input catalog can have any number of columns, most of which can be ignored. We'd only read useful values from the columns specified.

I'd rather use type where you have distribution. I think that would allow things to be more general. e.g. you might want a value to be a sum of a distribution and a value. Or the sum of a value from the InputCatalog and a distribution. etc.
Your output types break the convention in the rest of the file that types are CamelCase. You used all caps for IMAGE, PSTAMPS, CUBE.

Hmm... I'm starting to think that this is a bit more complicated than we want to try to implement by Friday. There are a lot of aspects of this that probably deserve a fair amount of discussion before going too far into the implementation. (This won't be as trivial an implementation as for the wild goose chase I took you on with the createSheared stuff.) Should we consider bumping all of this to the next milestone? In fact, this might be an appropriate entire next milestone. Have an executable that reads in a config file and does everything correctly. Quite a daunting task actually.

rmjarvis commented 12 years ago

For this milestone, maybe we could concentrate on getting the inputs from an input catalog. You've already done a lot of that work, and that is quite a significant extension to our previous capabilities. And probably manageable by Friday. But maybe defer some of these more complicated ideas about distributions.

rmandelb commented 12 years ago

Already said some of this on the phone yesterday, but for the purpose of hitting a wider audience I'll say it here :) -- I think it is fair to remove some of the above from the scope of this milestone. It's possibly even a better choice because if we start with some basic chunk of the input catalog capability, we can play with it and see how we like it for a while before implementing the rest. This could in the end save us time and trouble. The example we discussed on the phone was only implementing FITS or ascii input but not both -- however, another reasonable thing (as Mike suggested) would be to remove specification of inputs to parameter distributions. You've done homework on getting some reasonable distributions into the code so that they can be called by someone who writes a script, so perhaps we should just make sure we like the overall input catalog format before getting it to do something trickier like specify distributions?

Doing a subset of our initial goal, but doing it more carefully than we'd have time to do otherwise seems like the way to go here.

I could probably join a phone call today.

rmjarvis commented 12 years ago

To focus the effort, I wrote up a Script2 in MultiObjectDemo.py of branch #103.

Does this seem like a reasonable thing to try to get working?

It doesn't read the configuration from a file. For a demo script it seems to make more sense to define it inline, but that's negotiable if you'd rather showcase the feature of reading in a config file.

Also, I only use the simple style of having the type be a string with just a single word. So not what I would eventually prefer: config.psf.beta.type = 'InputCatalog 5'. Or even config.psf.beta = 'InputCatalog 5' (which I think only works if we don't need any other attributes for config.psf.beta).

And I didn't try to run the script yet, since I know it won't work yet. So there could be some syntax errors in there.

rmandelb commented 12 years ago

A few comments:

1) yes, this was pretty much what I had in mind. Ultimately, I'd like to make some automated routines that let a user specify a galaxy or PSF type, and automatically go through the steps to make a grid of observed galaxies with different position angles or whatever. But that's longer term - this is good for now.

2) I was hoping to showcase the feature of reading in a config file, but I think that what you have now is good to keep until #101 is a little further along.

3) On a very minor side note, I noticed you did an approximate S/N estimate. For consistency with GREAT08, I suggest we use their S/N definition. I've taken the liberty of pushing a version where I put that in (it's commented out, so you can choose to include it if you wish, but it's totally up to you).

However, I wasn't able to test it because there seems to be an issue with galsim on this branch. I did scons -c and scons -c install, then scons and scons install, and when I try to import galsim, I get

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/Users/rmandelb/great3/GalSim/<ipython-input-1-645d464f8d43> in <module>()
----> 1 import galsim

/Users/rmandelb/great3/GalSim/galsim/__init__.py in <module>()
      1 from ._galsim import *
      2 from . import fits
----> 3 from base import *
      4 from . import noise
      5 from . import atmosphere

/Users/rmandelb/great3/GalSim/galsim/base.py in <module>()
    237 
    238 
--> 239 class Pixel(GSObject):
    240     """GalSim Pixel, which has an SBBox in the SBProfile attribute.
    241     """

/Users/rmandelb/great3/GalSim/galsim/base.py in Pixel()
    240     """GalSim Pixel, which has an SBBox in the SBProfile attribute.
    241     """
--> 242     def __init__(self, xw=1., yw=xw, flux=1.):
    243         GSObject.__init__(self, galsim.SBBox(xw=xw, yw=yw, flux=flux))
    244     # Ditto!

NameError: name 'xw' is not defined

Was worried that this came about because of my switching from branch #65, but I did clean up before compiling, so I don't think that can be the case. So, I wasn't able to check whether what I added in the commented out bits would work.

Now, I'm off to investigate whether this issue happens on master and on #65... I was on the latter earlier and ran programs without issues, so I can't imagine it's there, but I will be thorough and check this.

On May 9, 2012, at 8:33 AM, Mike Jarvis wrote:

To focus the effort, I wrote up a Script2 in MultiObjectDemo.py of branch #103.

Does this seem like a reasonable thing to try to get working?

It doesn't read the configuration from a file. For a demo script it seems to make more sense to define it inline, but that's negotiable if you'd rather showcase the feature of reading in a config file.

Also, I only use the simple style of having the type be a string with just a single word. So not what I would eventually prefer: config.psf.beta.type = 'InputCatalog 5'. Or even config.psf.beta = 'InputCatalog 5' (which I think only works if we don't need any other attributes for config.psf.beta).

And I didn't try to run the script yet, since I know it won't work yet. So there could be some syntax errors in there.

Reply to this email directly or view it on GitHub: https://github.com/GalSim-developers/GalSim/issues/101#issuecomment-5598792

Rachel Mandelbaum http://www.astro.princeton.edu/~rmandelb rmandelb@astro.princeton.edu

rmandelb commented 12 years ago

Um, github misinterpreted something in the previous message and so you have to click on the dots to read the whole thing. Sorry about that.

rmandelb commented 12 years ago

And I finished checking: the issue I found with galsim on branch #103 (which we're confusingly discussing in #101!) is not there on master or #65.

TallJimbo commented 12 years ago

A few belated technical comments on the implementation of the config stuff:

The AttributeDict class is very flexible because it adds new new nodes when they're first referenced, but unless you assign to them, the new node will be another AttributeDict. That means something like this galaxy.item[1] = bla won't work unless someone has previously done galaxy.item = [] to ensure that galaxy.item is a list.

This also means users can add whatever config parameters they want, or possibly use the wrong data type for them (and the system will silently ignore many typos in the config file). And Barney's efforts to add tab completion will only work if we've already setup the hierarchy of possible settings in advance.

It looks like that's mostly happening already (maybe accidentally by virtue of having a complete default config), but I think we might want to enshrine the principle of specifying all the valid options in advance in the design, and maybe even have a different syntax to explicitly add attributes so we can prevent users from setting unknown config parameters implicitly.

We may want to save that for the future; having AttributeDict implicitly add new nodes on first use is very nice for prototyping, but I think we may want to move away from it when our users start to outnumber our core developers.

rmjarvis commented 12 years ago

Rachel, I haven't tried to run the new Demo script, so it doesn't surprise me that there are errors. In fact I know that some of the functions I used don't work or aren't even defined yet.

rmjarvis commented 12 years ago

Jim, fair enough. At this point, I'm just mocking up stuff from the perspective of what would be convenient to do as a user, so I haven't been worried too much yet about feasibility of the implementation.

So, does AttributeDict allow us to specify extra actions to take when we assign a value? e.g. When we assign galaxy.nitems = 2, can we make it smart enough to automatically create galaxy.item = [] at that point?

If not, I guess we could have galaxy.item1 and galaxy.item2, etc. (Sorry, I keep waffling between 0-based and 1-based indexing for these.) Then our later parser would know from the value of nitems what the attribute names will need to be.

Any other ideas? Scrap the whole item thing and go some other way?

rmandelb commented 12 years ago

Mike, the errors have nothing to do with the demo script. If I start python and do import galsim using the version I compiled sitting on this branch, it complains.

TallJimbo commented 12 years ago

I think it would be possible to add any of that functionality to AttributeDict (or at least there's some other way to make that syntax work by switching to something a little more complex than AttributeDict), but none of it is there yet.

The "galaxy.item1", "galaxy.item2" approach would already work, and it would be very easy to add a syntax like this:

galaxy.items = galsim.config.List(2)
galaxy.items[1].n = 1.0
galaxy.items[2].n = 4.0

I'd be happy to add something like that right now to let you play with a slightly improved syntax while thinking about the schema, and we can think about ways to make it ever more natural later.

rmjarvis commented 12 years ago

galaxy.items = galsim.config.List(2)

I don't think that's any clearer than galaxy.items = [], so I guess let's stick with the shorter version for now.

Also, item or items? I think galaxy.item[0] = ... reads better, but if people prefer the plural, we can do that instead.

TallJimbo commented 12 years ago

I only prefer plural because it's more natural when you want to refer to the whole list; singular is slightly nicer when setting a single element, but I think you generally want to use the same name in both places.

barnabytprowe commented 12 years ago

Hi all,

Many thanks for the input - I've felt like a few big decisions were being made on this branch, so it's definitely good to get your opinions. I see this work as being a straw man in many ways - I'm sure that whatever I/we come up with will evolve - but I've tried to be ambitious in the sorts of capabilities we might have so as to provide a basis for the discussion that will dictate this evolution.

For lack of a better idea, I'll reply to the various comments one by one (trying to be as succinct as possible!):

I don't really like having to put "config." at the start of everything. From the user's perspective, this is a config file. Of course everything is about config. Can we make the parser consider that automatic?

Yes, I'm sure we can although I think I 'd still like to have the code bundle all these things into a single config object even if the user doesn't have to... ( @TallJimbo this is possible I assume?). I would find a single config object convenient to use as inputs to the various pieces of the code we will have for image generation, output I/O etc., and I don't think it's any significant overhead to pass around as it's just a bundle of references anyway.

There is one thing on this point: in examples/inputs/MultiObjectDemo1_config I performed some calculations in the config file body (using the fact that it's Python as a handy notepad, essentially) and then saved the salient outputs to config.* attributes. Maybe I'm being too liberal as regards I/O, but I did like the ability to do that, and it would be nice to come up with some way of still being able to do this cleanly if we drop the config. object at the start.

The pixel isn't really part of the PSF. I think we want this to be a separate item at the top level.

Sometimes it might be handy not to pixelate - e.g. if you simply wanted to make samples of images at hi-res to later be pixelated, or if you wanted to start throwing in some non-standard pixel response functions. I'd want to be able to do this, and simply putting in the pixel as another PSF seemed the simplest consistent way. I guess, I advocate the view that the pixel is really part of the PSF - once you think about things that way, useful results from sampling theory open up to you. But that's just my warped brain, and I understand it's not the commonest viewpoint.

I don't think we want the [] to implicitly mean sum (for galaxy). I'd rather have an explicit Sum type which is described by multiple items (as per my previous comment). Different things will need lists to mean different things, so I'd rather have the meaning of the list be explicit. e.g. you had [] for PSF to mean convolution, since you included the pixel. I don't think we want that, but we will want a convolution of atmosphere and optics, so then the type would be Convolution with items just like my idea for Sum.

OK, that would be better yes.

Another point on using item[0], item[1] for the different components of the galaxy rather than the types of each component as you have is that we might want the two components to be the same. e.g. Both Sersics, but with different properties. So galaxy.type = "Sum [ Sersic, Sersic ]" galaxy.item[0].n = "GaussianDeviate 1.5 0.1" galaxy.item[1].n = 3.5

I guess I just don't prefer hiding the names in a list, BUT your example is killer. Although galaxy.Sersic.n = blah seems way more obviously readable to me than galaxy.list[1].n = blah, and while with the attribute dict and some occaisional eval() or exec in the code I think either are viable for parsing, your point alone necessitates a rethink.

The current dx,dy specifications aren't quite general enough as it stands now. For example, it doesn't include the version that I coded up for MultiObjectDemo.py. There, I had a uniform distribution within a circle of specified radius. That distribution isn't possible from specifying dx and dy separately. So maybe have a .shift attribute that can have different types. Have to think a bit about what grammar makes sense here.

Yes, I was aware of this when I kludged in a Gaussian instead for no better reason than that at least was possible! A shift attribute would be good - something that can accept polar or cartesian coordinates. Combining that with a super wrapper around the UniformDeviate to allow rescaling to arbitrary ranges would be enough. But this is all already getting quite off piste, isn't it! Sigh!

Rather than None indicating that something will come from an input catalog, I'd rather InputCatalog be a valid type for things. Which would then require a col attribute, which should either be a number (for ascii input catalog) or name (for fits input catalog). I think this would be a lot cleaner than re-parsing the names of the items in the input_cat.ascii_fields array.

Yep, that would be better.

I'd rather use type where you have distribution. I think that would allow things to be more general. e.g. you might want a value to be a sum of a distribution and a value. Or the sum of a value from the InputCatalog and a distribution. etc. -

The generality argument is good!

Your output types break the convention in the rest of the file that types are CamelCase. You used all caps for IMAGE, PSTAMPS, CUBE.

Good point, will modify.

Thanks for those Mike, very useful. I'm working on this pretty much 100% today. Will address further comments below from others...

barnabytprowe commented 12 years ago

Already said some of this on the phone yesterday, but for the purpose of hitting a wider audience I'll say it here :) -- I think it is fair to remove some of the above from the scope of this milestone. It's possibly even a better choice because if we start with some basic chunk of the input catalog capability, we can play with it and see how we like it for a while before implementing the rest. This could in the end save us time and trouble. The example we discussed on the phone was only implementing FITS or ascii input but not both -- however, another reasonable thing (as Mike suggested) would be to remove specification of inputs to parameter distributions. You've done homework on getting some reasonable distributions into the code so that they can be called by someone who writes a script, so perhaps we should just make sure we like the overall input catalog format before getting it to do something trickier like specify distributions?

Doing a subset of our initial goal, but doing it more carefully than we'd have time to do otherwise seems like the way to go here.

I would go along with that. I do feel it's so close though, and would be really neat. Tell you what: I'll totally focus on getting the catalog handling to everyone's liking today, and base it all around Mike's Demo Script 2.

If we see miraculous progress, and consensus, we can always try a fancy config file version (with distrubutions and bells and whistles) for Demo Script 3.

rmjarvis commented 12 years ago

On May 9, 2012, at 12:59 PM, Barnaby Rowe wrote:

Yes, I'm sure we can although I think I 'd still like to have the code bundle all these things into a single config object even if the user doesn't have to...

Definitely agree. I only meant that when writing a config file, the user doesn't have to write config over and over. Within the code that uses it, these should all be in a single config object.

There is one thing on this point: in examples/inputs/MultiObjectDemo1_config I performed some calculations in the config file body (using the fact that it's Python as a handy notepad, essentially) and then saved the salient outputs to config.* attributes. Maybe I'm being too liberal as regards I/O, but I did like the ability to do that, and it would be nice to come up with some way of still being able to do this cleanly if we drop the config. object at the start.

I thought of that too. It would work as stands, and those attributes of the resulting config would be saved but ignored. To be explicit about that the user could preface each with junk. or temp.. But I don't think that's required.

The pixel isn't really part of the PSF. I think we want this to be a separate item at the top level.

Sometimes it might be handy not to pixelate - e.g. if you simply wanted to make samples of images at hi-res to later be pixelated, or if you wanted to start throwing in some non-standard pixel response functions.

Indeed. And that could be one of the "types" allowed for a pixel. Maybe "DirectSample" or "DeltaFunction". But it would still require a pixel.scale so we know what grid to put it on. So I don't think you can get away with not having a pixel. You need to define the pixel scale of the image somewhere, and I thought it best to put that at the top level.

barnabytprowe commented 12 years ago

The AttributeDict class is very flexible because it adds new new nodes when they're first referenced, but unless you assign to them, the new node will be another AttributeDict. That means something like this galaxy.item[1] = bla won't work unless someone has previously done galaxy.item = [] to ensure that galaxy.item is a list.

This also means users can add whatever config parameters they want, or possibly use the wrong data type for them (and the system will silently ignore many typos in the config file). And Barney's efforts to add tab completion will only work if we've already setup the hierarchy of possible settings in advance.

It looks like that's mostly happening already (maybe accidentally by virtue of having a complete default config), but I think we might want to enshrine the principle of specifying all the valid options in advance in the design, and maybe even have a different syntax to explicitly add attributes so we can prevent users from setting unknown config parameters implicitly.

I think this is a sensible point. The AttributeDict is a wonderfully powerful little widget, but as you say it is totally open to abuse and typo-vandalism! We should certainly consider a more locked-down version we we have a better idea of the groupings of input info we want to make.

I also think that a new, us-defined method for adding attributes would be helpful too. One thing I noticed is, unsuprisingly, that

config.cheese.camembert = "TooRunny"

sets up camember as an attribute of cheese which is an attribute of config, but

config.__setattr__("cheese.camembert", "TooRunny")

instead makes config have an attribute "cheese.camembert". To get the inline functionality of the top example using function calls as in the second example (which I'd like to do to cut down on the use of exec where possible) would also be nice to add to this dedicated method for the AttributeDict()... Currently the only way I could think of to do it was nested __setattr__ commands, which is just rather ugly on the eye and not terribly easily generalized when you don't necessarily know how many nested levels of attributedicts you want to put in place.

TallJimbo commented 12 years ago

I don't really like having to put "config." at the start of everything. From the user's perspective, this is a config file. Of course everything is about config. Can we make the parser consider that automatic?

Yes, I'm sure we can although I think I 'd still like to have the code bundle all these things into a single config object even if the user doesn't have to... ( @TallJimbo this is possible I assume?). I would find a single config object convenient to use as inputs to the various pieces of the code we will have for image generation, output I/O etc., and I don't think it's any significant overhead to pass around as it's just a bundle of references anyway.

It's possible with the caveat that we'd have to explicitly pass all the top-level config names as AttributeDicts to the eval function so they already exist in the context the config file is evaluated. In other words, if we want a statement like psf.beta = 3.2 to work, we need to pass (at least) {'psf': AttributeDict()} as an argument to eval.

That's no problem if we ultimately plan to enumerate all the possible config values in advance, but it's a bit of a pain when prototyping it.

rmjarvis commented 12 years ago

Barney said (in #140):

If there's strong opposition to default initialization so be it, but as I said it would make the implementation of (e.g., in #101) the super-early-prototype build_galaxy_image() in galsim/frontend.py dependent on us having some sort of global dictionary of object types and their expected parameters, and require us to keep this updated with any new param names in the base classes themselves as they change. Is there an easy/elegant way of doing this in Python, or would it be case of keeping a central dictionary up-to-date and writing unit tests to ensure that we don't forget?

The possibly elegant way I had in mind was for each type of builder to have its own function. So let's take my above example galaxy specification, reproduced here:

galaxy.type = Sum
galaxy.nitems = 2
galaxy.item[1].type = 'DeVaucouleurs'
galaxy.item[1].flux = 0.2
galaxy.item[1].half_light_radius = 4
galaxy.item[2].type = 'Exponential'
galaxy.item[2].flux = 0.8
galaxy.item[2].half_light_radius = 6

Then the line to build the galaxy would be:

gal = galsim.build_gal_image(config.galaxy, input_cat, logger)

Within build_gal_image (I might rename this simply ProfileBuilder in parallel with what else I'm about to propose, but you can keep this name and change the later names if you prefer), it would read the type attribute from the galaxy object and find that it is 'Sum'. So it would in turn call SumBuilder by doing something like eval(config.type + 'Builder') (where config is now what was originally config.galaxy) to get the right function name.

Then SumBuilder would know that it is looking for nitems and item. It would then make nitems profiles and add them together and return the result. Each individual profile would be built by

profile[i] = ProfileBuilder(config.item[i])

Now back to the original ProfileBuilder. It works the same way now on what was originally config.galaxy.item[0] and config.galaxy.item[1], and finds that the type is DeVaucouleurs and Exponential respectively. So eval(config.type + 'Builder') now calls the functions DeVaucouleursBuilder and ExponentialBuilder, each of which looks for the attributes that it needs.

I see the same kind of functions working for things that return values (rather than profiles). If something that should be a value has a type attribute, then we set the value with eval(config.type + 'Generator') which might be GaussianDeviateGenerator, PolynomialGenerator, InputCatalogGenerator, etc. (The last one reads a bit funny, but I think it's ok.) All of these return a value for the calling function to use where it needed a value.

rmjarvis commented 12 years ago

I realized that nouns may not be the best choice for these function names, so maybe BuildProfile, BuildSum, BuildExponential, etc. Likewise GenerateFromGaussianDeviate, GenerateFromInputCatalog, etc. The last one especially reads better than my above function name.

barnabytprowe commented 12 years ago

Hi Mike, thanks as always for your input on this, it's vital to us getting a good working solution. I had to take a few hours off to tackle other things but it now has my undivided attention until my eyes close. First, the development for this and #103 is clearly happening in parallel, so I will keep the two branches merged at my end from now on (if that's OK - I will not make major edits without discussion).

I think that having a BuildX function for every GSObject would work, but I wonder whether that's actually not more labour and more potential places to err (i.e. just from being more code) than simply setting the args of the GSObject using some (admittedly maintained) prior knowledge as to what they should be.

(On the topic, I found this: http://www.saltycrane.com/blog/2008/01/how-to-use-args-and-kwargs-in-python/)

Here's a counter proposal, and I'll use the Exponential and Sersic a relevant examples.

In galsim/base.py we have a static dictionary of some kind containing our object names and their required params, e.g.:

_type_param_dict = {"Exponential": ("flux", "r0"),
                                  "Sersic": ("n", "flux", "re"), ... }

and so on.

This seems to me to concentrate all the important info in one place, and can be easily tested against to ensure correct regression. We also have only one generic ProfileBuilder(config.items[i]). (Aside: I'm broadly with Jim on pluralizing items, since our list comprehensions are then for item in config.items rather than e.g. for singleitem in config.item. Also, I'd offer a vague preference for ObjectBuilder, since our GSObjects have Profiles rather than are them...).

Anyway, that ProfileBuilder(config.items[i]) would do something like:

def ProfileBuilder(item):

    init_params = []
    for param_name in galsim._type_param_dict[item.type]:
        param_value = item.__getattr__(param_name)
        init_params.append(param_value)                       # this extra line just for readability!

    init_func = eval("galsim."+item.type)
    gsobject = init_func(*init_params)

I may have got the syntax wrong in places, but I think the idea might work?? Of course, it's a subjective choice at some level between this and a family of BuildX functions, but I also think that it might just generally be handy having something like a _type_param_dict object around...

...Another thing in it's favour being that it could be used when parsing user inputs, and let the user know immediately if they have failed to set something that is required, giving a direct message about what is wrong. This could also be handled by the BuildX functions, but by then you're already deep in the code: it would be nice to have this handled at the top level.

In the meantime, Script 2 looks fantastic, and I'll now set about making it work!

rmjarvis commented 12 years ago

How would your method work when type = Sum? The constructor for a GSObject Sum already needs the components to have been built. But I don't see an easy way to have this happen with your method.

You're right though that the BuildX functions for the main kinds of profiles would all be pretty similar, so it might be worth combining them into a single function like what you describe. I guess I just like the elegance that each of these functions would be very short and localized, not relying on information that is stored somewhere else in the code. But that's just a style preference. If your dict method will work and you prefer it, then you should go with it.

barnabytprowe commented 12 years ago

Hmmm, let's see...

def ProfileBuilder(item):

    if item.type in galsim._type_param_dict:

        init_params = []
        for param_name in galsim._type_param_dict[item.type]:
            param_value = item.__getattr__(param_name)
            init_params.append(param_value)                       # this extra line just for readability!
        init_func = eval("galsim."+item.type)
        gsobject = init_func(*init_params)

    elif item.type == "Sum":

        gsobjects = []
        for i in item.nitems:
            subitem = item.item[i]
            init_params = []
            for param_name in galsim._type_param_dict[subitem.type]:
                param_value = subitem.__getattr__(param_name)
                init_params.append(param_value)                       # this extra line just for readability!
            init_func = eval("galsim."+subitem.type)
            gsobjects.append(init_func(*init_params))
        gsobject = galsim.Add(gsobjects)

would do it (I think). I'd need to test! We'd also need something very similar for other compound objects, but currently there's only Convolve and I can't think of other operations we'd want for users.

Can I give it a shot? I'd like to see if I can make it work... And I'm shamefully won over by my own argument that a _param_dict would be handy for giving the user error messages about missing parameters :) If not, or if it fails dismally, I promise we'll know in a few hours and then I'm very happy to go for the BuildX plan!

barnabytprowe commented 12 years ago

ooops corrected a typo in the above code...

rmjarvis commented 12 years ago

How about this. Let's both do a quick prototype of our ideas. You on '#101' and me on '#103' and then we can compare what we come up with. Then we're guaranteed to follow Jim's maxim of throwing away our first design. We'll just define the "first" one to be the one we don't end up using. :)

barnabytprowe commented 12 years ago

Haha, except that I guarantee you will be first!

Regarding the demo script... The storage of column numbers in e.g. config.psf.beta.col = 5 is kind of backwards compared to how I was doing the catalog I/O. Would you prefer indeed for the correct column to be specified by giving each name a number in that way, rather that is than it coming from the position of the relevant name in a list? It's easier to go in one way than in another, and I can see the appeal of your approach actually... OK ignore me, I've alread convinced myself: I'll rewrite the ASCII input funcs first (and CamelCase them - I always forget this, as previously I've mostly worked in case-insensitive languages) so you can pull functions that better match your current scheme. Will let you know...

On 9 May 2012, at 16:08, Mike Jarvis wrote:

How about this. Let's both do a quick prototype of our ideas. You on '#101' and me on '#103' and then we can compare what we come up with. Then we're guaranteed to follow Jim's maxim of throwing away our first design. We'll just define the "first" one to be the one we don't end up using. :)

Reply to this email directly or view it on GitHub: https://github.com/GalSim-developers/GalSim/issues/101#issuecomment-5614254

Barnaby Rowe

Jet Propulsion Laboratory California Institute of Technology MS 169-237 4800 Oak Grove Drive Pasadena CA 91109 United States of America

Department of Physics & Astronomy University College London Gower Street London WC1E 6BT United Kingdom

barnabytprowe commented 12 years ago

Haha, except that I guarantee you will be first!

...In the completion sense I hasten to add, not the standing the test of time sense! there I have no intuition...

rmjarvis commented 12 years ago

I think the read can be very straightforward. The resulting structure can just store an array of arrays of values. The first level would be by line number and then the second by column.

Then when we want to get column col of item n, we just ask for catalog.data[n][col]. So I think that's easy on both counts.

For fits, the access would be a bit different, but still pretty easy. The read is trivial. You just make a pyfits object that reads the file. Then when you want a column col for item n, you do catalog.fits_table.field(col)[n].

barnabytprowe commented 12 years ago

Well that just saved me some time doing something dumbly elaborate, thanks!

On 9 May 2012, at 16:29, Mike Jarvis wrote:

I think the read can be very straightforward. The resulting structure can just store an array of arrays of values. The first level would be by line number and then the second by column.

Then when we want to get column col of item n, we just ask for catalog.data[n][col]. So I think that's easy on both counts.

For fits, the access would be a bit different, but still pretty easy. The read is trivial. You just make a pyfits object that reads the file. Then when you want a column col for item n, you do catalog.fits_table.field(col)[n].

Reply to this email directly or view it on GitHub: https://github.com/GalSim-developers/GalSim/issues/101#issuecomment-5614604

Barnaby Rowe

Jet Propulsion Laboratory California Institute of Technology MS 169-237 4800 Oak Grove Drive Pasadena CA 91109 United States of America

Department of Physics & Astronomy University College London Gower Street London WC1E 6BT United Kingdom

barnabytprowe commented 12 years ago

Hi Mike, I've merged my changes on io.py and in the MOD script into both #103 and #101... please check and take a look. The input catalogue data is stored as a numpy array that can be accessed via either

input_cat.data[n][col]

or

input_cat.data[n, col]

as per preference (so much for the Zen ;) I'll get to work on my config parsing plans now.

barnabytprowe commented 12 years ago

I've also merged in all the changes from #65, fixed the odd minor conflict and updated DeVaucouleurs in base.py to match new usage...

rmandelb commented 12 years ago

You beat me - I was going to volunteer to do the DeVauc fix, since I'd just been monkeying around with radius stuff.

On May 9, 2012, at 8:31 PM, Barnaby Rowe wrote:

I've also merged in all the changes from #65, fixed the odd minor conflict and updated DeVaucouleurs in base.py to match new usage...

Reply to this email directly or view it on GitHub: https://github.com/GalSim-developers/GalSim/issues/101#issuecomment-5615474

Rachel Mandelbaum http://www.astro.princeton.edu/~rmandelb rmandelb@astro.princeton.edu

barnabytprowe commented 12 years ago

Hmmmm... I actually just noticed I made an error... There is no scale_radius= optional kwarg for the DeVauc. I'll remove it!

rmandelb commented 12 years ago

Yeah, it could be because the scale radius is usually some ridiculous value compared to half-light radius for n=4, so it's not something that people usually think about?

On May 9, 2012, at 8:34 PM, Barnaby Rowe wrote:

Hmmmm... I actually just noticed I made an error... There is no scale_radius= optional kwarg for the DeVauc. I'll remove it!

Reply to this email directly or view it on GitHub: https://github.com/GalSim-developers/GalSim/issues/101#issuecomment-5615508

Rachel Mandelbaum http://www.astro.princeton.edu/~rmandelb rmandelb@astro.princeton.edu

barnabytprowe commented 12 years ago

Yes, perhaps. Its value must be buried down in the C++ class somewhere, so this could maybe be an ultra-low priority cleanup issue!

On 9 May 2012, at 17:35, Rachel Mandelbaum wrote:

Yeah, it could be because the scale radius is usually some ridiculous value compared to half-light radius for n=4, so it's not something that people usually think about?

On May 9, 2012, at 8:34 PM, Barnaby Rowe wrote:

Hmmmm... I actually just noticed I made an error... There is no scale_radius= optional kwarg for the DeVauc. I'll remove it!

Reply to this email directly or view it on GitHub: https://github.com/GalSim-developers/GalSim/issues/101#issuecomment-5615508

Rachel Mandelbaum http://www.astro.princeton.edu/~rmandelb rmandelb@astro.princeton.edu

Reply to this email directly or view it on GitHub: https://github.com/GalSim-developers/GalSim/issues/101#issuecomment-5615530

Barnaby Rowe

Jet Propulsion Laboratory California Institute of Technology MS 169-237 4800 Oak Grove Drive Pasadena CA 91109 United States of America

Department of Physics & Astronomy University College London Gower Street London WC1E 6BT United Kingdom

rmandelb commented 12 years ago

Could add it to the new issue I just opened. :) Or not, I don't know that someone will really want to use a scale radius for Sersic/deVauc? Anyway, I vote for playing with the new radius specification rules for a bit before deciding what additional things we do/don't need.

On May 9, 2012, at 8:38 PM, Barnaby Rowe wrote:

Yes, perhaps. Its value must be buried down in the C++ class somewhere, so this could maybe be an ultra-low priority cleanup issue!

On 9 May 2012, at 17:35, Rachel Mandelbaum wrote:

Yeah, it could be because the scale radius is usually some ridiculous value compared to half-light radius for n=4, so it's not something that people usually think about?

On May 9, 2012, at 8:34 PM, Barnaby Rowe wrote:

Hmmmm... I actually just noticed I made an error... There is no scale_radius= optional kwarg for the DeVauc. I'll remove it!

Reply to this email directly or view it on GitHub: https://github.com/GalSim-developers/GalSim/issues/101#issuecomment-5615508

Rachel Mandelbaum http://www.astro.princeton.edu/~rmandelb rmandelb@astro.princeton.edu

Reply to this email directly or view it on GitHub: https://github.com/GalSim-developers/GalSim/issues/101#issuecomment-5615530

Barnaby Rowe

Jet Propulsion Laboratory California Institute of Technology MS 169-237 4800 Oak Grove Drive Pasadena CA 91109 United States of America

Department of Physics & Astronomy University College London Gower Street London WC1E 6BT United Kingdom

Reply to this email directly or view it on GitHub: https://github.com/GalSim-developers/GalSim/issues/101#issuecomment-5615559

Rachel Mandelbaum http://www.astro.princeton.edu/~rmandelb rmandelb@astro.princeton.edu

barnabytprowe commented 12 years ago

Roger that!

rmjarvis commented 12 years ago

Barney, I've got script 2 working with my BuildX and GererateX functions on branch #103 if you want to check it out.

barnabytprowe commented 12 years ago

Great, I'll take a look!

GalSim-developers / GalSim

Inputs to a multi-object capable GalSim #101