Mechanisms for more restricted config files

TallJimbo commented 12 years ago

We've made good progress prototyping our configuration files using the AttributeDict class on other issues (#101, #103), but eventually we'll probably want to use something that makes it easier to catch user errors and define what all the available configuration options are.

I plan to use LSST's pex_config package as a starting point, but I don't think we need all the features it has and the complexity they add.

I'm assigning this to myself to start with, as I've already thought about how to do this quite a bit when we were designing pex_config. Once the basic skeleton is in place I'll probably want to pass it off to others for refinement if other people like the approach.

TallJimbo commented 12 years ago

The basic machinery I was planning to add is now mostly in place. I'm still sorting through how to make use of it in defining the configuration schema and parsing it with a catalog. There are a lot of big questions there, so I think it's time to get feedback from others (especially @barnabytprowe, @rmjarvis, @rmandelb) before pressing forward. This could turn into a major overhaul of a lot of things (and I think maybe it should, but maybe not yet).

Overall, the goal is to provide some machinery that would allow us to define the configuration schema and how to use it to construct e.g. GSObjects from it all in one place. Instead of AttributeDict, each node in the schema is a distinct class (derived from NodeBase) with specialized accessors (the Field class). That's a lot of new classes, but each one should be very easy to define, because all the hard work is done by base classes.

Overall, I think this is a much more object-oriented, extensible approach, and it has a big advantage in that it doesn't allow users to put arbitrary things in the config that will be silently ignored. But right now there's a lot of duplication between the GSObject classes and the node classes (more on that later; I think that's the big issue that needs to be resolved).

There's a lot of work to be done on this branch yet, but here's what's worth looking at so far:

galsim/config/machinery.py: base classes and syntactic sugar for nodes and fields. It's ugly, but most people (even most developers) shouldn't have to look here much. It's ugly so the other files can be prettier.
galsim/config/generators.py: a slightly higher level bit of machinery for handling config fields that can be set as constants, drawn from random distributions, or pulled from an input catalog.
galsim/config/definition.py: a start to defining something like the schema in config/galsim_default. This (along with that modified default config file itself) is probably the most useful stuff to look at, because most of the new code that would need to be written would look a lot like what's there.

Note that nothing in this branch presently works, and I have not consistently removed old configuration code (so far I have only done so where the names conflicted).

Overview

The config schema is defined by a set of classes with Field attributes, which are essentially fancy Python properties ("descriptors" is the official Python term). Unlike AttributeDict, which was completely flexible, this approach is quite rigid; the user can only set values that are defined by fields, and we can even include defaults and some requirements in the field definition. So far it just does type checking, though we could also do range checking and really anything else at this stage.
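To make the descriptor idea concrete, here is a minimal sketch of a type-checked field and a node that rejects undeclared options. The names Field and NodeBase come from the discussion above, but this implementation, the SersicNode class, and its parameter names are assumptions for illustration only, not the code on the branch. It is written for Python 3.6+ (it relies on `__set_name__`).

```python
class Field:
    """Descriptor that type-checks a config value and supplies a default."""

    def __init__(self, dtype, default=None, doc=""):
        self.dtype = dtype
        self.default = default
        self.__doc__ = doc

    def __set_name__(self, owner, name):
        # Remember the attribute name so values can live in the instance dict.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class-level access returns the descriptor itself
        return instance.__dict__.get(self.name, self.default)

    def __set__(self, instance, value):
        if not isinstance(value, self.dtype):
            raise TypeError(
                f"{self.name} expects {self.dtype.__name__}, "
                f"got {type(value).__name__}"
            )
        instance.__dict__[self.name] = value


class NodeBase:
    """Config node that rejects attributes not declared as Fields."""

    def __setattr__(self, name, value):
        if not isinstance(getattr(type(self), name, None), Field):
            raise AttributeError(f"unknown config option: {name!r}")
        super().__setattr__(name, value)


class SersicNode(NodeBase):
    # Hypothetical node; the field names are illustrative only.
    n = Field(float, default=4.0, doc="Sersic index")
    half_light_radius = Field(float, default=1.0, doc="size in arcsec")
```

With these definitions, `SersicNode().n` returns the default 4.0, assigning `node.n = "big"` raises TypeError, and assigning to an undeclared name such as `node.sersic_index` raises AttributeError instead of being silently ignored.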
My approach to generating GSObjects from catalogs and configs centers around the GeneratorNode and GeneratableField classes and the apply methods on the GSObjectNode class and its derived classes. There's more info in docs in the code and the default config file. I think this approach makes this process more traditionally object-oriented, because each node that corresponds to a GSObject is responsible for being able to create one from a catalog row and random number generator. Here's how it would work:
1. Read (exec) the config file. Some kinds of configuration syntax errors would throw exceptions at this stage.
2. Call finish() on the root node with a UniformDeviate object and a dictionary of column name to column number. This sets up all the random number generators and catalog value extractors in any nested GeneratorNodes. This would also raise exceptions for invalid configuration options that can only be discovered when looking at the full tree.
3. Loop over the catalog, extracting rows (into tuples, lists, or NumPy arrays) and passing them to the apply() method of selected configuration nodes, which would return GSObjects to be turned into images. Most apply() methods are implemented by using the makeDict function defined in generators.py to create a dictionary of keyword arguments for a node, given a row from the input catalog.
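The three steps above can be sketched with stand-in classes. Everything here is an assumption made for illustration: CatalogValue, UniformValue, and GalaxyNode are hypothetical, `random.Random` stands in for UniformDeviate, and apply() returns a dictionary of keyword arguments rather than a real GSObject.

```python
import random


class CatalogValue:
    """Stand-in for a generator that pulls one column from each catalog row."""

    def __init__(self, column):
        self.column = column
        self.index = None

    def finish(self, rng, columns):
        # Resolve the column name once; a bad name fails here, before the loop.
        if self.column not in columns:
            raise KeyError(f"catalog has no column {self.column!r}")
        self.index = columns[self.column]

    def __call__(self, row):
        return row[self.index]


class UniformValue:
    """Stand-in for a generator that draws from a uniform distribution."""

    def __init__(self, low, high):
        self.low, self.high = low, high
        self.rng = None

    def finish(self, rng, columns):
        self.rng = rng

    def __call__(self, row):
        return self.rng.uniform(self.low, self.high)


class GalaxyNode:
    """Stand-in config node; apply() returns keyword arguments, not a GSObject."""

    def __init__(self, **generators):
        self.generators = generators

    def finish(self, rng, columns):
        for gen in self.generators.values():
            gen.finish(rng, columns)

    def apply(self, row):
        return {name: gen(row) for name, gen in self.generators.items()}


# Step 1 (stand-in for exec'ing the config file): build the node tree.
root = GalaxyNode(flux=CatalogValue("flux"), n=UniformValue(1.0, 4.0))
# Step 2: hand the tree a random number generator and the column lookup.
root.finish(random.Random(1234), {"flux": 0, "size": 1})
# Step 3: loop over catalog rows and collect the per-object parameters.
catalog = [(100.0, 0.8), (250.0, 1.3)]
params = [root.apply(row) for row in catalog]
```

The point of the separate finish() pass is that a misspelled column name is reported once, up front, rather than partway through the catalog loop.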

The Big Questions

What do people think of the above approach?
As I mentioned before, there's a LOT of duplication between the GSObject classes and my *Node classes in definition.py. In both cases, we're just enumerating all the different parameters that can be used to define a particular SBProfile, and we really don't want to do that twice. I don't think we want to try to unify the classes completely, but it would be very nice to be able to define a node class just by pointing it at a particular GSObject class and using introspection on it. That means we'd have to define the GSObject classes in such a way that their constructor arguments corresponded exactly to the allowable configuration parameters (which I think is desirable anyway), and we'd need to make those constructor arguments/parameters somehow introspectable, and maybe add more information like ranges of allowable values. That would probably involve adding something like the customized Field descriptors to the GSObject classes. And I think that pushes us towards a GSObject design that is based on holding its parameters and constructing an SBProfile when requested (this has been in the back of my mind when I suggested that approach in a few other contexts lately). If others agree, I'd be happy to take a stab at revamping GSObject; I don't think it's too hard, but it might be tricky to stage with all the other changes that would be happening in parallel. In fact, it might make sense to do that on a separate branch before continuing work on the #148 branch.
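As a rough illustration of the introspection idea, the sketch below reads the allowable parameters straight from a constructor signature. The Sersic class here is a hypothetical stand-in, not the real GSObject, and `inspect.signature` requires Python 3.3 or later, so this is not what would have run on the branch at the time.

```python
import inspect


class Sersic:
    """Hypothetical stand-in for a GSObject whose constructor lists its parameters."""

    def __init__(self, n, half_light_radius=1.0, flux=1.0):
        self.n = n
        self.half_light_radius = half_light_radius
        self.flux = flux


def describe_parameters(cls):
    """Return (required names, optional name -> default) read from cls.__init__."""
    required, optional = [], {}
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == "self":
            continue
        if param.default is inspect.Parameter.empty:
            required.append(name)
        else:
            optional[name] = param.default
    return required, optional


def build_from_config(cls, config):
    """Validate a dict of config values against the constructor, then call it."""
    required, optional = describe_parameters(cls)
    unknown = sorted(set(config) - set(required) - set(optional))
    if unknown:
        raise ValueError(f"unknown options for {cls.__name__}: {unknown}")
    missing = [name for name in required if name not in config]
    if missing:
        raise ValueError(f"missing options for {cls.__name__}: {missing}")
    return cls(**config)
```

A signature alone carries names and defaults but not ranges of allowable values, which is why the comment above suggests attaching something richer, like the Field descriptors, to the GSObject classes.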

rmjarvis commented 12 years ago

It's pretty hard for me to gauge this yet. Could you maybe show what script 2 of MultiObjectDemo.py would look like using your version of config? That might give me a better idea of how your proposal plays out in practice. You don't need to implement everything yet -- just show how you think that would work for the use case we have there.

e.g. I don't see any support yet for a galaxy being a sum of two components, which we did there, but maybe I missed it.

Likewise, we will want to allow PSFs to be convolutions of several components (atmosphere * airydisk * optics * CCDdiffusion), so we need support for that too. (We didn't have that yet in our version, but it would have worked just like Sum.)

rmandelb commented 12 years ago

I second Mike's request. I realize there are other aspects to the problem besides "what does this look like in practice for a user?" but, while you've clearly articulated some of the other issues, I don't have a good feel for this one yet.

TallJimbo commented 12 years ago

I'll work on an example in the manner of script 2. Overall, I'm trying to change the actual config syntax very little (though it will change some). It's a much bigger change on the implementation side.

I'll ping everyone again when there's a little more in the way of example code; it probably makes sense to focus on other big discussions and pull requests this week from a time management perspective anyhow.

rmjarvis commented 12 years ago

Sounds good. Note that I'm hoping your new method will be nicer than what we have for dealing with Sum's. I don't really like the solution we came up with, using the items = [Config(), Config()] syntax.

One thing to remember when designing this is that both components in the sum might be the same type. e.g. bulge + disk with both Sersics, one with n=3.4, the other with n=1.2 (for example). So I think that means you need to define further parameters (that aren't specified along with the name) by their position in the list, not by their type name.

Hence our use of config.gal.items[0]... rather than config.gal.sersic..... Just something to keep in mind. Hopefully you'll think of a nicer way to do this.
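A small sketch of why position, not type name, has to identify the components: with two Sersics in one sum, only the list index tells them apart. The dictionary layout and the describe helper are assumptions for illustration and do not reproduce the existing Config syntax.

```python
# Hypothetical config layout: each component names its own type, so two
# components of the same type are distinguished only by list position.
config = {
    "gal": {
        "type": "Sum",
        "items": [
            {"type": "Sersic", "n": 3.4, "flux": 0.3},  # bulge
            {"type": "Sersic", "n": 1.2, "flux": 0.7},  # disk
        ],
    }
}


def describe(node, path="gal"):
    """Yield (path, type) for every component, addressing children by index."""
    yield path, node["type"]
    for i, child in enumerate(node.get("items", [])):
        yield from describe(child, f"{path}.items[{i}]")


paths = list(describe(config["gal"]))
```

Because describe recurses into any node that has an items list, the same addressing scheme would extend to a convolution of several PSF components.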

barnabytprowe commented 12 years ago

Hi Jim,

Looks like a lot of thought has gone into this, thank you. The key to understanding what's going on at the top level seems to me to be in the Field class, so I'm looking there and will fire questions back as I get more into it (also looking forward to seeing a script 2 style example script whenever that's ready).

I agree with you on this point:

As I mentioned before, there's a LOT of duplication between the GSObject classes and my *Node classes in definition.py. In both cases, we're just enumerating all the different parameters that can be used to define a particular SBProfile, and we really don't want to do that twice. I don't think we want to try to unify the classes completely, but it would be very nice to be able to define a node class just by pointing it at a particular GSObject class and using introspection on it. That means we'd have to define the GSObject classes in such a way that their constructor arguments corresponded exactly to the allowable configuration parameters (which I think is desirable anyway), and we'd need to make those constructor arguments/parameters somehow introspectable, and maybe add more information like ranges of allowable values. That would probably involve adding something like the customized Field descriptors to the GSObject classes. And I think that pushes us towards a GSObject design that is based on holding its parameters and constructing an SBProfile when requested (this has been in the back of my mind when I suggested that approach in a few other contexts lately). If others agree, I'd be happy to take a stab at revamping GSObject; I don't think it's too hard, but it might be tricky to stage with all the other changes that would be happening in parallel. In fact, it might make sense to do that on a separate branch before continuing work on the #148 branch.

This was, of course, basically the problem that caused Mike and me to concoct alternative workarounds for parsing the config information, neither of them ultimately satisfactory, and I agree that a model for GSObject which stored its parameters as attributes would ultimately be the best solution. I'm sure the Python user might also sometimes find it handy to have those attributes to hand. I also vote for doing this on a separate branch and pull request, for ease of documenting and compartmentalizing multiple changes.

barnabytprowe commented 12 years ago

Hi @TallJimbo ,

You mentioned that you were thinking of a design review call for this. I think the main agenda would be to discuss what needs to happen, to both the config implementation and perhaps the GSObject, and reassign much of this work to me + other volunteers if any.

I'm guessing Skype will be sufficient for this, do you have a time in mind?

TallJimbo commented 12 years ago

Thanks for reminding me, and I think that's a pretty fair summary of the agenda. Friday or early next week would work pretty well for me. My schedule is pretty clear as far as the details of scheduling.

rmandelb commented 12 years ago

Is this a time for others to join in and discuss the design? or is this the scheduled "brain dump"?

barnabytprowe commented 12 years ago

I was thinking that there might still be a couple of things to discuss conceptually, so the former. But I also hope that I can operate on a high enough level (and work from existing examples) for this to serve as the vast majority of the brain dump, too...!

TallJimbo commented 12 years ago

Yup, I was thinking this would be a conceptual discussion with more people. I want to show a few toy examples and get some opinions so Barney and I can figure out what to focus on.

rmandelb commented 12 years ago

okay, so should we make a poll?

On Jul 11, 2012, at 7:43 PM, Jim Bosch wrote:

Yup, I was thinking this would be a conceptual discussion with more people. I want to show a few toy examples and get some opinions so Barney and I can figure out what to focus on.

Reply to this email directly or view it on GitHub: https://github.com/GalSim-developers/GalSim/issues/148#issuecomment-6923253

Rachel Mandelbaum http://www.astro.princeton.edu/~rmandelb rmandelb@astro.princeton.edu

TallJimbo commented 12 years ago

Yes, please! I can take a crack at it too, but I figure both of you have set up Doodle polls much more recently than I have.

rmandelb commented 12 years ago

I'll do it.


rmandelb commented 12 years ago

http://www.doodle.com/vi8ar3nxy9vys6nv

Note: there are some times that are pretty early in the day on the poll. Choosing one of them on Friday would be hard on Barney (who is in California Friday but in Pittsburgh Mon/Tues), but I'm a lazy person who wanted to copy the same times for each day rather than entering new ones manually. Sorry Barney :p

rmandelb commented 12 years ago

(Also, Jim, I interpreted your "early next week" to mean Mon/Tues... was that correct? I can fix the poll if I misunderstood you.)

TallJimbo commented 12 years ago

Yup, Fri/Mon/Tues is good.

barnabytprowe commented 12 years ago

Hi all, I just edited my preferences on the poll, I'm afraid; I need to pick up a car from the mechanic's tomorrow morning!

rmandelb commented 12 years ago

Anyone else want to join? (@rmjarvis ?) If not, then perhaps we could set a time for tomorrow.

rmjarvis commented 12 years ago

I filled in my availability on the doodle poll. Tomorrow's not great for me. But if you want to go without me, that's also fine. I'm just interested in listening in to the brain dump. I doubt my python prowess is high enough to give hugely useful advice.

rmandelb commented 12 years ago

I think it'd be good if you could join in, especially given the fact that you already put a lot of thought into config. So I'm inclined to prefer something like 11am Monday, rather than tomorrow, unless somebody else has some objection?

TallJimbo commented 12 years ago

Monday is slightly better for me too, actually. And 11am works for me.

barnabytprowe commented 12 years ago

If that works for everyone, it gets my vote!

rmandelb commented 12 years ago

Sounds like a plan.

barnabytprowe commented 12 years ago

Hi all,

On the basis of our informative discussion today, here are some brief action items:

1. Barney to learn the finer points of descriptors & properties, and attempt a straw man implementation of GSObject parameters as such.
2. Jim to perform a basic cleanup of unnec./legacy code remaining in definition.py and generators.py
3. Jim to look into adding dictionary support for what are currently FieldLists, and into nesting; although this latter will rely heavily on the new style Add and Convolve GSObject implementation being worked on by Barney.

Have I missed / misrepresented anything? Please speak up if so. I'll try and get 1. going ASAP. To this end, I might start work on another branch that can be merged into this one as required, but will contain only the GSObject changes to facilitate easier merging of that into the master prior to the final merge of #148.

Will be in touch soon...

barnabytprowe commented 12 years ago

Aha, I am reminded that an Issue already exists for this GSObject change, #195. Will open a branch in that name.

barnabytprowe commented 12 years ago

Hi @TallJimbo , I'm back from my hols and can give some more time to this stuff now. My first order of business will be finalizing the still-open Pull Requests, but perhaps today we might also usefully spend a little time on the phone to discuss my next tasks for getting this new config stuff working with the new GSObjects on #195.

Does that sound good? I can make almost any time EDT today, up until around 4pm EDT. Let me know!

barnabytprowe commented 12 years ago

Hi all (most specifically the primary interested /involved parties, @TallJimbo , @rmjarvis , @rmandelb but all others welcome to comment),

I'm posting a question on this issue because I think now would be a good time to think strategically.

Rachel and I recently enumerated a bunch of the things we really need to do for the release, and I've got a bit concerned. There is a bit to be done, much of it unchallenging but essential, and I need to make sure that enough time is allocated for that.

Given the almost-expiring milestone, and the fact that all of us have a lot of things to do elsewhere, I felt it was the time to be realistic and to put forward the following three proposals:

1. We pause config development again for a bit and continue with the old way, and keep the object_param_dict updated. As Jim described very fairly in an email conversation he and I had today: "My gut is telling me that... the right approach isn't on the table yet. I think that means that the best approach for now is to take the path of least resistance for the foreseeable future."
2. We keep the new GSObject implementations, but make a minor modification to the current config stuff so that where the object_param_dict is currently called we instead introspect the required parameters directly from each class, much as I currently do in definitions.py. This allows us to ditch the object_param_dict, but doesn't bring in the full system that Jim is aiming at, which is, I think it's fair to say, not fully defined in my mind. This could be implemented very quickly. However, ditching the object_param_dict does come at great cost in terms of the added complexity (with some secondary nice features too) of the GSObject classes, and Mike is right to raise this as a serious concern.
3. We try and get this done as planned.

I would worry a great deal about 3: it would mean that simple things we know must be done might risk going undone, and the complexity of this task has grown and doesn't show signs of reducing soon.

I think Jim makes a fair point when he says (Jim I hope you are happy with me repeating this, it seems very sensible!):

Overall, I have to admit I'm not happy with the level of complexity this has grown to either, and I'm also a little bothered by some deeper issues in how GSObject and SBProfile work (both before and after your changes: see my recent GitHub comments on setting class, which really bothers me). And I continue to be skeptical of the premise that using our increasingly complex config system will be the mode of choice for users who know even a little Python: maybe our time would be better spent writing convenience functions, so users could write dead-simple scripts instead?

Also:

But I'd go a step further and discourage early users from using the config interface at all, as we know it's not stable right now, and just tell them to write their own scripts instead. Maybe seeing what scripts people write will tell us how to proceed in config-land better, and get our heads out the forest of descriptors and Python-fu.

I'd really welcome people's perspectives on this. The changes on #195 have brought a profound increase in the complexity of the GSObject classes. There is not yet, in truth, a fully coherent idea of where the config stuff on this branch will end up in a realistic timescale. Both Jim and I are very happy that we have learned a lot about the problem even if these branches stay unmerged. Neither of us are happy enough with how this is going that we feel it need be rushed.

rmjarvis commented 12 years ago

I'm also skeptical that people would prefer a complicated config scheme to writing python code. However, I do suspect that at least some people would prefer a simple config file to writing python. In fact, I think this point in particular argues against having the config file be a python execution. If you prefer that, you would probably prefer to just write a python script.

I have a proposal about what should change, but I think that discussion should be had on #148, so I'll post the rest of my thoughts there.

rmjarvis commented 12 years ago

Oops. This is #148. :) I thought I was on the pull request for #195. Oh well.

Anyway, David Kirkby made a good point at the LSST AHM suggesting that we not invent a new config scheme. Rather, we should use something already popular like YAML or JSON. I haven't used either one a whole lot myself, but I think I prefer YAML. Either way, we should have one or more executables that just read in such a config file and do something appropriate with them.

Also, I think the pathway to doing this is kind of the reverse of what's been happening so far. I think we should start by writing a config file that does something useful. We could probably write one for each of the demo scripts, but it would be fine to start with just MultiObjectDemo script 2, which is the one we currently use config for. Try to make the config file look the way we want for that. Then we write a program that reads in this file and does the same thing that the current script does.

Then any place that we find we need some extra ability on the part of the galsim back-end (introspection for example), we add it at that point. Currently, we're adding a bunch of python-fu without really knowing that it is going to be useful in the end. Or at least it's not obvious to me how useful it will be. Probably Jim has a better vision of it. But going this way will save us from work that isn't really needed.

Once we get the program parsing a single config file, we try to generalize it a bit to run the other scripts specified by different config files, but using the same executable program. I suspect we could end up with a single executable that is able to run all the different scripts by reading different config files. Or at least create the same output files. (E.g. we probably don't need to have a config file that would specify to run both single-processor and multi-processor versions of the same thing and then verify the outputs are identical, but we should be able to specify whether to use multiple processors or not.)
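A minimal sketch of the "one executable, many config files" idea, using JSON only because its parser ships with Python. The config keys (gal.type, output.nproc) and the function names are hypothetical, and run() only reports what it would do rather than drawing anything.

```python
import argparse
import json


def load_config(path):
    """Read a JSON config file into plain dicts and lists."""
    with open(path) as f:
        return json.load(f)


def run(config):
    """Stand-in for the real work: report what would be drawn and how."""
    nproc = config.get("output", {}).get("nproc", 1)
    if not isinstance(nproc, int) or nproc < 1:
        raise ValueError("output.nproc must be a positive integer")
    gal_type = config["gal"]["type"]
    return f"would draw {gal_type} galaxies using {nproc} process(es)"


def main(argv=None):
    """Command-line entry point: one positional argument, the config file."""
    parser = argparse.ArgumentParser(
        description="Run a simulation described by a config file."
    )
    parser.add_argument("config_file")
    args = parser.parse_args(argv)
    print(run(load_config(args.config_file)))
```

Calling `main(["MOD2.json"])` would run whatever that file describes; choosing single or multiple processes becomes one config value rather than a separate script.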

TallJimbo commented 12 years ago

The approach sounds fine with me. A couple of comments:

Unless there's a big difference in readability I'd recommend choosing which config scheme based on how mature and easy-to-install existing Python libraries are for them. Python itself appears to include a JSON parser, lots of XML parsers, and a module called ConfigParser (http://docs.python.org/library/configparser.html) that's more limited but probably a lot more familiar.
I think a lot of our problems in making config work come from trying to find a config-ish way to construct and manipulate our fairly complex GSObject classes. It might make sense to severely limit the things one can do through the config interface, especially at first, and focus on the very most common expected patterns.

rmjarvis commented 12 years ago

I added to this branch two executables that implement MOD script 2 for both JSON and YAML. They are called galsim_json and galsim_yaml. The relevant config files are examples/MOD2.json and examples/MOD2.yaml. I only had to make very minor changes to frontend.py and base.py to make them work.

From the examples directory, you can type:

../bin/galsim_json MOD2.json
../bin/galsim_yaml MOD2.yaml

The output files (in the directories output_json and output_yaml respectively) are identical to the ones created by "MultiObjectDemo 2" (in the directory output), so I think everything is working at least for this particular script. However, it should be noted that the executables don't do very much sanity checking of the input values, so there would still be a fair amount of work to be done to get this to something reasonable.

In any case, here are some pros and cons as I see them:

JSON: The config file is definitely not as pretty. The biggest down side is that comments aren't allowed. It's also trickier to make sure you have all the commas and everything in the right spots. The main pro is that it works out of the box (included with python)

YAML: Nicer looking config file. Has comments, syntax is more flexible. However, it adds a dependency, since PyYAML is not included with python out of the box. OTOH, installation is trivial. Also, it is a pretty small package, so it might be possible to bundle it with galsim.
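The two JSON drawbacks mentioned above are easy to demonstrate with the standard-library parser. The config snippets are made up for illustration and are not taken from examples/MOD2.json.

```python
import json

plain = '{"gal": {"type": "Sersic", "n": 3.4}}'
commented = """{
    // Sersic index of the bulge
    "gal": {"type": "Sersic", "n": 3.4}
}"""
trailing_comma = '{"gal": {"type": "Sersic", "n": 3.4,}}'


def try_parse(text):
    """Return the parsed object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
```

Here `try_parse(plain)` returns a dictionary, while both `try_parse(commented)` and `try_parse(trailing_comma)` return None: the parser accepts neither comments nor a stray trailing comma.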

barnabytprowe commented 12 years ago

@rmjarvis That's fantastic, thank you for those examples. I think the argument for using a well established, popular config scheme is very compelling: there will be some small section of our users who already know it, it avoids us needing to rewrite this from scratch; and we can have some confidence that well-established third party code for handling the parsing will be at least as robust and powerful as anything we'd be able to write ourselves in a realistic timescale, and probably a great deal more so.

I also see the logic in the reverse approach you're advocating: certainly the tack we've been on in the last month has not borne great fruit.

Unfortunately, I'm writing this from an Edinburgh Starbucks-equivalent, and won't have time to look into this in more detail, and get my head around it, until Sunday evening BST... But before then a couple of immediate comments:

That YAML supports comments is a compelling argument in its favour, and it is surely trivial to install or bundle (easy_install is pretty amazing). However, I still found the JSON config file relatively easy to comprehend due to the sensible names we've chosen for our objects and parameters (of course, these are familiar to me by now so I might not be the best judge), and it's great that it's included with Python 2.6+. I just don't have a strong preference.

Currently, we're adding a bunch of python-fu without really knowing that it is going to be useful in the end. Or at least it's not obvious to me how useful it will be. Probably Jim has a better vision of it. But going this way will save us from work that isn't really needed.

The place I see the Python-fu being useful is in the sanity checking that you mention the executable doesn't currently have. Using the descriptors, if they were kept, the config parsing in frontend.py could run through the parameter descriptors of each class it's asked to make and make sure it has every required parameter, at least one size, and zero or more of the optional params, without the need for a separately maintained object_param_dict or equivalent. The big question in my mind is whether the graceful failures available from such error checking are worth the order-of-magnitude increase in the complexity of the introspect-able, smartly-updated parameter model of the GSObjects.

One compromise that we could bring in to ditch the object_param_dict without entering into the descriptor wonderland would be to go back to the old model but require each GSObject to contain a _param_dict or _params attribute which config would expect to be able to scan to get the same info. But I'm just thinking aloud.
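A sketch of what that compromise might look like: each class carries a plain `_params` table, and the config code checks a set of values against it. The layout of the table, the Sersic stand-in, its parameter names, and check_params are all assumptions for illustration.

```python
class Sersic:
    """Hypothetical stand-in for a GSObject carrying its own parameter table."""

    _params = {
        "required": ("n",),
        "size": ("half_light_radius", "scale_radius"),
        "optional": ("flux",),
    }


def check_params(cls, config):
    """Raise ValueError unless config is a valid set of parameters for cls."""
    spec = cls._params
    allowed = set(spec["required"]) | set(spec["size"]) | set(spec["optional"])
    unknown = sorted(set(config) - allowed)
    if unknown:
        raise ValueError(f"{cls.__name__}: unknown parameters {unknown}")
    missing = [name for name in spec["required"] if name not in config]
    if missing:
        raise ValueError(f"{cls.__name__}: missing parameters {missing}")
    # Follows the wording above: at least one size parameter must be given.
    if spec["size"] and not any(name in config for name in spec["size"]):
        raise ValueError(f"{cls.__name__}: needs one of {list(spec['size'])}")
```

This keeps the parameter list next to the class it describes, so there is no separately maintained object_param_dict, at the price of trusting each class to keep `_params` in step with its constructor.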

Either way, I'm very interested in adopting a recognised config standard, and using off-the-peg parsing tools. Whether that's JSON or YAML is not of critical importance.

@joezuntz , @joergdietrich , @pmelchior , @PaulPrice , do you guys have any experience / opinions as regards YAML vs JSON?

barnabytprowe commented 12 years ago

(Or any thoughts overall?)

rmandelb commented 12 years ago

Here's my 2 cents (might be more like 1, as I'm not sure this is going to be very helpful despite being rather long):

1) I had been concerned about getting something working for the config system before releasing it within the GREAT3 collaboration for people to play with. However, one reason we're doing this release so early is to give ourselves ample time to get feedback and decide about any changes that might be necessary. With that in mind, I think it is quite fair to stick with something simple that works (either what we already have; or something that is based on a pre-existing system like JSON or YAML -- possibly even sticking with the cumbersome object_param_dict for now). That means we have something to start with, and can solicit feedback from people about how useful they find it. This "path of least resistance without burning bridges" approach makes the most sense to me given the other issues that I've been worrying about (*cough* lensing engine *cough* installation procedure *cough*).

2) Re: Jim's comment about convenience functions, Mike and I have had differing opinions about these before, but I've come around more towards his way of thinking. It's all well and good to say "I want to make some convenience functions that allow users to do some version of test X", but when you think about the typical variations on that test, you end up requiring 20 keywords. At that point, the convenience function is not really so simple and convenient for the users to use or for the developers to write! I'd be more inclined to say we need to write a few more demo scripts that show off common use cases.

3) Re: Mike's comments on how to build up the config system, this approach seems quite sensible to me, but since I've been committing myself to work on other things instead of config, I will defer to those who will actually do work on this.

4) JSON vs. YAML: I think the big distinction for me is the "no comments" vs. "having to install separately" issue. While it's a tough call, I think I'd go with "having to install separately" if it allows us to have comments. (Yes, the person who whines incessantly about the installation procedure is voting in favor of another dependency. Go figure.) That said, I was not actually able to run the examples, I just looked them over.

So... to sum up my rambling, I think I would vote for now to adopt one of these existing systems, figure out what minimal coding is necessary to allow us to use these systems to do the things in the demo scripts (I can think of a few examples offhand), go ahead and do it, and then not worry about the config system until we get feedback from a few users.

TallJimbo commented 12 years ago

I strongly think we need to have comments in config files. If people are using config files, we need to provide documentation on how to do that, and putting the documentation in example config files is both the easiest and the most effective way to document it.

An extra dependency that's pure Python (i.e. not compiled) isn't nearly as big a deal as a dependency that does require compiling and linking. But I still think it might be worth looking into the ConfigParser module before going with YAML.

rmjarvis commented 12 years ago

As far as I can tell, ConfigParser is basically just for ini-type configuration files. i.e a list of keyword-value pairs organized by sections (e.g. .gitconfig). So there is no multiple-level hierarchy to it, which I pretty much think we require for the kinds of things we want to do with our config files.
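The flat structure can be seen directly with the standard-library module (named `configparser` in Python 3, `ConfigParser` in Python 2). The section and key names below are invented for illustration; the dotted section name is just one way someone might try to fake nesting.

```python
import configparser

ini_text = """
[gal]
type = Sum

[gal.items.0]
type = Sersic
n = 3.4
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)

# Sections form a single flat list; "gal.items.0" is only a name, and the
# parser attaches no meaning to the dots in it.
sections = parser.sections()

# Values come back as strings unless a typed getter is used explicitly.
n_as_text = parser["gal.items.0"]["n"]
n_as_float = parser.getfloat("gal.items.0", "n")
```

Any hierarchy would have to be encoded in naming conventions and rebuilt by our own code, which is the work a format like YAML or JSON already does.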

So my recommendation is to go with YAML. The only question in my view is whether to make it a dependency or to bundle it with GalSim.

rmjarvis commented 12 years ago

I started making galsim_yaml.py a bit more general. In particular, it now can also run BasicDemo script 1, whose config file is examples/BD1.yaml.

I'll keep trying to add functionality to it to make config files that correspond to each of our existing demo scripts. I'll let you know when I hit a wall that requires something fundamentally new in frontend.py.

joergdietrich commented 12 years ago

I concur that comments in the config file are a killer argument. The examples Mike provided look very nice and clean. I personally think that with configs as complex as this I'd rather write a bit of python code, but this is obviously coming from somebody who is familiar with python. Other users of the code may indeed prefer a configuration file over learning how to do some easy coding, although the & and * syntax in YAML brings us dangerously close to C style syntax confusion IMHO. This is still a vote in favor of YAML, with which I have no prior experience.

barnabytprowe commented 12 years ago

Well, consensus seems to be growing in support of YAML, and I'm happy with that. Jim's right that a pure python dependency is much less of a big deal.

So... to sum up my rambling, I think I would vote for now to adopt one of these existing systems, figure out what minimal coding is necessary to allow us to use these systems to do the things in the demo scripts (I can think of a few examples offhand), go ahead and do it, and then not worry about the config system until we get feedback from a few users.

I think that sounds like a great plan! If it's YAML, which seems increasingly likely, and Mike is already taking a look, I'll focus instead on Mike's suggestion and our discussion on #228, the pull request for #195. Mike: if / when you do hit a wall in frontend.py let me know and I'll be happy to help... Not that you need the help, but it's always easier and nicer to take apart your own code rather than get frustrated with someone else's...

rmjarvis commented 12 years ago

FYI, I have yaml configuration files for all the BasicDemo scripts and the first two MultiObjectDemo scripts.

(I also switched the order of the first two MOD scripts, since I think it's better pedagogically that way for people who are reading through them to learn how GalSim works.)

There is also a shell script called examples/check_yaml that runs through all of these and checks that the output files are identical.

I did have to make some minor adjustments to how the demo scripts work in order to make it easier to have an identical output from the configuration version, but nothing really substantive.

Of course, there were some significant changes to frontend.py to make them work, so I apologize to Barney if you started doing any work on that file yet. If you have, I probably made your "git pull" merge pretty difficult.

Anyway, interested parties should feel free to look at the .yaml files and comment.

My take is that they are a lot easier for a newbie to look at and see how to get going on using GalSim than looking at the python scripts if they start out knowing neither python nor yaml. (In fact, it's probably easier even for someone who already knows python, but our learning curve for the galsim module is really pretty short, so it's not such a big advantage for them.)

dkirkby commented 12 years ago

Sorry I missed this discussion earlier but, for the record, although json does not support C-style comments that are ignored by the parser, you can certainly include comments as first-class data, similar to python's docstrings, e.g.

   "#": "YAML configuration file for use with the executable galsim",
   "#": "This configuration file is designed to be equivalent to the example script called",
   "#": "Script2 in MultiObjectDemo.py",
   ...
   "input" : {
       "#": "In this case, we just have a catalog to read in.",
       "catalog" : {
           "dir" : "input",
           "file_name" : "galsim_default_input.asc"
       }
   },
   ...

One advantage of this approach is that you can programmatically extract and reformat comments to automate config file documentation. Another possible advantage of json is that if you ever want a web front end, then json is probably the easiest format to interface with javascript.
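As a rough sketch of what that extraction could look like, assuming comments are stored under the key "#" as in the example above: since a plain json.loads keeps only the last of several repeated keys, the sketch uses object_pairs_hook to collect them all. The helper names collect_comments and iter_comments are hypothetical, and the sample text is abbreviated.

```python
import json

def collect_comments(pairs):
    """object_pairs_hook: gather all "#" entries of one object into a list."""
    node = {}
    comments = []
    for key, value in pairs:
        if key == "#":
            comments.append(value)
        else:
            node[key] = value
    if comments:
        node["#"] = comments
    return node

def iter_comments(node, path=""):
    """Yield (path, comment) for every comment found in the parsed tree."""
    if isinstance(node, dict):
        for comment in node.get("#", []):
            yield (path or "<root>", comment)
        for key, value in node.items():
            if key != "#":
                child = "%s.%s" % (path, key) if path else key
                for item in iter_comments(value, child):
                    yield item

# Abbreviated, illustrative sample in the style of the example above.
TEXT = """
{
  "#": "Top-level comment, line 1",
  "#": "Top-level comment, line 2",
  "input": {
    "#": "In this case, we just have a catalog to read in.",
    "catalog": {"dir": "input", "file_name": "galsim_default_input.asc"}
  }
}
"""

config = json.loads(TEXT, object_pairs_hook=collect_comments)
comments = list(iter_comments(config))
```

Each entry in comments pairs a dotted path such as input with its comment text, which could then be formatted into whatever documentation layout is wanted.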

David

rmjarvis commented 12 years ago

That's a clever idea, David. I hadn't come across that trick in my web searches about comments in json files.

In practice, the back end is completely independent of which style config file we use. All we need to be able to do is to read in a python dict. Then everything proceeds from that.

We could even have the reader support multiple config formats by checking the extension or having the format be specified on the command line or something. That wouldn't be a very hard change.
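Something along these lines is what I have in mind; it is only a sketch, and the function name load_config and its fmt argument are placeholders rather than anything that exists in the executable. It assumes PyYAML for the yaml case and imports it lazily, so reading json files would not require it.

```python
import json
import os

def load_config(file_name, fmt=None):
    """Read a config file into a plain python dict.

    The parser is chosen from the file extension unless `fmt` is given
    explicitly (e.g. from a command-line option).
    """
    if fmt is None:
        fmt = os.path.splitext(file_name)[1].lstrip(".").lower()
    if fmt == "json":
        loader = json.load
    elif fmt in ("yaml", "yml"):
        import yaml  # PyYAML; only needed when a yaml file is actually read
        loader = yaml.safe_load
    else:
        raise ValueError("Unrecognized config format: %r" % fmt)
    with open(file_name) as f:
        return loader(f)
```

Everything after this call only sees the dict, so the rest of the processing would not need to know which format the user chose.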

I just happen to like the yaml format better, since it omits a lot of the extra punctuation that is in the json format, so I find them to be a lot more readable. But if there are good reasons to switch, most of the work on this is still relevant. We'd just have to switch the config files themselves and change a few lines in the executable.

dkirkby commented 12 years ago

Yaml certainly looks more readable so I would go with that if the installation burden is reasonable. Btw, yaml 1.2 is a strict superset of json but it looks like PyYaml does not support it yet (even though 1.2 is 3 years old now).

David


rmandelb commented 12 years ago

I can't run the yaml scripts. I suspect it's something basic but I'm not sure what... e.g., if I am in examples/ and I do

   python ../bin/galsim_yaml.py BD1.yaml

then the result is

Traceback (most recent call last):
  File "../bin/galsim_yaml.py", line 10, in <module>
    import galsim
  File "/Users/rmandelb/great3/GalSim/galsim/__init__.py", line 13, in <module>
    from . import config
  File "/Users/rmandelb/great3/GalSim/galsim/config/__init__.py", line 3, in <module>
    from . import definition
  File "/Users/rmandelb/great3/GalSim/galsim/config/definition.py", line 171, in <module>
    class PostageStampRootNode(machinery.NodeBase):
  File "/Users/rmandelb/great3/GalSim/galsim/config/definition.py", line 182, in PostageStampRootNode
    class psf(machinery.ListNode):
  File "/Users/rmandelb/great3/GalSim/galsim/config/definition.py", line 183, in psf
    types = (MoffatNode, PixelNode)
NameError: name 'MoffatNode' is not defined

rmjarvis commented 12 years ago

I was getting similar errors when I ran scons tests, but not when running this program. Weird.

This error comes from the fact that some of the config machinery isn't finished yet on this branch. Probably makes it a poor choice for me to have worked on my stuff in this branch, but oh well. Anyway, I just commented out the import config line in __init__.py and then scons tests works again, except for two errors in the test script that are explicitly about the new config code. So hopefully that will make your yaml test runs work now.

rmandelb commented 12 years ago

Thanks, Mike, it works now.

rmjarvis commented 12 years ago

I finished getting the last demo script (MOD 4) working as a yaml config file. That was the one that made pairs of images of the same object drawn using fft and phot, so it's a pretty contrived example in a couple places. This isn't the kind of thing anyone is ever going to do. But it works. It entailed a new List type which is probably a useful feature to have anyway (e.g. you might want to randomly select from a finite list of choices for the galaxy profile).

So please take a look and see what you think. I've only really added the functionality that was necessary to get these particular scripts working, so I know there are more features we will want to be able to specify in config files (e.g. multiple output files), but they aren't enabled yet.

Also, the point of this issue is "Mechanisms for more restricted config files". Now that I've enabled at least most of the functionality we'll need, it should be a bit easier to try to determine how much we want to restrict. Right now, I raise exceptions whenever a required item is missing, but I think one of the ideas was to catch when people specify an illegal item, since it is likely a typo. Currently, that item just gets ignored. So that would be a useful addition if we can think of a nice way to do so.
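One simple possibility, sketched below, would be to check each dict against a list of required and optional keys before using it. This is not what frontend.py currently does; check_keys and the key names are hypothetical, and the "did you mean" hint uses difflib from the standard library.

```python
import difflib

def check_keys(config, required, optional=(), where="config"):
    """Raise if a required key is missing or an unrecognized key is present."""
    allowed = set(required) | set(optional)
    missing = sorted(set(required) - set(config))
    if missing:
        raise ValueError("Missing required item(s) %s in %s" % (missing, where))
    for key in config:
        if key not in allowed:
            # Suggest the closest allowed key, since an unknown key is
            # most likely a typo.
            guess = difflib.get_close_matches(key, sorted(allowed), n=1)
            hint = " (did you mean %r?)" % guess[0] if guess else ""
            raise ValueError("Unexpected item %r in %s%s" % (key, where, hint))

# Hypothetical usage for a Gaussian galaxy definition.
check_keys({"type": "Gaussian", "sigma": 2.0, "flux": 1.0},
           required=("type", "sigma"), optional=("flux",), where="gal")
```

The cost of this approach is that the allowed keys have to be listed somewhere for every type, which duplicates information that the constructors already have.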

Jim started implementing his idea about how to do this in the galsim/config directory, but I'm not sure how far along it is or how well it will work with the additional features I've added to the config stuff.

rmandelb commented 12 years ago

I like the List type. I could think of other ways I might use that. Also, I see that you updated the options for making shears from the original E1E2 and G1G2 to include a broader variety of options, EBeta, GBeta, QBeta - that was on my wish-list but I guess you already needed it for the demos.

So, about making more restricted config files: I agree that it would be nice to raise an exception when people specify an illegal item, so I do want to hear from @TallJimbo about the feasibility of implementing his scheme given the new versions of our base classes. But I also wonder if we could in some way rely on the python exception-handling here, e.g., if someone is specifying a galaxy and has

   type : Gaussian
   fwhm : 1.0
   flix : 1.0

then could we make sure that the additional mis-typed keyword "flix" gets passed to the Gaussian constructor, which will complain about the unexpected keyword argument?
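A rough sketch of what I mean is below. The Gaussian class here is a stand-in with a made-up signature, not the real galsim.Gaussian, and build and TYPES are hypothetical names.

```python
class Gaussian(object):
    """Stand-in profile class; NOT the real galsim.Gaussian signature."""
    def __init__(self, fwhm, flux=1.0):
        self.fwhm = fwhm
        self.flux = flux

# Hypothetical registry mapping the config "type" string to a class.
TYPES = {"Gaussian": Gaussian}

def build(config):
    """Build an object by passing every item other than 'type' as a keyword."""
    kwargs = dict(config)
    cls = TYPES[kwargs.pop("type")]
    try:
        return cls(**kwargs)
    except TypeError as err:
        # Python raises TypeError for an unexpected keyword such as 'flix';
        # re-raise it with the config context attached.
        raise ValueError("Invalid %s config: %s" % (cls.__name__, err))

gal = build({"type": "Gaussian", "fwhm": 1.0, "flux": 1.0})
```

This only works directly when every config item maps onto a constructor keyword; items that need processing first (e.g. values drawn randomly or read from a catalog) would have to be resolved before the call.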

If it seems unnecessarily complicated to make the more restricted config now, then we might want to consider postponing this work until after we release the code within the collaboration.

Also, I think it's worth having a discussion of the other features we want to specify with config, so we can decide what's worth spending time on now vs. waiting for later -- especially given the fact that we have this looming deadline when we want to encourage people in GREAT3 to start playing with the code before the working meeting. Aside from the multiple output files, what else is there? Offhand, I thought of drawing parameter values randomly from particular distributions other than the cases that are implemented already (looks like we can draw random uniform or Gaussian deviates only - right?). Also, what about non-image output, e.g., if someone uses a script to draw a bunch of galaxies that have some random parameters (position angles or whatever) and then wants an output file, either ASCII or FITS table, that gives the parameters that were used for each galaxy?

Mike, are you still planning to work on the documentation in frontend.py? I noticed a few comments that seem to be defunct, but if you're still actively updating then I won't worry about it for now. We also need information about getting PyYAML to go into INSTALL. Perhaps this could be an optional dependency, since those who want to use only python to interact with GalSim don't need it.

I have a question about the RandomTopHat option. My assumption when I saw that was that it was the product of two 1d top-hats, i.e., uniform within some defined range in +/-x and +/-y, but actually it's uniform within some radius. It seems like there might be uses for both options?

rmjarvis commented 12 years ago

Thanks for the comments. Some responses:

Extra outputs aside from the image files would definitely be useful (e.g. the truth table). I'd add this as output.catalog, similar to input.catalog.
At one point I also thought of having the list of images returned internally as a python list of images. This could be an output.type = Array or Internal.
If you see typos, feel free to fix. Thank you. I wasn't planning on working on this more in the next few days.
I'm fine with changing the name of RandomTopHat, since it's apparently confusing. Maybe RandomCircle or RandomCircularTopHat (... other ideas?).
We don't need a separate name for the 2 separate random values, since you would just use DXDY with each one being type=Random.
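For reference, here is a small sketch of the two distributions being discussed. It is not taken from the GalSim code; the function names are hypothetical and it uses python's random module rather than our own random deviates.

```python
import math
import random

def random_circle(radius, rng=random):
    """Draw (x, y) uniformly within a circle of the given radius.

    The sqrt on the radial draw is what makes the density uniform in area;
    drawing r uniformly would pile points up near the center.
    """
    r = radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return r * math.cos(theta), r * math.sin(theta)

def random_square(half_width, rng=random):
    """Draw (x, y) as two independent 1d top-hats on [-half_width, half_width]."""
    return (rng.uniform(-half_width, half_width),
            rng.uniform(-half_width, half_width))

rng = random.Random(1234)  # seeded so the draws are reproducible
circle_points = [random_circle(2.0, rng) for _ in range(1000)]
square_points = [random_square(2.0, rng) for _ in range(1000)]
```

The second function is the case that needs no new name in the config, since it is just two independent random values.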
