chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Support config files for modeling #7186

Open buddha314 opened 6 years ago

buddha314 commented 6 years ago

Python can read text configuration files easily. This is great for keeping defaults for models, as in

r=0.25
y = (1,0,1,0)
kernel=radial

These parameters can get quite annoying to manage, and text files are very handy. Chapel could have convenience methods, with an interface kinda like

var r: real,
    s: [] real,
    kernel: string;

r, s, kernel = read_config('../models/model1.toml');

Along with type checking.

ben-albrecht commented 6 years ago

This idea seems related to the existing execution configuration files.

e.g.

// main.chpl
config const r: real,
             kernel: string;

writeln(r);
writeln(kernel);

# main.input - configuration file
r=10
kernel="linear"

> chpl main.chpl
> ./main -f main.input
10
linear

One difference is that the file is specified at the command line rather than in-code.

buddha314 commented 6 years ago

That smells like "easy" to me, then.

ben-albrecht commented 6 years ago

Does this feature request boil down to "support specification of a default execution configuration file in code" then?

Note that providing default values for the configs in the code is equivalent:

// main.chpl
config const r: real = 10,
             kernel: string = "linear";

writeln(r);
writeln(kernel);

buddha314 commented 6 years ago

That's the wordy version. What? Are you Nathaniel Hawthorne? In your snippet, you are not reading from an external file, and that's the juicy bit. That way I can edit my parameters in that file and in the code do

alpha, beta, gamma, epochs = readAllTheThingsFromAFile('params.txt');

for e in 1..epochs {
  writeln("Happiness is as a butterfly which, when pursued, is always beyond our grasp, but which if you will sit down quietly, may alight upon you.");
}

bradcray commented 6 years ago

I think Ben's pointer to the existing support for specifying configuration files is probably the best answer to this request.

Brian's desire to write something like:

var (alpha, beta, gamma, epochs) = readAllTheThingsFromAFile('params.txt');

is made challenging by the fact that Chapel is a statically typed language. Specifically, readAllTheThingsFromAFile() would necessarily have to return a specified number of things of specific types in order to pass compilation. If you're willing to pin those things in your program, this can be written:

proc readAllTheThingsFromAFile(filename: string) {
  var infile = open(filename, iomode.r).reader();
  return (infile.read(real), infile.read(real), infile.read(real), infile.read(int));
}

var (alpha, beta, gamma, epochs) = readAllTheThingsFromAFile("params.txt");

writeln((alpha, beta, gamma, epochs));

But there's no way for us to provide a "readAllTheThingsFromAFile()" method because it can't anticipate what types you're going to read from the file at compile-time (which would be necessary to establish the types of alpha, beta, gamma, and epochs).

I usually approach this using a pattern like the following:

config const paramFilename = "params.txt";

var paramfile = open(paramFilename, iomode.r).reader();

var alpha = paramfile.read(real),
    beta  = paramfile.read(real),
    gamma = paramfile.read(real),
    epochs = paramfile.read(int);

writeln((alpha, beta, gamma, epochs));

That is, I make a configurable filename, and then use the type-based read methods on channels to read in a value of each type that I want to consume from the file.

All that said, I still think that the built-in support for configuration files that Ben referred to above is probably the nicest way to handle such cases.

cassella commented 6 years ago

One advantage of a read_config()-style approach is that it can be invoked more than one time in a program,

for model in 1..9 {
  read_config("../models/model" + model + ".toml");
  run_model();
}

Perhaps the Chapel config machinery could be (re-)invoked at runtime like this somehow?

A la Reflection, a proc read_config(filename) could know all the config vars and their types, so it could process the file like the usual configfile handling does?

Caveat 1: what should it do for

config var a = 5, b = 5 * a;

when the configfile specifies only a?

Caveat 2: Brian's example calls for a config var that's an array or tuple. Is support planned for that via the commandline or configfile interfaces?

Caveat 3: This can't reset a config const.

Maybe for code maintainability, you'd want to be able to limit which config vars a given read_config() could modify? E.g., so that reading the model config file can't change any distribution options.

Or maybe a run-time config reader shouldn't be targeting the global vars anyway? Maybe instead (or additionally) there'd be a way to read_config restricted to config vars that are (waves hands) in the same scope,

config var x: int;

...

for i in 1..10 {
   config var a = 5, b = 5 * a;
   config const c: int;

   read_config(file i);

}

And that read_config could set a, b, and the const c, but be unable to set x?

Ideally the syntax would allow for read_config() to be doing the initialization of the vars, not just assigning to them after their initialization as the syntax above suggests. Hmm,

for i in 1..10 with (configfile="file"+i+".toml",
                     config a = 5, b = 5 * a, config const c: int) {
}

Passing a named argument to the with seems grotesque. I just can't think of another way to connect this config intent to a file for each loop iteration.

And getting this behavior via with restricts this configfile reading to places a with can be put. (e.g., if more work were needed within the loop body to construct the filename.)

bradcray commented 6 years ago

It seems pretty weird/scary to me to have a built-in read_config() function that will re-assign an arbitrary set of configs in a user's program without any indication of that (i.e., the configs aren't passed in and aren't assigned on the way out). I'm also not certain what problem we're trying to work around anymore... Chapel's file I/O isn't so bad that one couldn't write their own readconfig() with a minimal amount of effort...?

buddha314 commented 6 years ago

If I'm reading this correctly, it seems you're worried about type enforcement. I don't understand why

runThatBaby -f configFile.cfg

is different from a type enforcement perspective than

module RunThatBaby {
  config const babiesToRun: int;
  readConfig('configFile.cfg');
}

Or am I not understanding your concern?

My use case is primarily around connections to external processes and model parameters. For instance, having your database credentials in an external file means every user can run tests as long as there is a "dbcreds.cfg" file present. Also, for testing you want to run a model through 5 iterations, but in production 5,000. It's really easy to edit this in model.cfg.

CAN all of this be done with a command line argument? Yes. It's just less convenient and breaks some patterns I'm seeing in the application space.

cassella commented 6 years ago

I'm also not sure what @buddha314's use case is that the generated executable's -f params.txt that @ben-albrecht suggested doesn't satisfy. Reading a different file each pass through a loop was just the first thing I thought of. Another could be the program deciding on a config file dynamically.

(The -f option is available only on master, not 1.15.)

But it seems like a productivity win if the user doesn't have to supply even that minimal effort. Especially for prototyping. And especially for one-off runs. For example, in the local-scope variant below, a read_config() style approach would let you add a config var debug = false, and tweak just one of a dozen config files to set debug = true without having to modify the rest to set debug = false.

I agree the ability of an innocuous-looking function call to modify arbitrary global config vars in the "global config vars" case, or the even stranger semantics in the "same scope" case, is surprising and weird.

I didn't spell out what I was thinking about the "limit which config vars" bit. The best I can think of is to pass their names as strings,

read_config("a", "b", "c");

or (more weirdly) make read_config() varargs but take all its args as ref or out, if the Reflection support would let it then identify the corresponding config vars and their names and types. Or something. Then write the slightly more Chapel-looking

read_config(a, b, c);

Another wrinkle: if the config file doesn't specify a value for one of the variables allowed to it, should read_config() leave that variable unchanged, or cause it to be reset according to the var's declaration (config var x = 5)?

The motivation for the with approach was actually to be less surprising in this way, although that's also not much like existing Chapel code. And also to allow handling a config const without being syntactically surprising.

Another thing that occurred to me about the "restricts to places a with can go" is that the with variant seems to want to be some kind of modifier on the {, not on the for. But that's also not very Chapelish.

I'm not really happy with any of the syntaxes I've described though, to a large extent for the same reason.

Here's another approach for the local-scope approach that makes it more obviously not-a-function-call.

config var something = 7;
proc thisisnew(i) {
     config with "cfgfile" + i;    // clearly not a function call
     config var somethingelse = 8;
}

Then that config file could be used to set somethingelse, but not something. (This takes advantage of Chapel currently requiring config vars to be at module scope. Thus any config var that's not at module scope is fair game. :)

It could apply also to all lexically enclosed scopes that don't have their own config with.

It would be an error to have a config var in not-module-scope that doesn't have a config with in its scope or an enclosing scope.

(Alternates: config with(filename), with config filename, just config filename.)

This has a problem if you want the use of the config file to be conditional, since the conditional could put that use into a small scope,

proc doh() {
     if (condition) {
        var filename = construct_filename();
        config with filename;
     }
     config var cannot_modify = 7;
}

It does seem a little limiting to tie this to the same flat-text format that the -f option uses.

Perhaps -f and read_config() should also have a file-format layer, to be able to consume e.g. yaml, json, toml, user-specified xml schema, as well as plain text? This would make the Chapel application more composable with existing workflows. That part sounds like it would want to have a hook to get that part of the functionality from other modules.

I'm not attached to this, but I'm having fun exploring the concept.

bradcray commented 6 years ago

> If I'm reading this correctly, it seems you're worried about type enforcement

No, it's not that. It's that, inspecting the source, there's no reason to think that a function called read_config() would set variables that aren't passed to it or assigned via it. The effects seem too subtle to me and too disconnected from the declaration point of the variable. Or, as @cassella puts it:

> I agree the ability of an innocuous-looking function call to modify arbitrary global config vars in the "global config vars" case, or the even stranger semantics in the "same scope" case, is surprising and weird.

> (The -f option is available only on master, not 1.15.)

This isn't accurate, -f has been available since the dawn of Chapel time.

Of the approaches being kicked around, the one I like best is a variation on Paul's in which such use cases relied on a routine like:

var a: real;
var b: int;
var c: string;
readVars("mySettings.txt", a, b, c);

but in which the variables need not be config declarations (because I don't think there's anything about reading a variable from a file that should require it to be a config...). However, our current reflection capabilities are not sufficient to be able to reason about the names of the variables a, b, and c, so for this to work, you'd need a file with the following format:

3.14
42
hiya

rather than:

a=3.14
b=42
c=hiya

If this were sufficient, then I believe one could write such a routine today. Since it's feature freeze day for the 1.16 release, I'll leave this as an exercise for the reader, though here's a starting point that demonstrates the basic pattern without using a file:

proc foo(ref a...) {
  for param i in 1..a.size do
    read(a(i));
}

var a: int;
var b: real;
var c: string;

foo(a, b, c);
writeln((a,b,c));
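
One possible way to carry the starting point above over to a file is sketched below. It is only a sketch under stated assumptions: readVars() is a hypothetical user-written helper, not a Chapel built-in, and it expects the positional file format (one value per variable, in declaration order). It is written against a recent Chapel release, so it uses `ioMode`, `locking=false`, and 0-based tuple indexing, whereas the snippets in this thread use the older spellings.

```chapel
use IO;

// Hypothetical helper (not part of Chapel's standard library): fills each
// argument, in order, with the next whitespace-separated value in the file.
proc readVars(filename: string, ref vars...) throws {
  var reader = open(filename, ioMode.r).reader(locking=false);
  // vars is a tuple, so a param loop lets each element keep its own type.
  for param i in 0..vars.size-1 do
    if !reader.read(vars(i)) then
      halt("ran out of values in ", filename);
}

// Write a small positional settings file so the sketch runs on its own.
{
  var w = open("mySettings.txt", ioMode.cw).writer(locking=false);
  w.writeln(3.14);
  w.writeln(42);
  w.writeln("hiya");
  w.close();
}

var a: real;
var b: int;
var c: string;

readVars("mySettings.txt", a, b, c);
writeln((a, b, c));
```

Because the variables are passed in explicitly, a reader of the call site can see exactly which ones may change, and their declared types decide how each value is parsed.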

One theme in this thread that I'm dead-set against is the slight insinuation that if a read_config() is found somewhere in a user's file, somehow it would squash the initialization of the config var or const at its point of declaration. That's getting into intractable compiler problem territory (not to mention some hairy lifetime analysis to ensure that nobody tries to access the variable between its declaration point and corresponding read_config() call)—essentially, the compiler would have to know that read_config() was called (equivalent to the halting problem) and then determine whether or not the filename that was dynamically generated contained an assignment to the variable in question (which is impossible since the filenames are being generated dynamically and may change between compile-time and runtime anyway).

buddha314 commented 6 years ago

> I'll leave this as an exercise for the reader

Thanks for bringing back grad school nightmares...

bradcray commented 6 years ago

Just get your lab partner, @cassella, to do all the work!

buddha314 commented 6 years ago

I did 2.5 years of chemistry until my last lab partner sent flaming liquid dribbling down the bench towards me. Things on fire in organic chem lab is no joke!

I may try your assignment, Mr. Cray.

cassella commented 6 years ago

> This isn't accurate, -f has been available since the dawn of Chapel time.

Sorry about that. I remembered a commit about it a few months ago. I looked in the 1.15 version of usingchapel/executing.rst and didn't see it. So I thought it was the feature itself that had been added.

buddha314 commented 6 years ago

Here is another use case for this. Now that we have the CDO Library, an external file is a good place to store database credentials, especially ones with local passwords.

buddha314 commented 6 years ago

One subtlety I forgot about here was the ability to have a readConfig() function. Things do work fine if you do -f myConf.cfg, but the use case I was thinking about earlier was to have

database_dev.cfg

DB_USER=Cooper

And a Chapel function:

config const dbConfig: string = "local.db";

proc main() {
  readConfig(dbConfig);
  writeln("DB_USER: ", DB_USER);
}

Then be able to run

./mydbprog --dbConfig=prod_creds.cfg  # OR
./mydbprog  # Uses default file

bradcray commented 6 years ago

> One subtlety I forgot about here was the ability to have a readConfig() function.

I weighed in on this general concept above and my thinking remains unchanged. To summarize, I think once variables' values are being read from files at arbitrary program execution points, I don't think the behavior should be config-related anymore. I think at that point you're just reading some variables from files. To me, configs are all about symbols that enjoy the ability to be established on the command-line easily. Once those values are set at arbitrary runtime points, it no longer makes sense for constants (their values have to be specified at declaration-time), and is no longer config-specific (any variable could read its value from a file). That's what led me to suggest more of a general "read variables" function in the linked comment.
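
For the credentials use case, a general "read variables" routine along these lines could also accept name=value files if the program looks each name up explicitly. The sketch below is one hedged way to do that, not an existing Chapel feature: readSettings() and the file contents are hypothetical, only the filename is a config, and the values are ordinary variables converted at the point of use. It assumes a recent Chapel release (`ioMode`, `locking=false`, the Map module).

```chapel
use IO, Map;

// Hypothetical helper (not a Chapel built-in): parses "name=value" lines
// into a map of strings, skipping blank lines and '#' comments.
proc readSettings(filename: string) throws {
  var settings = new map(string, string);
  var reader = open(filename, ioMode.r).reader(locking=false);
  for line in reader.lines() {
    const trimmed = line.strip();
    if trimmed.isEmpty() || trimmed.startsWith("#") then continue;
    const (name, eq, value) = trimmed.partition("=");
    if eq.isEmpty() then continue;  // no '=' on this line, so ignore it
    settings.add(name.strip(), value.strip());
  }
  return settings;
}

// Write a demo file so the sketch runs on its own (contents are made up).
{
  var w = open("database_dev.cfg", ioMode.cw).writer(locking=false);
  w.writeln("# development credentials");
  w.writeln("DB_USER=Cooper");
  w.writeln("epochs=5");
  w.close();
}

// Only the filename is a config; it can be overridden on the command line.
config const dbConfig = "database_dev.cfg";

var settings = readSettings(dbConfig);
const dbUser = settings["DB_USER"];      // stays a string
const epochs = settings["epochs"]: int;  // the cast fixes the type here

writeln("DB_USER: ", dbUser, ", epochs: ", epochs);
```

The types are still fixed at compile time, since each lookup is converted where it is used, and nothing is assigned behind the caller's back.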