ai2cm / fv3config

Manipulate FV3GFS run directories
Apache License 2.0
1 stars 0 forks source link

Add check_config step in write_run_directory #6

Open oliverwm1 opened 4 years ago

oliverwm1 commented 4 years ago

config.write_run_directory should check that the config dict provided is valid (to the extent possible). Based on the configuration flags set, it should be able to determine what initial conditions files are needed, and check for the existence of those in the specified initial conditions directory. This would help ensure that user-supplied configurations are still valid.

In theory we could also check for invalid namelist options, but this would be a lot of work to implement, because it involves going through all the source code to pull out default namelists.

mcgibbon commented 4 years ago

As suggested by @brianhenn in #2, we should also check that any user-supplied directories have the required files.

mcgibbon commented 4 years ago

To ensure no ambiguity between "options" and "paths", we should require that directory options are either a valid option or an absolute path (not a relative path). We can still write command-line interfaces that allow relative paths, they just need to convert to an absolute path when they take input (which insulates them against any changing of working directories that might happen in the code).

nbren12 commented 4 years ago

I think directories and or default configurations should be distinguished either by different function arguments or some other way. They are different concepts, and should probably have different representations in the code beyond some fancy argument checking. In general, it is confusing to modify behavior based on the value rather than type of an argument. Otherwise, you might have to write increasingly arcane logic to distinguish strings that are paths from strings that aren't. What if I want to name my default configuration with / in it?

mcgibbon commented 4 years ago

The string is an absolute path if os.path.isabs(string) is True. It isn't arcane logic.

Why can't we just decide not to include slashes in our option names? I think "what forcing data I want to use" is the underlying concept of the configuration option, and both an option name or folder path fall under that concept. I think it's much more confusing to have "what forcing data I want to use" specified in multiple configuration parameters.

oliverwm1 commented 4 years ago

I think it's much more confusing to have "what forcing data I want to use" specified in multiple configuration parameters.

Yes, to me this is a key point, as I also mentioned in my reply to your comment, Noah. Do you have a specific suggestion for how we could distinguish built-in versus user-supplied options with just one argument? We could use ints to specify built-in options (0, 1, 2...) and assume strings are user-supplied paths. But this makes it less clear what the built-in option is.

nbren12 commented 4 years ago

Maybe I am not expressing myself correctly. I don't think the method for distinguishing this behavior should involve any sort of if statements involving the value of the argument since that can lead to unexpected behavior. Basically, if the docstring starts to get long and have multiple if-then statements, then something is probably wrong with the design.

nbren12 commented 4 years ago

If it weren't a top-level API, I would be in favor of a more explicit syntax like this:

config['forcing'] = get_forcing_data_by_tag(name)

But I do recognize that you want to make it easy to run some default cases.

nbren12 commented 4 years ago

Or if you don't like that explicit function. You could make a class hierarchy:

class DefaultForcingData:
    def get_config_entry(self):
        pass
class ExplicitPathForcingData:
    def get_config_entry(self):
        pass

And then call forcing.get_config_entry() whenever you actually need to coerce config object into a dict of simple datatypes.

mcgibbon commented 4 years ago

Getting back to the issue itself, we also need to add checks that if the restart example initial conditions data is set, then at the least the namelist is set to read restart data. Perhaps this could be generalized by checking the files in the initial conditions directory and ensuring what files are present is compatible with the namelist options.

If we want to discuss the design for the config dict further we should open an issue (but I think we settled this offline already).

oliverwm1 commented 4 years ago

Perhaps this could be generalized by checking the files in the initial conditions directory and ensuring what files are present is compatible with the namelist options.

That's basically what I wrote when I opened this issue :)