beetbox / confuse

painless YAML config files for Python
https://pypi.org/project/confuse/
MIT License

de/serialization and validation #13

Open thejohnfreeman opened 9 years ago

thejohnfreeman commented 9 years ago

Each type needs a deserializer. Most types have a constructor (e.g. int, float, bool), and that is enough. Clients should be able to provide alternatives, though, because a deserializer also encompasses validation.
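
To make the idea concrete, here is a minimal sketch of a deserializer as a plain callable, with the constructor as the default and a client-supplied alternative that also validates. The names `port`, `DESERIALIZERS`, and `deserialize` are hypothetical and exist only for illustration; nothing here is an existing API.

```python
def port(raw):
    """Hypothetical client-supplied deserializer: convert, then validate."""
    value = int(raw)  # the plain constructor handles the conversion
    if not 0 < value < 65536:
        raise ValueError(f"port out of range: {value}")
    return value


# Hypothetical registry: built-in constructors suffice for most keys,
# and a custom callable is used where validation is needed.
DESERIALIZERS = {"retries": int, "timeout": float, "port": port}


def deserialize(raw_config):
    """Apply the registered deserializer to each raw value."""
    return {key: DESERIALIZERS[key](value) for key, value in raw_config.items()}


config = deserialize({"retries": "3", "timeout": "1.5", "port": "8080"})
```

One caveat when the raw values are strings: `bool("false")` is `True` in Python, so `bool` is a case where a custom deserializer is probably needed rather than the bare constructor.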

Some clients may want to serialize configurations. Most types have a __str__ method, and that is enough. Clients should be able to provide alternatives, though, in case __str__ does not exist or match the deserializer.
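
A small sketch of why an alternative serializer can be necessary: for a comma-separated list, `str` produces text that the matching deserializer cannot read back. The functions `parse_tags`, `format_tags`, and `serialize`, and the `SERIALIZERS` registry, are hypothetical names chosen for this example.

```python
def parse_tags(raw):
    """Hypothetical deserializer: 'a, b' -> ['a', 'b']."""
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def format_tags(tags):
    """Matching serializer: ['a', 'b'] -> 'a, b'."""
    return ", ".join(tags)


# Hypothetical registry keyed by type; anything not listed falls back to str.
SERIALIZERS = {list: format_tags}


def serialize(value):
    """Serialize a value with its registered serializer, or str by default."""
    return SERIALIZERS.get(type(value), str)(value)


tags = ["alpha", "beta"]
round_tripped = parse_tags(serialize(tags))  # equal to tags again
mismatched = parse_tags(str(tags))           # str() output does not round-trip
```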

I saw the issue on Confit templates, and the idea is very similar. Going back to my assumptions (#7), though, I think it needs to be mandatory, not optional. A simple dict will be enough to define both types and defaults, however, so it should not be burdensome.
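
One possible reading of "a simple dict defines both types and defaults" is sketched below, assuming the type of each key is taken from the type of its default. `SCHEMA` and `load` are hypothetical names, and rejecting unknown keys is my interpretation of "mandatory", not something specified in this thread.

```python
# Hypothetical schema: each default value also fixes the expected type.
SCHEMA = {"host": "localhost", "port": 8080, "verbose": False}


def load(raw, schema=SCHEMA):
    """Merge raw values over defaults, enforcing the schema's keys and types."""
    unknown = set(raw) - set(schema)
    if unknown:
        raise KeyError(f"unknown keys: {sorted(unknown)}")
    result = {}
    for key, default in schema.items():
        value = raw.get(key, default)
        # Exact type match, so that True is not accepted where an int is expected.
        if type(value) is not type(default):
            raise TypeError(
                f"{key}: expected {type(default).__name__}, "
                f"got {type(value).__name__}"
            )
        result[key] = value
    return result


settings = load({"port": 9000})
```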

sampsyo commented 9 years ago

Indeed. In Confit, you can write view.get(int) to ensure that you get an int back, but you can also write view.get() to get whatever unsanitized data the YAML gives us. (I don't have a particular attachment to that unsanitized mode; it's definitely a hack.) Here, int is just a convenient alias for a full-fledged template class that deserializes integers—a little overkill for that case, but that's how deserialization is optionally made client-specific.
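
The "int is an alias for a template" idea can be sketched roughly as follows. This is a simplified illustration written for this discussion, not Confit's actual implementation; the class and function names (`Template`, `Integer`, `as_template`, `get`) are assumptions made for the example.

```python
class Template:
    """Hypothetical base class: converts and validates one raw value."""

    def convert(self, value):
        raise NotImplementedError


class Integer(Template):
    """Accept only real integers (bool is excluded on purpose)."""

    def convert(self, value):
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"expected an integer, got {value!r}")
        return value


def as_template(spec):
    """Turn a convenient shorthand such as int into a full template object."""
    if isinstance(spec, Template):
        return spec
    if spec is int:
        return Integer()
    raise TypeError(f"no template for {spec!r}")


def get(raw, spec=None):
    """Return the raw value unchanged, or validate it against a template."""
    if spec is None:
        return raw  # the "unsanitized" mode
    return as_template(spec).convert(raw)
```

A client-specific deserializer would then be just another `Template` subclass passed in place of `int`.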

The neat thing about Confit currently is that the view system lets you be flexible about when you validate which parts of the configuration. You can of course validate an entire configuration at once, but if you have a very large application, you can modularize the validation. For example, if you have two variables, foo and bar, you can either validate both at once:

validated = config.get({'foo': int, 'bar': str})
print(validated['foo'], validated['bar'])

or:

print(config['foo'].get(int))
print(config['bar'].get(str))

On serialization: Would a goal be to re-emit a valid configuration file that could feasibly be parsed again? This could be useful, for example, if an application wants to be able to programmatically update the configuration file.

thejohnfreeman commented 9 years ago

That is exactly the goal of serialization. It especially comes in handy for debugging and auditing, where we want to keep around the exact configuration a previous run used. Doing "round-trip" editing (rewriting the user's configuration file with updates) is tricky, because we'll want to preserve comments and whitespace as much as possible. There are some libraries that try to implement that for us, though, like ruamel.yaml.
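
For the debugging and auditing case, where comments and whitespace do not matter, a snapshot of the resolved configuration can be quite simple. The sketch below uses JSON from the standard library as a stand-in for YAML so that it stays self-contained; `snapshot` and `restore` are hypothetical names. It does not attempt round-trip editing of a user's file, which is the harder problem described above.

```python
import json


def snapshot(config):
    """Serialize a resolved configuration to stable, re-parseable text."""
    # sort_keys makes the output deterministic, which helps when diffing runs.
    return json.dumps(config, indent=2, sort_keys=True)


def restore(text):
    """Parse a snapshot back into a configuration dict."""
    return json.loads(text)


config = {"retries": 3, "timeout": 1.5, "hosts": ["a.example", "b.example"]}
saved = snapshot(config)
restored = restore(saved)  # equal to the original config
```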

sampsyo commented 9 years ago

Yeah, a full round-trip would be tricky. FWIW, Confit currently tries to get halfway there—it includes some heuristics for interspersing comments back into the formatted YAML. ruamel.yaml looks much more complete.

A simpler format like TOML might be easier to round-trip.

thejohnfreeman commented 9 years ago

I hadn't heard of TOML before (though I have heard of its author), but in our model it will be incredibly easy to define new input formats as functions that emit a configuration.
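
A rough sketch of "input formats as functions that emit a configuration", using two formats that ship with the standard library. The function names and the `LOADERS` registry are hypothetical; the only assumption about the model is that every loader takes text and returns a plain dict.

```python
import configparser
import json


def from_json(text):
    """JSON loader: the parsed document is already a nested dict."""
    return json.loads(text)


def from_ini(text):
    """INI loader: one dict per section; all values stay strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


# Hypothetical registry mapping a file extension to its loader function.
LOADERS = {".json": from_json, ".ini": from_ini}


def load(text, extension):
    """Emit a configuration dict from text in the given format."""
    return LOADERS[extension](text)


from_json_config = load('{"server": {"port": "8080"}}', ".json")
from_ini_config = load("[server]\nport = 8080\n", ".ini")
```

Under this scheme, supporting TOML would presumably mean registering one more function, for example one built on `tomllib` in Python 3.11 or later.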