LukasKalbertodt / confique

Type-safe, layered, light-weight, `serde`-based configuration library
https://docs.rs/confique
Apache License 2.0

Optional nested objects and lists/maps of nested objects? #12

Open LukasKalbertodt opened 2 years ago

LukasKalbertodt commented 2 years ago

It might be useful to treat nested configuration more like normal values, i.e. making them optional or putting them in lists/maps. But there are lots of open questions, and I have to reevaluate whether this requirement still makes sense now that we can have maps and arrays as default values.

reivilibre commented 12 months ago

Maps of nested objects seem like they would be particularly useful!

In my use case I have a config structure with this kind of thing (TOML):

[oidc.clients.key1]
name = "abc"
secret = "123"

[oidc.clients.key2]
name = "abc"
secret = "123"

I'd like to let users load the secret from a separate file if they want, so that they can keep their secret configuration separate from the less-sensitive configuration (e.g. because the secret configuration might be stored encrypted in their git repository that they use to set up the service).

Because maps of nested objects aren't supported, I believe the whole map has to appear in one particular file since it only uses serde's Deserialize implementation, so currently the above would need some workaround...
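
To make the desired behavior concrete, here is a minimal std-only sketch of merging a map of partially specified clients from two layers, for example a main file and a separate secrets file. It does not use confique's API; `PartialClient` and `merge_clients` are hypothetical names introduced only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical partial form of one `[oidc.clients.<key>]` entry: every field
/// is optional so that different files can each contribute some of them.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct PartialClient {
    pub name: Option<String>,
    pub secret: Option<String>,
}

impl PartialClient {
    /// Keeps values already set in `self` and fills the gaps from `fallback`.
    pub fn with_fallback(self, fallback: Self) -> Self {
        Self {
            name: self.name.or(fallback.name),
            secret: self.secret.or(fallback.secret),
        }
    }
}

/// Merges two layers key by key; `primary` wins whenever both set a field.
pub fn merge_clients(
    mut primary: HashMap<String, PartialClient>,
    fallback: HashMap<String, PartialClient>,
) -> HashMap<String, PartialClient> {
    for (key, fb) in fallback {
        let merged = match primary.remove(&key) {
            Some(existing) => existing.with_fallback(fb),
            None => fb,
        };
        primary.insert(key, merged);
    }
    primary
}
```

With this, the main file could carry `name` for `key1` while the secrets file carries only `secret` for `key1`, and the merged map would contain both.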

LukasKalbertodt commented 8 months ago

I tried to tackle this issue a couple of times already but always kind of got stuck. What follows is a bunch of thoughts, mostly as a note to self, but of course I'm always happy for more input and ideas.


Why even?

Why not just have a struct and derive Deserialize for it, instead of Config? I think there are only two main things: merging values from different sources, and config templates (the documentation aspect).

@reivilibre provided an example above (thanks!) with the goal of merging objects from different sources. But one can also imagine each client having a bunch of fields and the documentation aspect being important. Some examples how the template for optional objects or lists or maps could look:

Template examples/brainstorming

Optional nested object

#[derive(Config)]
struct PopupConfig {
    /// The title of the popup.
    title: String,

    /// The body text of the popup, basic markdown is supported.
    body: String,

    /// Label for the button that makes the annoying popup go away.
    #[config(default = "Yeah whatever")]
    label: String,
}

#[derive(Config)]
struct Conf {
    /// Allows you to configure an annoying popup on this website.
    #[config(nested)]
    popup: Option<PopupConfig>,
}

TOML:

# Allows you to configure an annoying popup on this website.
#
# This object is optional.
#[popup]
# The title of the popup.
#
# Required!
#title =

# The body text of the popup, basic markdown is supported.
#
# Required!
#body = 

# Label for the button that makes the annoying popup go away.
#
# Default value: "Yeah whatever"
#label = "Yeah whatever"

YAML:

# Allows you to configure an annoying popup on this website.
#
# This object is optional.
#popup:
    # The title of the popup.
    #
    # Required!
    #title:

    # The body text of the popup, basic markdown is supported.
    #
    # Required!
    #body: 

    # Label for the button that makes the annoying popup go away.
    #
    # Default value: Yeah whatever
    #label: Yeah whatever

Or do we want to add double # (comment token) to all comment lines inside the object so that one can easily uncomment the whole block in one go (with some editor/IDE function)?
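
One possible reading of the double-`#` idea, as a std-only sketch: doc-comment lines get a second `#`, while commented-out entries keep a single one, so removing one `#` from every line activates the block and leaves the docs as comments. Both function names are hypothetical, and the sketch assumes the unindented TOML layout shown above, where doc lines are `#` or start with `# `.

```rust
/// Adds a second `#` to doc-comment lines (`#` alone or starting with `# `).
/// Commented-out entries such as `#title =` are left unchanged.
pub fn double_comment_docs(template: &str) -> String {
    template
        .lines()
        .map(|line| {
            if line == "#" || line.starts_with("# ") {
                format!("#{line}")
            } else {
                line.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}

/// Simulates an editor's "uncomment block" action: strips one leading `#`
/// from every line that has one.
pub fn uncomment_once(block: &str) -> String {
    block
        .lines()
        .map(|line| line.strip_prefix('#').unwrap_or(line))
        .collect::<Vec<_>>()
        .join("\n")
}
```

After `double_comment_docs` followed by `uncomment_once`, the `[popup]` header and the entries are active, and the descriptions are still ordinary `#` comments.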

List of objects

#[derive(Config)]
struct QuicklinkConfig {
    /// Label of the link.
    label: String,
    /// Target of the link.
    href: String,
}

#[derive(Config)]
struct Conf {
    /// Links shown in the footer.
    #[config(nested)]
    quicklinks: Vec<QuicklinkConfig>,
}

TOML:

# Links shown in the footer.
#
# Required!
#[[quicklinks]]
# Label of the link.
#
# Required!
#label =

# Target of the link.
#
# Required!
#href = 

The other array syntax in TOML might be preferable, but it's unclear how best to emit the field docs then.

# Links shown in the footer.
#
# Each object in the list has the following fields:
# - label: Label of the link (required)
# - href: Target of the link (required)
#
# Required!
#quicklinks = []

Both TOML examples are not great IMO. The first one is confusing because of TOML's [[]] syntax, and it's not clear that quicklinks = [] is a valid option there as well. The second example is not great because it doesn't show how you would write an object in that syntax. Also, letting the user choose between those requires another attribute.

YAML is a bit better here I think:

# Links shown in the footer.
#
# Required.
quicklinks:
    # Label of the link.
    #
    # Required.
    # - label:

    # Target of the link.
    #
    # Required.
    #  href:

But I can't come up with a really nice way to incorporate the field docs here, or to indent them.

Something similar can be observed with a map of nested objects. But there, the example key probably needs to be specified by the user.

So I think my takeaway is that there isn't really a perfect way. Oftentimes the user wants/needs more control over how the template is formatted, which increases complexity. Just letting the user manually write a doc comment with examples, as is the current workaround for half of this issue, might not be too bad after all. Of course, so far I have only looked at small examples. If someone has a good use case where the nested object (in option, list, map) is quite large, please let me know! Because at some point manually writing the example and field descriptions will be a problem.

Random thoughts

Random thought: letting the user write simple Mustache-template-like things inside the doc comment, giving it access to docs of subfields?
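
A rough std-only sketch of what such a substitution step could look like. The `{{field}}` placeholder syntax and the `render_doc` function are assumptions made up for this illustration; nothing like this exists in confique.

```rust
/// Replaces each `{{name}}` placeholder in `doc` with the doc string of the
/// subfield called `name`. Unknown placeholders are left untouched.
pub fn render_doc(doc: &str, field_docs: &[(&str, &str)]) -> String {
    let mut out = doc.to_string();
    for (name, field_doc) in field_docs {
        // Build the literal placeholder text, e.g. `{{label}}`.
        let placeholder = ["{{", name, "}}"].concat();
        out = out.replace(&placeholder, field_doc);
    }
    out
}
```

For the quicklinks example above, the parent doc comment could then list `- label: {{label}}` and `- href: {{href}}` and have the subfield docs filled in when the template is generated.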

Random thought 2: to solve the "you can always write manually, but it gets problematic with multiple formats" problem, confique could filter out fenced code blocks of the wrong format? And convert fenced code blocks into indented ones?

#[derive(Config)]
struct Conf {
    /// Cool example:
    ///
    /// ```toml
    /// foo = "heyyy"
    /// ```
    /// ```yaml
    /// foo: heyyy
    /// ```
    foo: String,
}

This would result in the following, depending on the format.

TOML:

# Cool example:
#
#     foo = "heyyy"
# 
# Required!
#foo = 

YAML:

# Cool example:
#
#     foo: heyyy
# 
# Required!
#foo:
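
The filtering step could be sketched roughly as follows, using only std. `filter_doc_for_format` is a hypothetical helper, not a confique function. It assumes the doc comment arrives as one string per line, that a fence's language tag names the config format, and that untagged fences are kept for every format.

```rust
/// Keeps prose lines, drops fenced code blocks tagged with another format, and
/// turns fenced blocks for `format` (or untagged ones) into indented blocks.
pub fn filter_doc_for_format(doc: &[&str], format: &str) -> Vec<String> {
    let mut out = Vec::new();
    // `None`: outside a fence. `Some(keep)`: inside a fence, where `keep`
    // says whether its contents belong in the output.
    let mut fence: Option<bool> = None;
    for line in doc {
        match (fence, line.trim().strip_prefix("```")) {
            // Opening fence: the rest of the line is the language tag.
            (None, Some(tag)) => fence = Some(tag.is_empty() || tag == format),
            // Closing fence.
            (Some(_), Some(_)) => fence = None,
            // Code line inside a kept fence: emit it indented by four spaces.
            (Some(true), None) => out.push(format!("    {line}")),
            // Code line inside a dropped fence.
            (Some(false), None) => {}
            // Ordinary prose.
            (None, None) => out.push(line.to_string()),
        }
    }
    out
}
```

The template generator would then prefix each returned line with the comment token, as it already does for ordinary doc lines.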

Merge modes

It might also be a neat idea to add an attribute merge_mode that can be used on non-nested fields. The current merge mode would be called "none", for example, meaning that the first value is taken. But for arrays and maps there could be different merge modes. Or one could let users specify a custom function responsible for merging.

That would solve the "merge" part of this issue.
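
A minimal std-only sketch of how merge modes could behave for a list-valued field. The `MergeMode` enum, its `Append` variant and `merge_list` are hypothetical; only the "none" mode corresponds to the behavior described above.

```rust
/// Hypothetical merge modes for a list-valued field; not part of confique.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MergeMode {
    /// The first (highest-priority) value that is set wins.
    None,
    /// Concatenate both lists, higher-priority items first.
    Append,
}

/// Merges an optional list from a higher-priority layer with the same field
/// from a fallback layer.
pub fn merge_list<T>(
    mode: MergeMode,
    primary: Option<Vec<T>>,
    fallback: Option<Vec<T>>,
) -> Option<Vec<T>> {
    match (mode, primary, fallback) {
        // Both layers set the field and the mode asks for concatenation.
        (MergeMode::Append, Some(mut p), Some(f)) => {
            p.extend(f);
            Some(p)
        }
        // Otherwise take the primary value if set, else the fallback.
        (_, p, f) => p.or(f),
    }
}
```

A user-supplied merge function, as mentioned above, would slot in as another way of producing the merged value in place of the `match`.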


Sorry for the extremely unstructured comment, but I needed to dump these thoughts, as otherwise I start at 0 again next time I try tackling this.

LukasKalbertodt commented 8 months ago

Oh also: env would be tricky on these kinds of nested objects. The derive on the struct does not know how it is used and would accept an env = "FOO" attribute. But if it's used as a list... what then?

jreniel commented 5 months ago

Hello! I am in this same boat. I've tested many crates for config management, and I really like the derive style of confique. However, all of the crates seem to fall short when it comes to complex and dynamic data structures. In my case, I have:

mesh_bbox: &mesh_bbox                                                                    
    xmin: -98.00556                                                                      
    ymin: 8.534422                                                                       
    xmax: -60.040005                                                                     
    ymax: 45.831431                                                                      
    crs: 'epsg:4326'                                                                     

rasters:                                                                                 
  - &GEBCO_2021_sub_ice_topo_1                                                           
    path: 'GEBCO_2021_sub_ice_topo/gebco_2021_sub_ice_topo_n90.0_s0.0_w-180.0_e-90.0.tif'
    bbox: *mesh_bbox                                                                     
    chunk_size: 1500                                                                     
  - &GEBCO_2021_sub_ice_topo_2                                                           
    path: 'GEBCO_2021_sub_ice_topo/gebco_2021_sub_ice_topo_n90.0_s0.0_w-90.0_e0.0.tif'   
    bbox: *mesh_bbox                                                                     
    chunk_size: 1500                                                                     

geom: &geom                                                                              
    zmax: &zmax 10.                                                                      
    rasters:                                                                             
      - <<: *GEBCO_2021_sub_ice_topo_1                                                   
        zmax: 0.                                                                         
        overlap: 5                                                                       
      - <<: *GEBCO_2021_sub_ice_topo_2                                                   
        zmax: 0.                                                                         
        overlap: 5                                                                       
    sieve: true                                                                          

The objective is that the same yaml file that works for Python continues to work in Rust. The above yaml file is validated in the Python version using pydantic.

To give you more context, my application will process separate keys, for example, the geom_build binary will naturally process the geom key, whereas the hfun_build binary will process the hfun key of the same file. The reason we have the rasters key as separate is because it is reused multiple times in the yaml document. My Rust code would ideally look like this:

use confique::Config;                                                                               
use serde::Deserialize;                                                                             
use std::path::PathBuf;                                                                             

#[derive(Debug, Config, Deserialize)]                                                               
pub struct RasterConfig {     ■ similarly named struct `RasterConfig` defined here                  
    path: PathBuf,                                                                                  
}                                                                                                   

#[derive(Config, Deserialize)]                                                                      
pub enum RestersConfig {     ■ `confique::Config` can only be derive for structs with named fields  
    Single(RasterConfig),                                                                           
    Multiple(Vec<RasterConfig>),                                                                    
}                                                                                                   

#[derive(Debug, Config, Deserialize)]     ■ you might be missing a type parameter: `<RastersConfig>`
pub struct GeomConfigOpts {     ■ you might be missing a type parameter: `<RastersConfig>`          
    zmax: Option<f64>,                                                                              
    zmin: Option<f64>,                                                                              
    sieve: bool,                                                                                    
    rasters: Option<RastersConfig>,     ■■ a struct with a similar name exists: `RasterConfig`      
}                                                                                                   

#[derive(Debug, Config, Deserialize)]                                                               
pub struct GeomConfig {                                                                             
    geom: GeomConfigOpts,                                                                           
}                                                                                                   

I have included the error indications as well for reference.

I think the confique style has a lot of potential, but I still haven't found one crate that is as full-featured as pydantic. In my case, I'd rather prioritize the ability to have enums and nested structs over being able to get them from ENV, which can be worked around in many ways.

Any thoughts on this? Thanks!

LukasKalbertodt commented 4 months ago

@jreniel Is there any reason you have to derive Config for RastersConfig and RasterConfig? Deriving Deserialize should suffice. At least you can make it work, i.e. read a file, that way. As I've written above, I only see two reasons why these "nested objects/lists/maps" even have to be supported by confique natively: merging and config templates. Do you specifically need one of those in your case, for the fields in question?

jreniel commented 4 months ago

Thanks for looking into this.

Is there any reason you have to derive Config for RastersConfig and RasterConfig?

Because the compiler will complain that GeomConfigOpts has some fields that don't implement Config. It's literally confique that is implicitly requiring it. However, Config doesn't work for enums anyway, and that's as far as I got with this route.

At least you can make it work, i.e. read a file that way.

But it won't validate and populate the nested struct, so reading the file this way is pointless. And it is pointless for two reasons. The RastersConfig is a reusable struct. There are other components that construct other data objects, but basically all of the data objects are obtained from the rasters. These rasters can have semi-complex user configurations, so we always need to validate those. We can't simply ingest plain text, and we need this to be expanded in other parts of the code as well.

In the end, I was able to get what I needed by just using plain serde and the validator crate. Below is a snippet.

I really like where confique is going, but if it can't do nesting then I might as well just go for serde+validator, which does handle nesting and flattening. I think not having proper nesting and enum support is a deal breaker. I can do away with the enums, but not with the nesting, especially since plain serde handles it nicely, so using confique in this case becomes a step back rather than forwards. Don't get me wrong, I still think it's great to have these kinds of libraries, but at least in this case, functionality outweighs convenience.

Thanks for looking into this anyway!

use serde::{Deserialize, Serialize};
use validator::Validate;

#[derive(Debug, Validate, Serialize, Deserialize)]
pub struct GeomRasterConfig {                               
    #[validate(nested)]                                     
    #[serde(flatten)]                                       
    raster: RasterConfig,                                   
    zmax: Option<f64>,                                      
    zmin: Option<f64>,                                      
}                                                           

#[derive(Debug, Validate, Serialize, Deserialize)]          
pub struct GeomConfigOpts {                                 
    zmax: Option<f64>,                                      
    zmin: Option<f64>,                                      
    #[serde(default)]                                       
    sieve: bool,                                            
    #[validate(nested)]                                     
    rasters: Vec<GeomRasterConfig>,                         
}                                                           

#[derive(Debug, Validate, Serialize, Deserialize)]          
pub struct GeomConfig {                                     
    #[validate(nested)]                                     
    geom: GeomConfigOpts,                                   
}                                                           
impl GeomConfig {                                           
    pub fn iter_raster_windows(&self) -> RastersWindowIter {
        RastersWindowIter {                                 
            geom: &self.geom,                               
            current_raster_index: 0,                        
            current_window_index: 0,                        
        }                                                   
    }                                                       
}                                                           

Nemo157 commented 3 months ago

I think this is the relevant issue. I needed support for a map from a string id to a nested-config object in my config, and managed to add it with a newtype and a manual impl of Config. I wonder if something like this impl could be added for HashMap directly:

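use std::collections::HashMap;
use std::hash::Hash;

use serde::{de::DeserializeOwned, Deserialize};

#[derive(Debug)]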
pub struct ConfigMap<K: DeserializeOwned + Eq + Hash, V: confique::Config> {
    pub inner: HashMap<K, V>,
}

#[derive(Debug)]
pub struct ConfigMapPartial<K: DeserializeOwned + Eq + Hash, V: confique::Partial> {
    pub inner: HashMap<K, V>,
}

impl<K: DeserializeOwned + Eq + Hash, V: confique::Config> confique::Config for ConfigMap<K, V> {
    type Partial = ConfigMapPartial<K, V::Partial>;

    // TODO
    const META: confique::meta::Meta = confique::meta::Meta {
        name: "",
        doc: &[],
        fields: &[],
    };

    fn from_partial(partial: Self::Partial) -> Result<Self, confique::Error> {
        // TODO: this needs to use `confique::internal::map_err_prefix_path` to give the correct path in errors
        let inner: Result<_, confique::Error> = partial.inner.into_iter().map(|(k, v)| Ok((k, V::from_partial(v)?))).collect();
        Ok(Self { inner: inner? })
    }
}

impl<'de, K: DeserializeOwned + Eq + Hash, V: confique::Partial> serde::Deserialize<'de> for ConfigMapPartial<K, V> {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error> where D: serde::de::Deserializer<'de> {
       Ok(Self { inner: HashMap::deserialize(deserializer)? })
    }
}

impl<K: DeserializeOwned + Eq + Hash, V: confique::Partial> confique::Partial for ConfigMapPartial<K, V> {
    fn empty() -> Self { Self { inner: HashMap::new() } }
    fn default_values() -> Self { Self::empty() }
    fn from_env() -> Result<Self, confique::Error> {
        // TODO: dunno if this makes sense to support somehow
        Ok(Self::empty())
    }
    fn with_fallback(mut self, fallback: Self) -> Self {
        for (k, v) in fallback.inner {
            let v = match self.inner.remove(&k) {
                Some(value) => value.with_fallback(v),
                None => v,
            };
            self.inner.insert(k, v);
        }
        self
    }
    fn is_empty(&self) -> bool { self.inner.is_empty() }
    fn is_complete(&self) -> bool { self.inner.values().all(|v| v.is_complete()) }
}

impl<K: DeserializeOwned + Eq + Hash + Clone, V: confique::Partial + Clone> Clone for ConfigMapPartial<K, V> {
    fn clone(&self) -> Self {
        Self { inner: self.inner.clone() }
    }
}