Kixunil / configure_me

A Rust library for processing application configuration easily
62 stars 14 forks source link

Re-evaluate proc macro approach #3

Open Kixunil opened 5 years ago

Kixunil commented 5 years ago

Pros:

Cons:

Kixunil commented 5 years ago

Any standardization that sandboxes proc macros might kill this.

Kixunil commented 4 years ago

I think that after cutting man page generation out from build script, this may be better, but we have to implement some kind of scanning. https://github.com/alexcrichton/proc-macro2/issues/213 may be a blocker for this.

jaskij commented 2 years ago

@Kixunil I've got a proposal written up for this, do you want it here or in a separate issue? The TL;DR is that #[path = "src/config.rs"] mod config is legal in build.rs.

Kixunil commented 2 years ago

Sounds interesting, feel free to write it here! I also had an idea recently to do something similar to prost - they have both build script and a proc macro. My thinking was to generate the struct using build script and proceed from there using proc macro. So people who don't like build script can still use proc macro but they'll lose man page generation.

jaskij commented 2 years ago

TL;DR

#[path = "src/config.rs"] mod config is legal in build.rs.

What to do:

Add a #[derive] which allows for parsing, and can be used to generate what is needed in build.rs. And then import the mod with the config in build.rs.

Example

Imagine a code structure like this:

 .
├──  build.rs
├──  Cargo.toml
└──  src
   ├──  app.rs
   ├──  bin
   ├──  config.rs
   ├──  main.rs
   └──  receiver.rs

We have a config struct which (preferably) sits alone in config.rs.

#[derive(::configure_me::Config)]
struct Config {
    #[cfg_optional(true), cfg_doc("Port to listen on."]
    port: u16,
    #[cfg_default(::std::Ipv4Addr::new(0, 0, 0, 0)), cfg_doc("IP address to bind to."]
    bind_addr: ::std::net::Ipv4Addr,
    #[cfg_doc("IP address to bind to."]
    tls_cert: String,
}

With build.rs like this:

use configure_me_codegen::ConfigExt;

#[path = "src/config.rs"] mod my_app_config;

fn main() {
    my_app_config::generate_configure_toml();
}

The #[derive] would take care of the parser, while build.rs would generate config_spec.toml (for consumption by external tools like cfg_me) like in the README.

[[param]]
name = "port"
type = "u16"
optional = false
doc = "Port to listen on."

[[param]]
name = "bind_addr"
type = "::std::net::Ipv4Addr" # Yes, this works and you can use your own types as well! (impl. Deserialize and ParseArg)
default = "::std::net::Ipv4Addr::new(0, 0, 0, 0)" # Rust expression that creates the value.
doc = "IP address to bind to."

[[param]]
name = "tls_cert"
type = "String"
doc = "Path to the TLS certificate. The connections will be unsecure if it isn't provided."

The Downside

And it's a big one: this requires somehow importing the config struct in build.rs. Most likely through specifying a module with a path. This feels quite hacky. But it does work.

There is an alternative approach with moving the configs to a separate (sub)crate, but that probably has even more downsides (publishing?) than the mod approach.

Discussion

What I'm relying on heavily is the compiler - to be precise, the fact that it will drop unused functions from resulting binaries. So, if #[derive] does not introduce any new data members to the struct, there should be no impact on the code.

If further separation is desired, an extension trait to be imported only in build.rs is an option.

As a bonus including additional outputs should be fairly trivial at that point.

Closing thoughts

This is just an idea, here for discussion. To state the obvious, it is not intended to replace the the current config_spec.toml, but rather complement it, allowing defining it in code. Unless there's a good way to actually get the config struct in build.rs, this should be an optional extras.

All the details like API, generated names, etc, are temporary.

jaskij commented 2 years ago

Sounds interesting, feel free to write it here! I also had an idea recently to do something similar to prost - they have both build script and a proc macro. My thinking was to generate the struct using build script and proceed from there using proc macro. So people who don't like build script can still use proc macro but they'll lose man page generation.

My thinking is completely the other way around - make the struct Rust, allow making everything that's possible with just proc_macros to be done with [#derive], and use the module in build.rs to generate the TOML config and other stuff as needed.

Re: Prost, it's one of the tools which call an external process to generate stuff. Taking a page from their book adding a nice wrapper around calling cfg_me for build.rs would be nice. The use case is to have a single command build without requiring anything but cargo and a Rust toolchain.

jaskij commented 2 years ago

(Sorry for multiple comments, I don't feel comfortable editing in stuff in issue comments).

While Prost is a good example, I wouldn't go too far in how they do things - the use case is entirely different.

My main pain points with configure_me are 1) that the current TOML is a bit unwieldy, 2) bad IDE support for generated code. In Prost's (and Protobuf's) case 1) is not an issue because Protobuf is much more concise and 2) is necessary. We have a project where proto files are used in three different programs, written in three different languages. configure_me doesn't really need that kind of versatility.

Kixunil commented 2 years ago

Hmm, the idea of generating Toml from .rs sounds interesting! I see one problem with it that's probably fixable: the module may contain types that are not available in build.rs (IOW some build dependencies might need to be duplicated)

The fix would be to use a proc macro that prevents rustc seeing the struct - NOT derive macro.

I will think about this more and write my thoughts.

jaskij commented 2 years ago

Hmm, the idea of generating Toml from .rs sounds interesting! I see one problem with it that's probably fixable: the module may contain types that are not available in build.rs (IOW some build dependencies might need to be duplicated)

True, although that could be worked around by separating the config struct in a separate module. Personally I find the idea of 1k+ line files abhorrent (I try to stick to under a hundred, five hundred max), but I've seen a few crates where the whole thing is stuffed into lib.rs.

One thing I probably have forgotten to mention earlier, as an intermediary step in the implementation, toml could be generated and the rest of the process happen like it did before.

dr-orlovsky commented 2 years ago

@Kixunil you may be interested in checking amplify_syn crate, which provides a lot of simplification over designing and controlling proc macro. For instance, see how deriving Getters is done with it in amplify_derive.

jaskij commented 2 years ago

@Kixunil for now, to avoid the hassle of generated code, I'm testing an option of doing #[derive(Deserialize, Merge, StructOpt)], Merge coming from the merge crate.

My use case does not need debconf or a manpage, so this feels like it should be enough if everything works as intended.

So far this seems to require a lot of boilerplate, due in part to structopt's maintainer's refusal to support the Default trait (see structopt#150 for discussion. The boilerplate required is practically unchanged from the sample in the OP. I have yet to check if merge::Merge and serde::Deserialize behave properly in this setting, but doubt there would be any issues.

Kixunil commented 2 years ago

Interesting, the merge crate looks strangely familiar. From quick look at the issue about Default I agree with their reasoning if I understood it right. If you look at the generated code of my crate it works by creating a second struct that uses Option for each field and only structs with optional fields are merged and finally converted to non-option field (returning an error if a field is missing).

So it seems you'd need some way of generating the secondary struct with those three derives and then generating some code that does the final processing. There's one more issue though. To work properly all arguments must behave as optional but if some fields are mandatory --help should say it. So we're hitting a problem where structopt would have to know about the original struct too. So I don't think it can be fully composable. Maybe with collaboration.