corrosion-rs / corrosion

Marrying Rust and CMake - Easy Rust and C/C++ Integration!
https://corrosion-rs.github.io/corrosion/
MIT License
1.07k stars 103 forks source link

Integration of code generators #204

Open jschwe opened 2 years ago

jschwe commented 2 years ago

I implemented that approach of explicitly telling the code generator where to output the headers by setting an environment variable and it works well. I have written a proposal for upstreaming this approach into cxx-builder. That would allow Corrosion to automatically set that environment variable so using cxx with CMake wouldn't require anything different than using Corrosion with any other Rust library. @jschwe I would appreciate your input on that proposal.

Originally posted by @Be-ing in https://github.com/corrosion-rs/corrosion/issues/150#issuecomment-1229265010

jschwe commented 2 years ago

@Be-ing I opened a new issue to discuss your request.

As far as I can see you wish for the following:

  1. Corrosion should set an environment variable with a path to a location where code generators may generate code into, probably something like ${CMAKE_BINARY_DIR}/generated.
  2. Corrosion should add the path to include directories.

I think in general, this is not only interesting for your specific project, but also for e.g. a potential cbindgen integration, or a bindgen extension that generates additional .h/.c files to make C static inline functions available to rust.

However, I think providing support for 2. is not so easy, because:

Be-ing commented 2 years ago

If you don't want to unconditionally add the include paths, an option could be added to corrosion_import_crates to do so. For example:

corrosion_import_crates(MANIFEST_PATH rust/src/Cargo.toml CRATES rust-crate-that-uses-cxx CXX)

In case someone wants to import multiple crates, some of which use the code generator and some do not, they could make multiple calls to corrosion_import_crates with and without the option for using the code generator.

I don't think this is feasible in a general case; how to do this depends on the implementation details of the specific code generator. I am hoping we can reach a consensus how to handle this for cxx in https://github.com/dtolnay/cxx/issues/462. If that is not accepted, Corrosion could still add a CXX_QT option for corrosion_import_crates, but I would prefer to have one CXX option that works for both cxx-build and cxx-qt-build.

Be-ing commented 2 years ago

To clarify, for cxx and cxx-qt, I am only concerned with adding include paths for the generated headers. The code generator that runs in crates' build.rs not only generates C++, it also compiles it with the cc crate. This is important because otherwise the code generation would have to run at CMake configure time if CMake was building the generated C++ code.

jschwe commented 2 years ago

I am only concerned with adding include paths for the generated headers

I still believe this is hard to do without the user specifying which targets should get the additional include paths. include_directories() only affects the current directory and downwards, so depending on your directory structure, so this would only work if the directories are structured accordingly. I prefer the user/downstream specifying which targets should have the additional include directories.

The code generator that runs in crates' build.rs not only generates C++, it also compiles it with the cc crate. This is important because otherwise the code generation would have to run at CMake configure time if CMake was building the generated C++ code.

At work we also have a custom code generator, which is run at build time. Is there something special about your project that requires running it at configure time? Have you looked at add_custom_command()?

I don't think this is feasible in a general case; how to do this depends on the implementation details of the specific code generator.

Cmake >= 3.18 supports indirect function calls, so I think we may be able to provide some interface where downstream can customize the generator behavior via their own function. I have not thought this through at all, but maybe corrosion could provide something along the lines of:

corrosion_generate_headers(TARGETS <targets which should receive updated include directories>  
   INCLUDE_VISIBILITY <PUBLIC|PRIVATE|INTERFACE>
   OUTPUT_DIRECTORY <directory where the generator should generates the files too>
   GENERATOR_FN <function to call downstream defined function, which defines one or multiple custom 
                              commands to create some output header file. A list of output files is returned back to 
                             `corrosion_generate_headers` via a parameter.>
   GENERATOR_CUSTOM_ARGUMENTS <additional arguments which should be passed to GENERATOR_FN.>

   DEPENDS <>
Be-ing commented 2 years ago

I still believe this is hard to do without the user specifying which targets should get the additional include paths.

I don't understand what the issue is with the user specifying which targets use the code generator.

Be-ing commented 2 years ago

Cmake >= 3.18 supports indirect function calls, so I think we may be able to provide some interface where downstream can customize the generator behavior via their own function.

I think that would be overcomplicated. Users may as well add target_include_directories after corrosion_import_crates as they can already do.

Be-ing commented 2 years ago

At work we also have a custom code generator, which is run at build time. Is there something special about your project that requires running it at configure time? Have you looked at add_custom_command()?

cxx-qt-build outputs multiple C++ files and their names are dependent on the contents of the Rust modules they are generated from, so CMake can't know at configure time what source files to add to the target without running the code generator. But that problem is already solved by compiling the generated C++ with the cc crate in build.rs.

jschwe commented 2 years ago

I don't understand what the issue is with the user specifying which targets use the code generator.

Users may as well add target_include_directories after corrosion_import_crates as they can already do.

Ah, then this was a misunderstanding on my side. I was under the impression that you didn't want the user to have to specify which targets get the additional include paths. So then for your use-case it would be sufficient to set an environment variable pointing to a location where the code-generator should generate the code to?

Be-ing commented 2 years ago

My goal is for Corrosion to handle both setting the environment variable and adding the include path so the user doesn't even need to know about these. Adding a CXX option to corrosion_import_crates to selectively enable this feature would be okay.

Be-ing commented 2 years ago

I think in general, this is not only interesting for your specific project, but also for e.g. a potential cbindgen integration, or a bindgen extension that generates https://github.com/rust-lang/rust-bindgen/issues/1090.

Yes, if we could get cxx and cbindgen upstreams to agree to using the same environment variable, Corrosion would only need one option to handle them all.

jschwe commented 2 years ago

My goal is for Corrosion to handle both setting the environment variable and adding the include path so the user doesn't even need to know about these

Sorry, but I still don't see a good way for corrosion to correctly add include paths without the user providing additional information.

I don't understand what the issue is with the user specifying which targets use the code generator.

To clarify, the include directories should be set via target_include_directories() which requires knowledge of the target which should receive the additional include directories. Using include_directories would only work for some project structures, where target is in the same or a subdirectory, but not if the target requiring the include directory is in a neighbor or parent directory.

I'm open to proving an environment variable which could specify a directory such as ${CMAKE_BINARY_DIR}/corrosion-generated for downstream users to place their generated output into. I don't think cxx or cbindgen should have to care about such a variable. The user of the code generator should just specify the output directory when invoking the generator.

Be-ing commented 2 years ago

I never proposed using include_directories, nor do I ever recommend using those legacy non-target-specific CMake functions.

I don't think cxx or cbindgen should have to care about such a variable.

I don't understand how cxx or cbindgen could know where to put their output without being aware of this environment variable.

jschwe commented 2 years ago

I never proposed using include_directories, nor do I ever recommend using those legacy non-target-specific CMake functions.

Then I just really don't understand how you expect corrosion to know which C/C++ CMake targets should get the additional include directories. Could you give an example of how the syntax of corrosion_import_crate would look like in your vision, and how corrosion would then know which C/C++ targets need the additional include directories?

I don't understand how cxx or cbindgen could know where to put their output without being aware of this environment variable.

Presumably the code generator would anyway need to be called by the user, because options need to be specified. E.g. cbindgen takes a --output <out_header.h> option, so the user could just specify that.

Be-ing commented 2 years ago

corrosion_import_crates(MANIFEST_PATH rust/src/Cargo.toml CRATES rust-crate-that-uses-cxx CXX)

could set the environment variable and the include path for the rust-crate-that-uses-cxx target.

Presumably the code generator would anyway need to be called by the user, because options need to be specified. E.g. cbindgen takes a --output option, so the user could just specify that.

The code generators are Rust libraries called by build.rs, not CLI tools. I tried the CLI tool approach, it was not a good solution.

jschwe commented 2 years ago

Ah thanks, I think I understand a bit better now what you want. Basically you want to set INTERFACE_INCLUDE_DIRECTORIES to the location of the generated headers, so that CMake can add the includes to any C/C++ targets linking in a Rust library. Is that correct so far?

The code generators are Rust libraries called by build.rs, not CLI tools.

That shouldn't really matter though. From your linked issue, I it seems your problem is mainly that cxx-bridge does not support specifying the output location.

Currently I'm still not convinced that this is something that needs to be done on the corrosion side. Couldn't you just add a cxx-qt wrapper aroud corrosion_import_crate() that internally does

corrosion_import_crate(...)
set_env_vars(...)
add_include_directories(... INTERFACE ...)
Be-ing commented 2 years ago

Basically you want to set INTERFACE_INCLUDE_DIRECTORIES to the location of the generated headers, so that CMake can add the includes to any C/C++ targets linking in a Rust library. Is that correct so far?

Yes.

it seems your problem is mainly that cxx-bridge does not support specifying the output location.

Strictly speaking for cxx-qt's purposes, it doesn't need to because cxx-qt-build does not use cxx-build. However, it would simplify the implementation and usage of Corrosion if cxx, cxx-qt, and hopefully cbindgen as well converged on using the same environment variable to specify the directory to output generated headers.

Couldn't you just add a cxx-qt wrapper aroud corrosion_import_crate()

That is possible. cxx-qt used to have its own CMake module before using Corrosion (this was required before cxx-qt-build compiled the C++ code it generated). The problem is that requires distributing a CMake module for cxx-qt, which makes it more complicated for downstream users. It would be simpler for users of cxx-qt to only need Corrosion and have the rest handled by Cargo.

jschwe commented 1 year ago

https://github.com/corrosion-rs/corrosion/pull/244 will be a first step for cxx integration.

Be-ing commented 1 year ago

I had a discussion about this at Cargo office hours a few weeks ago. The conclusion we came to was this could be incorporated into the upcoming binary dependency feature of Cargo, where crates can specify bin crates as dependencies. If Cargo exposed a command line interface for running those binary dependencies, for example something like cargo run --package my_cxx_app --bindep cxxbridge-cmd -- cli-arguments-to-bindep, that would solve the thorny "where does the binary come from" question, the version of the code generator could be precisely controlled with Cargo.lock, and there wouldn't be any redundant compilation of Rust dependencies.

jschwe commented 1 year ago

That does sound useful, but it seems there are still a number of issues remaining before bindeps can be stabilized: https://github.com/rust-lang/cargo/labels/Z-bindeps