conan-io / conan

Conan - The open-source C and C++ package manager
https://conan.io
MIT License
8.27k stars 981 forks source link

[bug] Semicolon not escaped when CMakeDeps generates target properties #14951

Open dalaen opened 1 year ago

dalaen commented 1 year ago

Environment details

Steps to reproduce

conanfile.py (standard with "mariadb-connector-c/3.3.3" as a dependency)

from conan import ConanFile
from conan.tools.cmake import CMakeToolchain, CMakeDeps, CMake, cmake_layout

class TestMariadbConan(ConanFile):
    name = "test_mariadb"
    version = "0.1.0"

    # Optional metadata
    license = "<Put the package license here>"
    author = "<Put your name here> <And your email here>"
    url = "<Package recipe repository url here, for issues about the package>"
    description = "<Description of TestMariadb here>"
    topics = ("<Put some tag here>", "<here>", "<and here>")

    # Binary configuration
    settings = "os", "compiler", "build_type", "arch"

    # Sources are located in the same place as this recipe, copy them to the recipe
    exports_sources = "CMakeLists.txt"

    requires = ["mariadb-connector-c/3.3.3"]

    def layout(self):
        cmake_layout(self)

    def generate(self):
        tc = CMakeToolchain(self)
        tc.generate()
        deps = CMakeDeps(self)
        deps.generate()

    def build(self):
        cmake = CMake(self)
        cmake.configure()
        cmake.build()

    def package(self):
        cmake = CMake(self)
        cmake.install()

    def package_info(self):
        self.cpp_info.libs = ["test_mariadb"]

CMakeLists.txt:

find_package(mariadb-connector-c REQUIRED)

get_target_property(output mariadb-connector-c::mariadb-connector-c INTERFACE_INCLUDE_DIRECTORIES)
message(STATUS "${output}")

Run:

Logs

Running conan build . displays the following output (trimming the irrelevant warnings):

-- Using Conan toolchain: /home/poxa151/tmp/test_mariadb/build/Release/generators/conan_toolchain.cmake
-- Conan: Target declared 'mariadb-connector-c::mariadb-connector-c'
-- Conan: Target declared 'ZLIB::ZLIB'
-- Conan: Component target declared 'CURL::libcurl'
-- Conan: Component target declared 'OpenSSL::Crypto'
-- Conan: Component target declared 'OpenSSL::SSL'
-- Conan: Target declared 'openssl::openssl'
-- Conan: Including build module from '/home/poxa151/.conan/data/openssl/3.1.3/_/_/package/c606a0dd8a0c8668b09aa21600e0d8423899894a/lib/cmake/conan-official-openssl-variables.cmake'
-- $<$<CONFIG:Release>:/home/poxa151/.conan/data/mariadb-connector-c/3.3.3/_/_/package/66c5704ea4604600bea83076580ec10d77f539f9/include;/home/poxa151/.conan/data/mariadb-connector-c/3.3.3/_/_/package/66c5704ea4604600bea83076580ec10d77f539f9/include/mariadb>

The issue is the last line, showing the content of the generated INTERFACE_INCLUDE_DIRECTORIES of the mariadb-connector-c::mariadb-connector-c, generated by CMakeDeps. The semicolon in the middle can cause trouble when one loops over the variable. In a generator expression like this, it should be escaped/replaced with $<SEMICOLON> (doc: https://cmake.org/cmake/help/latest/manual/cmake-generator-expressions.7.html#escaped-characters).

Instead of generating:

$<$<CONFIG:Release>:/home/poxa151/.conan/data/mariadb-connector-c/3.3.3/_/_/package/66c5704ea4604600bea83076580ec10d77f539f9/include;/home/poxa151/.conan/data/mariadb-connector-c/3.3.3/_/_/package/66c5704ea4604600bea83076580ec10d77f539f9/include/mariadb>

generate

$<$<CONFIG:Release>:/home/poxa151/.conan/data/mariadb-connector-c/3.3.3/_/_/package/66c5704ea4604600bea83076580ec10d77f539f9/include$<SEMICOLON>/home/poxa151/.conan/data/mariadb-connector-c/3.3.3/_/_/package/66c5704ea4604600bea83076580ec10d77f539f9/include/mariadb>
memsharded commented 1 year ago

Hi @dalaen

Thanks for your report. The first thing I have tried is to assess if it had impact on the include dirs resolution:

def test_cmakedeps_semicolon_escape():
    c = TestClient()
    pkg = textwrap.dedent("""\
        import os
        from conan import ConanFile
        from conan.tools.files import save
        class Pkg(ConanFile):
            def package(self):
                save(self, os.path.join(self.package_folder, "sub1/f1.h"), "")
                save(self, os.path.join(self.package_folder, "sub2/f2.h"), "")
            def package_info(self):
                self.cpp_info.includedirs = ["sub1", "sub2"]
        """)

    app = textwrap.dedent("""
        from conan import ConanFile
        from conan.tools.cmake import CMake
        class Pkg(ConanFile):
            settings = "os", "arch", "compiler", "build_type"
            requires = "pkg/0.1"
            generators = "CMakeDeps", "CMakeToolchain"
            def build(self):
                cmake = CMake(self)
                cmake.configure()
                cmake.build()
        """)
    c.save({"pkg/conanfile.py": pkg,
            "app/conanfile.py": app,
            "app/CMakeLists.txt": gen_cmakelists(appsources=["main.cpp"], find_package=["pkg"]),
            "app/main.cpp": gen_function_cpp(name="main", includes=["f1", "f2"])
            })
    c.run("create pkg --name=pkg --version=0.1")
    c.run("build app")
    assert "conanfile.py: Running CMake.build()" in c.out

Apparently it is working fine, and it is not an issue for CMake to interpret and use the string correctly.

The issue would come from this expression in our generated .cmake file:

set_property(TARGET {{root_target_name}}
                         PROPERTY INTERFACE_INCLUDE_DIRECTORIES
                         $<$<CONFIG:{{configuration}}>:${{'{'}}{{pkg_name}}_INCLUDE_DIRS{{config_suffix}}}> APPEND)

I'd expect that the escape characters are to be used when the definition of the generator expression happens, and there is no semicolon in that expression at all. What I would not expect is to have to escape the contents of the variable at all. That would mean that for every generator expression that is used out there using variables contents, those contents should be replace all possible escape characters?

As a side question: why would you like to get those directories anyway? Why just target_link_libraries() the target doesn't work?

dalaen commented 1 year ago

Hi @memsharded,

Thanks for your reply and investigation!

I'd expect that the escape characters are to be used when the definition of the generator expression happens, and there is no semicolon in that expression at all. What I would not expect is to have to escape the contents of the variable at all. That would mean that for every generator expression that is used out there using variables contents, those contents should be replace all possible escape characters?

I would concur to that if I understand you 100%. The issue is here, I'm not generating the expression that's put into the generator-expression at all.

The input is:

self.cpp_info.includedirs = ["sub1", "sub2"]

and the output is:

$<$<CONFIG:Release>:PKG_ROOT/sub1;PKG_ROOT/sub2>

I have no chance, as a user, to escape the generated semicolon. Or did I get you wrong?

To your side question: we have some custom CMake functions which are collecting a CMake target's dependencies and include folders to provide them as command line arguments to an external program for generating code. The command line argument string is built by looping over the include folders, and because the foreach() loop in CMake takes a semicolon or a whitespace as a splitting token, we get to this issue here when it's combined with generator-expressions.

It will basically give:

memsharded commented 1 year ago

To your side question: we have some custom CMake functions which are collecting a CMake target's dependencies and include folders to provide them as command line arguments to an external program for generating code. The command line argument string is built by looping over the include folders, and because the foreach() loop in CMake takes a semicolon or a whitespace as a splitting token, we get to this issue here when it's combined with generator-expressions.

It seems that you are trying to interpret a generator expression at configure time, isn't it? For parsing the INTERFACE_INCLUDE_DIRECTORIES variable at configure time, having a escape character or not, would be similar. Consider that you could also get some value for it like:

$<$<CONFIG:Release>:PKG_ROOT/sub1;PKG_ROOT/sub2>$<$<CONFIG:Debug>:PKG_ROOT/sub1;PKG_ROOT/sub2>

How would the $<SEMICOLON> replacement help there to get the list of folders? It seems you cannot simply iterate the value of this variable to get folders. Maybe while parsing of the variable, you might be dropping or filtering out the $<$<CONFIG:Release> part?

I have tried it in the test:

def test_cmakedeps_semicolon_escape():
    c = TestClient()
    pkg = textwrap.dedent("""\
        import os
        from conan import ConanFile
        from conan.tools.files import save
        class Pkg(ConanFile):
            def package(self):
                save(self, os.path.join(self.package_folder, "sub1/f1.h"), "")
                save(self, os.path.join(self.package_folder, "sub2/f2.h"), "")
            def package_info(self):
                self.cpp_info.includedirs = ["sub1", "sub2"]
        """)

    app = textwrap.dedent("""
        from conan import ConanFile
        from conan.tools.cmake import CMake
        class Pkg(ConanFile):
            settings = "os", "arch", "compiler", "build_type"
            requires = "pkg/0.1"
            generators = "CMakeDeps", "CMakeToolchain"
            def build(self):
                cmake = CMake(self)
                cmake.configure()
                cmake.build()
        """)
    cmake = textwrap.dedent("""
        set(CMAKE_CXX_COMPILER_WORKS 1)
        set(CMAKE_CXX_ABI_COMPILED 1)
        set(CMAKE_C_COMPILER_WORKS 1)
        set(CMAKE_C_ABI_COMPILED 1)

        cmake_minimum_required(VERSION 3.15)
        project(project CXX)

        find_package(pkg)

        get_target_property(output pkg::pkg INTERFACE_INCLUDE_DIRECTORIES)
        message(STATUS "${output}")
        foreach(_include ${output})
            message(STATUS "INCLUDE=${_include}")
        endforeach()

        add_executable(myapp  main.cpp )
        target_include_directories(myapp PUBLIC "include")

        target_link_libraries(myapp  pkg::pkg )

        """)
    c.save({"pkg/conanfile.py": pkg,
            "app/conanfile.py": app,
            "app/CMakeLists.txt": cmake,
            "app/main.cpp": gen_function_cpp(name="main", includes=["f1", "f2"])
            })
    c.run("create pkg --name=pkg --version=0.1")
    c.run("build app")
    print(c.out)
    assert "conanfile.py: Running CMake.build()" in c.out

And the output (with a change in the CMakeDeps to change it to SEMICOLON is still:

-- INCLUDE=$<$<CONFIG:Release>:T:/tmp19ifl7zsconans/path with spaces/.conan2/p/b/pkg4e0fcf4088a89/p/sub1$<SEMICOLON>T:/tmp19ifl7zsconans/path with spaces/.conan2/p/b/pkg4e0fcf4088a89/p/sub2>

So this change seems it will not help you to get the iterate the different folders anyway?

dalaen commented 1 year ago

The generator-expression is interpreted at build time when the target having the add_custom_command is built. In your example above, you are trying to interpret it at generation time, where the generation-expression is still not interpreted.

I made a sample CMakeLists.txt to illustrate all of this, and it actually brings a lot of light:

cmake_minimum_required(VERSION 3.27)
project(test LANGUAGES NONE)

message(STATUS "### Generating unescaped ###")
set(var_unescaped "$<$<BOOL:True>:/path/to/lib1;/path/to/lib2>;/path/to/lib3;$<$<BOOL:True>:/path/to/lib4>")

set(codegen_args_unescaped "")
foreach(var ${var_unescaped})
   message(STATUS "Appending ${var}")
   set(codegen_args_unescaped ${codegen_args_unescaped} "-I${var}")
endforeach()

message(STATUS "codegen_args_unescaped=${codegen_args_unescaped}")
file(GENERATE OUTPUT unescaped CONTENT "${codegen_args_unescaped}")

message(STATUS "##########################")
message(STATUS "### Generating escaped ###")
set(var_escaped "$<$<BOOL:True>:path/to/lib1$<SEMICOLON>path/to/lib2>;/path/to/lib3;$<$<BOOL:True>:/path/to/lib4>")

set(codegen_args_escaped "")
foreach(var ${var_escaped})
   message(STATUS "Appending ${var}")
   set(codegen_args_escaped ${codegen_args_escaped} "-I${var}")
endforeach()

message(STATUS "codegen_args_escaped=${codegen_args_escaped}")
file(GENERATE OUTPUT escaped CONTENT "${codegen_args_escaped}")

Running cmake .:

-- ### Generating unescaped ###
-- Appending $<$<BOOL:True>:/path/to/lib1
-- Appending /path/to/lib2>
-- Appending /path/to/lib3
-- Appending $<$<BOOL:True>:/path/to/lib4>
-- codegen_args_unescaped=-I$<$<BOOL:True>:/path/to/lib1;-I/path/to/lib2>;-I/path/to/lib3;-I$<$<BOOL:True>:/path/to/lib4>
-- ##########################
-- ### Generating escaped ###
-- Appending $<$<BOOL:True>:path/to/lib1$<SEMICOLON>path/to/lib2>
-- Appending /path/to/lib3
-- Appending $<$<BOOL:True>:/path/to/lib4>
-- codegen_args_escaped=-I$<$<BOOL:True>:path/to/lib1$<SEMICOLON>path/to/lib2>;-I/path/to/lib3;-I$<$<BOOL:True>:/path/to/lib4>
-- Configuring done (0.0s)
-- Generating done (0.0s)

Content of the generated files: unescaped -I/path/to/lib1;-I/path/to/lib2;-I/path/to/lib3;-I/path/to/lib4 escaped -Ipath/to/lib1;path/to/lib2;-I/path/to/lib3;-I/path/to/lib4

Actually, for my use case, I want the behavior of... unescaped! However, the "unescaped" file in this example is as such because it has been dissembled with the foreach and reassembled identically (basically $<$<BOOL:True>:/path/to/lib1;/path/to/lib2> became -I$<$<BOOL:True>:/path/to/lib1;-I/path/to/lib2>, and that's a special case.

In our custom CMake scripts, we are filtering input in the foreach which then tempers the disassembly/reassembly…

On one side, doing a foreach when the semicolon is not escaped yields weird results:

-- Appending $<$<BOOL:True>:/path/to/lib1
-- Appending /path/to/lib2>
-- Appending /path/to/lib3
-- Appending $<$<BOOL:True>:/path/to/lib4>

On the other side, escaping the semicolon binds together lib1 and lib2 (in this example) in a foreach, which might not be the expected behavior as well.

-- Appending $<$<BOOL:True>:path/to/lib1$<SEMICOLON>path/to/lib2>
-- Appending /path/to/lib3
-- Appending $<$<BOOL:True>:/path/to/lib4>

The optimal behavior would be, then, I guess, a genex for each lib: $<$<BOOL:True>:/path/to/lib1>;$<$<BOOL:True>:/path/to/lib2> giving the following output:

-- Appending $<$<BOOL:True>:path/to/lib1>
-- Appending $<$<BOOL:True>:/path/to/lib2>
-- Appending /path/to/lib3
-- Appending $<$<BOOL:True>:/path/to/lib4>
memsharded commented 1 year ago

This is very interesting indeed!

Generating one genex for each library might be a bit challenging with the current implementation of CMakeDeps, as the generation of targets is decoupled from the generation of libraries lists. We could parse, split and reassemble things, but sounds it adds quite a bit of extra complexity.

I am trying to think a bit outside of the box, collecting the include-dirs from dependencies in Conan is very straightforward, what about something in this line:

def generate(self):
      # This can be made a 2 lines, made it this way to be more clear
      mycmake = []
      for dep in self.dependencies.host.values():
          includedirs = dep.cpp_info.includedirs
          for inc in includedirs:
                mycmake.append(...inc in the right format)
     save(self, "mycmake.cmake", "\n".join(mycmake))

And directly generate the file that you want, with the exact format you want, and then you can leverage that and use it in your add_custom_command.?

dalaen commented 1 year ago

Generating one genex for each library might be a bit challenging with the current implementation of CMakeDeps, as the generation of targets is decoupled from the generation of libraries lists. We could parse, split and reassemble things, but sounds it adds quite a bit of extra complexity.

Indeed, though it feels like the correct answer. Well, the genex is introduced in the generation of targets in CMakeDeps, maybe the library list can be sanitized there (actually I would not expect this to happen in earlier steps, where it might not end up in a genex?).

Collecting the list of includedirs through the conanfile.py would not really be suited for my use case. Basically our current CMake function takes a CMake target as an input, collects its dependencies, assembles the include directories of the dependencies together, does some filtering and pass it to a code generator.

That'd be unfortunately out of scope of the conanfile.py where CMake targets are still not born, and quite heavy work for a CMake function located in a dependency (that we fortunately control, but not in the consumer/downstream project, rather in a lib/upstream project).

I am also not blocked at the moment, so not in search of a short-term solution, as I can probably get rid of the filtering, hence falling back into the special case of disassembling and reassembling 1-to-1. :)

memsharded commented 1 year ago

That'd be unfortunately out of scope of the conanfile.py where CMake targets are still not born, and quite heavy work for a CMake function located in a dependency (that we fortunately control, but not in the consumer/downstream project, rather in a lib/upstream project).

That generate() only happens in the consumer, not necessary to modify the upstream, and it also contains the information about the targets that will be created in CMake, so it is easy to create a file that defines a list of includes per target name. Using that in your own CMakeLists would be just getting the value of a variable from a file that you can include().

But it is very likely that I don't have all the information and constraints, so not an issue, I was just trying to think if there could be other more convenient ways to approach this problem. I have had (and I have seen many users) having quite good success in moving complicated functionality belonging to the "generate" stage (gathering and processing information from dependencies), from CMakeLists.txt or other obscure build systems to the conanfile.py, so I thought this could have been a possible approach.

I am also not blocked at the moment, so not in search of a short-term solution, as I can probably get rid of the filtering, hence falling back into the special case of disassembling and reassembling 1-to-1. :)

Ok, sounds good, happy to know that you are not blocked :)