gittup / tup

Tup is a file-based build system.
http://gittup.org/tup/
GNU General Public License v2.0

Building C++20 modules fails because of a directory and global dependencies #469

Open yamirui opened 2 years ago

yamirui commented 2 years ago
$ tup
[ tup ] [0.000s] Scanning filesystem...
[ tup ] [0.001s] Reading in new environment variables...
[ tup ] [0.001s] No Tupfiles to parse.
[ tup ] [0.001s] No files to delete.
[ tup ] [0.001s] Executing Commands...
* 1) example: g++ -Wall -Wextra -Wpedantic -std=c++20 -fmodules-ts -c example.cpp -o example.o                                    
 *** tup messages ***
tup error: Directory '[redacted]/.cache/testtup/example/gcm.cache' was created, but not subsequently removed. Only temporary directories can be created by commands.
 [  ]  50%
 *** tup: 1 job failed.

Since GCC requires this directory for modules to be useful at all, I'm not sure how to get tup to build my project.

This is in fact not the only problem:

$ tup
[ tup ] [0.000s] Scanning filesystem...
[ tup ] [0.170s] Reading in new environment variables...
[ tup ] [0.170s] Parsing Tupfiles...
 1) [0.001s] .
* 0) [0.001s] example                                                                                                     
tup error: Explicitly named file 'iostream' not found in subdir 'example'
tup error: Error parsing Tupfile line 3
  Line was: ': foreach iostream |> g++ $(CXXFLAGS) -xc++-system-header %f |> gcm.cache'
 [  ] 100%
 *** tup: 1 job failed.

These standard library headers need to be compiled as modules too, and the results are put in the gcm.cache subdirectory. It appears that I am using this build system in the opposite way from what is intended, since all files are expected to be in the project tree. For this case that is a pretty unrealistic constraint, and it doesn't seem necessary either, considering tup doesn't mind external dependencies such as #include <iostream>.


Example

.
├── example
│   ├── example.cpp
│   └── Tupfile
└── Tuprules.tup

example.cpp:

export module example;

import <iostream>;

export auto add(auto a, auto b) -> decltype(a + b) {
    auto c = a + b;
    std::cout << "c = " << c << '\n';
    return c;
}

Tupfile

: foreach iostream |> g++ $(CXXFLAGS) -xc++-system-header %f |> gcm.cache # what would even be the target in this case?
: foreach *.cpp |> g++ $(CXXFLAGS) -c %f -o %o |> %B.o {objs}
: {objs} |> ar crs %o %f |> libexample.a

Tuprules.tup:

CXXFLAGS += -Wall -Wextra -Wpedantic
CXXFLAGS += -std=c++20 -fmodules-ts

For clarity's sake, this is what I expect:

$ g++ --version
g++ (GCC) 12.1.1 20220730
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ cd example
$ g++ -Wall -Wextra -Wpedantic -std=c++20 -fmodules-ts -xc++-system-header iostream
$ g++ -Wall -Wextra -Wpedantic -std=c++20 -fmodules-ts -c example.cpp
$ ar crs libexample.a example.o 
$ tree
.
├── example.cpp
├── example.o
├── gcm.cache
│   ├── example.gcm
│   └── usr
│       └── include
│           └── c++
│               └── 12.1.1
│                   └── iostream.gcm
├── libexample.a
└── Tupfile

I could do just fine precompiling system headers using a shell script or GNU Make and then running tup, considering that right now, the required system headers do need to be tracked by the programmer in any case and compiled separately.

There would be a problem with g++ not finding the cache from different directories, but that is trivially solved by creating the gcm.cache directory at the root of the project and symlinking to it in every directory, so that g++ can find it no matter which Tupfile calls it. So I don't expect tup to handle things like this.
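The symlink workaround described above could be sketched roughly like this. The helper name link_gcm_cache is made up for illustration and is not part of tup or GCC. It assumes the subdirectories already exist and do not yet contain a real gcm.cache directory, and it is meant to be run by hand before tup rather than from a rule.

```shell
#!/bin/sh
# Hypothetical helper (not part of tup or GCC).
# Usage: link_gcm_cache PROJECT_ROOT SUBDIR...
# Creates PROJECT_ROOT/gcm.cache and a gcm.cache symlink in each SUBDIR.
link_gcm_cache() {
    root=$1
    shift
    # One real cache directory at the project root.
    mkdir -p "$root/gcm.cache" || return 1
    # Resolve to an absolute path so the links work at any nesting depth.
    abs_root=$(cd "$root" && pwd) || return 1
    for dir in "$@"; do
        # -n replaces an existing gcm.cache symlink instead of following it.
        # Assumption: no real gcm.cache directory exists in $dir yet.
        ln -sfn "$abs_root/gcm.cache" "$abs_root/$dir/gcm.cache" || return 1
    done
}
```

For the layout in this issue that would be something like: link_gcm_cache . example. The links are absolute, so they would need to be recreated if the checkout moves.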

But I do expect tup to support gcm.cache in some way, or maybe there is a way to write a rule that explicitly names the example.gcm in gcm.cache itself? Considering that example.cpp very predictably maps to example.o + gcm.cache/example.gcm, I believe that should be possible in some way.

yamirui commented 2 years ago

Okay, so I found #295 and #182 (comment), which made me aware of the ^ prefix that tells tup to ignore an output.

I made gcm.cache manually in the project root and symlinked all relevant directories like example/ to it. Then I changed the rule for compiling in example/Tupfile:

: foreach *.cpp |> g++ $(CXXFLAGS) -c %f -o %o |> %B.o ^gcm.cache/%B.gcm {objs}

The result:

$ tup
[ tup ] [0.000s] Scanning filesystem...
[ tup ] [0.176s] Reading in new environment variables...
[ tup ] [0.177s] Parsing Tupfiles...
 0) [0.002s] example
 [ ] 100%
[ tup ] [0.185s] No files to delete.                                                                                      
[ tup ] [0.185s] Generating .gitignore files...
[ tup ] [0.344s] Executing Commands...
 1) [0.239s] example: g++ -Wall -Wextra -Wpedantic -std=c++20 -fmodules-ts -c example.cpp -o example.o                    
 0) [0.028s] example: ar crs libexample.a example.o                                                                       
 [  ] 100%
[ tup ] [0.814s] Updated.                     

So one part is more or less fixed.

But the system header problem still persists. I tried

: |> g++ $(CXXFLAGS) -xc++-system-header iostream |>

in hopes that tup would be smart enough to figure it out, but that returns a weird error:

* 1) example: g++ -Wall -Wextra -Wpedantic -std=c++20 -fmodules-ts -xc++-system-header iostream                           
/usr/include/c++/12.1.1/iostream: error: failed to write compiled module: No such file or directory
/usr/include/c++/12.1.1/iostream: note: compiled module file is 'gcm.cache/./usr/include/c++/12.1.1/iostream.gcm'
 *** tup messages ***
 *** Command ID=60 failed with return value 1

Curiously, this runs after the compilation step, which makes it fail twice.

I can only assume that it has something to do with how tup uses FUSE: since I don't specify input files, tup doesn't know to include them, and there's no simple way to specify them either.

That being said, I'm done trying, but if someone knows how to work with this, it'd still be nice to learn about it.

gittup commented 2 years ago

I don't think you want to ignore the gcm.cache directory completely, since that will create problems when you start having your modules depend on other modules that you create. Eg: If another .cpp file does 'import example', it will read only from the example.gcm file under gcm.cache/. You'll want to recompile the other cpp file if example.gcm changes, so I don't think using ignores here is the way to go.

You can list the .gcm files as outputs in the rules with the path structure. The main downside here is that the standard libraries all include the version numbers in the path, so unfortunately I think you have to write your rules to target a specific gcc version. Maybe there's a way around this but I haven't discovered it yet. Note the base Tupfile format doesn't have a way to do loops on things that aren't files, so if you want to iterate on standard header names, it's going to be easier to do that in a Tupfile.lua. For example, here's how you could compile multiple standard headers:

$ cat Tupfile.lua
local std = {
        'iostream',
        'algorithm',
}       
local k, v
for k, v in ipairs(std) do
        tup.rule('g++ $(CXXFLAGS) -xc++-system-header ' .. v, {'gcm.cache/usr/include/c++/11/' .. v .. '.gcm', '<std>'})
end     

The hard-coded c++/11/ is obviously the problematic part, as mentioned. Note that I've included the .gcm file as an output (eg: gcm.cache/usr/include/c++/11/iostream.gcm), as well as a group, which is <std>. The group is used here to create a base layer of files that can be used in future compilations that use them.

To now compile your example.cpp module, you can list the .o file as an output and put the .gcm file in extra_outputs (extra outputs don't show up in %o flags, but are expected to be written to by the program). The group is listed as an extra input, so any standard library .gcm that was compiled beforehand is fair game to read from. Here is an example rule adding on to the above Tupfile.lua:


$ cat Tupfile.lua
... (same as above)

local inputs = {'example.cpp'} 
inputs.extra_inputs = {'<std>'}
local outputs = {'example.o'}
outputs.extra_outputs = {'gcm.cache/example.gcm'}
tup.rule(inputs, 'g++ $(CXXFLAGS) -c %f -o %o', outputs)

Of course you're going to want to compile multiple modules. The downside with modules is that you need to know ahead of time the order that they need to be compiled in. For example, if we have another module that imports your example module:

$ cat use-example.cpp
import <iostream>;
import example;

int main()
{
    std::cout << "Adding: " << add(1, 2) << '\n';
}

We have to compile this after compiling example.cpp - it can't be done in parallel like a traditional c++ program, because it needs to read from the example.gcm file, which is only created once example.cpp is compiled. And with tup (as with any build system), you need some way to tell it to compile one before the other. With tup you have two ways to do this:

1) Explicitly list each module's dependencies (the .gcm files it will use) in the Tupfile. Effectively you'd have to duplicate the 'import' list from your .cpp file into your Tupfile. This would likely be quite tedious, though the one upside with using tup here is you'd get a "Missing input dependency" error if you are missing one. (You could be tempted to try to write something that generates a Tupfile fragment by parsing all your .cpp files, but that would increase the startup time of the build system and/or mean *.cpp are treated as inputs for reparsing Tupfiles, neither of which is optimal.) Here's what an explicit listing might look like:

$ cat Tupfile.lua
local std = {
    'iostream',
    'algorithm',
}
local k, v
for k, v in ipairs(std) do
    tup.rule('g++ $(CXXFLAGS) -xc++-system-header ' .. v, {'gcm.cache/usr/include/c++/11/' .. v .. '.gcm', '<std>'})
end

function module(name, deps, is_program)
    local inputs = {name .. '.cpp'}
    inputs.extra_inputs = {'<std>'}
    local k, v
    for k, v in ipairs(deps or {}) do
        table.insert(inputs.extra_inputs, 'gcm.cache/' .. v .. '.gcm')
    end

    local outputs = {'%B.o'}
    if is_program ~= true then
        outputs.extra_outputs = {'gcm.cache/%B.gcm'}
    end

    tup.rule(inputs, 'g++ $(CXXFLAGS) -c %f -o %o', outputs)
end

function program(name, deps)
    module(name, deps, true)
    -- TODO: actually link the executable?
end

module('example')
program('use-example', {'example'})

As you can see, 'example' can be compiled without listing anything (since a dependency on <std> is always included). But to compile 'use-example', we have to list 'example' as an input. If we add another import into use-example, we'd also have to come back here and update the Tupfile.lua, which again is not great, but at least you'd get a tup error if you forget.
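When keeping the explicit list in sync by hand, a quick way to see which named modules a source file imports might look like the sketch below. list_imports is a hypothetical helper, not a tup or GCC feature, and it is deliberately naive: it is line-based, skips header units such as import <iostream>;, and knows nothing about comments, the preprocessor, or module partitions. It is meant as a manual cross-check, not as a generator for Tupfile fragments.

```shell
#!/bin/sh
# Hypothetical helper (not part of tup or GCC).
# Usage: list_imports FILE
# Prints the named modules that FILE imports, one per line.
# Header units (import <iostream>;) do not match and are skipped.
list_imports() {
    sed -E -n 's/^[[:space:]]*(export[[:space:]]+)?import[[:space:]]+([A-Za-z_][A-Za-z0-9_.:]*)[[:space:]]*;.*$/\2/p' "$1"
}
```

Running list_imports use-example.cpp on the file above should print example, which is the name that has to appear in the deps list passed to program().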

2) Conceptually, think of your modules in "layers". Maybe you have a base set of modules that only depend on the standard libraries. Then you have another set of modules that can depend on the base set and the standard libraries. (And so on with more library layers...) Finally, you have your executable(s) which can depend on any of the other modules and the .o files. With this model in place, you can assign each "layer" a group name in tup. We already have <std> for the standard libraries, so your next set of modules could be called <base> or whatever you want. Here's how something like that might look:

local groups = {
        ['std'] = {},
        ['base'] = {'std'},
        -- More layers...
        ['program'] = {'base', 'std'},
}       

function module(name, group, is_program)
        local inputs = {name .. '.cpp'}
        local k, v
        for k, v in ipairs(groups[group]) do
                inputs.extra_inputs += '<' .. v .. '>'
        end     

        local outputs = {'%B.o'}
        if is_program ~= true then
                outputs.extra_outputs = {'gcm.cache/%B.gcm'}
                outputs.extra_outputs += '<' .. group .. '>'
        end

        tup.rule(inputs, 'g++ $(CXXFLAGS) -c %f -o %o', outputs)
end     

function program(name, group)
        module(name, group, true)
        -- TODO: actually link the executable?
end     

local std = {
        'iostream',
        'algorithm',
}
local k, v
for k, v in ipairs(std) do
        -- This could probably be merged into the module() function if you make
        -- module's .cpp input optional
        tup.rule('g++ $(CXXFLAGS) -xc++-system-header ' .. v, {'gcm.cache/usr/include/c++/11/' .. v .. '.gcm', '<std>'})
end
module('example', 'base')
program('use-example', 'program') 

(Note there is a third option for determining what order to compile modules in, which is to have more of a back-and-forth between the compiler and build system to inject dependencies on-the-fly: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1602r0.pdf - I am not sure what would be required to do something similar with tup, but it almost certainly wouldn't be straightforward, if it's even possible at all).

I think the 2nd option with groups is probably the way to go. If you are able to categorize your files into module groups and layer those group dependencies, you shouldn't have to adjust your Tupfile.lua too often as you add/remove imports. You'd still have to update it if you move a module up or down the group hierarchy, or want to create a new group, but those should be more infrequent operations than adding/removing imports.

Now to change subject slightly, there's still the problem of the location of the gcm.cache directory itself. When you run these rules, it gets created in the directory where you run the compiler. So if you have a sibling directory to 'example/' that has another module, which also uses iostream, you'll have to re-compile iostream and have 'example/gcm.cache/.../iostream.gcm' and 'another/gcm.cache/.../iostream.gcm', which is not great. So you'll probably want to have a single gcm.cache directory that is shared for your whole project (I think? Let me know if this doesn't sound right to you). To do this, g++ has the -fmodule-mapper option, which is also quite complicated (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1184r2.pdf). I didn't have any luck getting a quick program to work to do this automatically (it looks like it's a client/server architecture, so it's more complex than I was hoping). But I did get it to work with a file mapping, which looks like this:

$ cat Tuprules.lua
MODULES_DIR = tup.getcwd() .. '/gcm.cache'
CXXFLAGS += '-Wall -Wextra -Wpedantic'
CXXFLAGS += '-std=c++20 -fmodules-ts'
CXXFLAGS += '-fmodule-mapper=' .. tup.getcwd() .. '/modules.txt'

$ cat example/Tupfile.lua
local groups = {
    ['std'] = {},
    ['base'] = {'std'},
    -- More layers...
    ['program'] = {'base', 'std'},
}

-- TODO: Move module(), program() and such into Tuprules.lua so it can be shared with other Tupfile.lua's
function module(name, group, is_program)
    local inputs = {name .. '.cpp'}
    local k, v
    for k, v in ipairs(groups[group]) do
        inputs.extra_inputs += '<' .. v .. '>'
    end

    local outputs = {'%B.o'}
    if is_program ~= true then
        outputs.extra_outputs = {MODULES_DIR .. '/%B.gcm'}
        outputs.extra_outputs += '<' .. group .. '>'
    end

    tup.rule(inputs, 'g++ $(CXXFLAGS) -c %f -o %o', outputs)
end

function program(name, group)
    module(name, group, true)
    -- TODO: actually link the executable?
end

local std = {
    'iostream',
    'algorithm',
}
local k, v
for k, v in ipairs(std) do
    -- This could probably be merged into the module() function if you make
    -- module's .cpp input optional
    tup.rule('g++ $(CXXFLAGS) -xc++-system-header ' .. v, {MODULES_DIR .. '/' .. v .. '.gcm', '<std>'})
end
module('example', 'base')
program('use-example', 'program')

$ cat modules.txt
$root /home/me/test-modules/gcm.cache
/usr/include/c++/11/iostream iostream.gcm
/usr/include/c++/11/algorithm algorithm.gcm
example example.gcm

(Note the $root line is the first line in modules.txt, which gives the base module directory. Otherwise, any relative paths in here are relative to the compiler invocation working directory, not relative to the modules.txt directory).

The downsides with this modules.txt file are the full path in the $root line, which has a user's home directory in it (I couldn't figure out how to make this relative to modules.txt itself, maybe there's a way). Also, while we've removed the version info (c++/11/) from the output path, it is still used as a key for the module name. So you can't have 'iostream iostream.gcm' and have it work, you have to have the C++ version number in there. Boo. Another downside is that this modules.txt file is now an input file to all compilations, which means if you add a new module mapping, you end up recompiling the world.
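One way to keep the home directory and the version number out of a checked-in file might be to generate modules.txt per machine. The sketch below only prints the same line layout as the modules.txt shown above. gen_module_map is a hypothetical helper, and the caller has to supply both the cache root and the standard include directory (how to discover the latter is not covered here). Since modules.txt is an input to every compilation, regenerating it with different content would still trigger a full rebuild.

```shell
#!/bin/sh
# Hypothetical helper (not part of tup or GCC).
# Usage: gen_module_map CACHE_ROOT STD_INCLUDE_DIR "HEADERS" "MODULES"
# HEADERS and MODULES are space-separated lists.
# Prints a mapping in the same layout as the modules.txt above.
gen_module_map() {
    cache_root=$1
    std_include_dir=$2
    headers=$3
    modules=$4
    # The $root line must come first; the single quotes keep it literal.
    printf '$root %s\n' "$cache_root"
    # Header units are keyed by their full path, version directory included.
    for h in $headers; do
        printf '%s/%s %s.gcm\n' "$std_include_dir" "$h" "$h"
    done
    # Named modules are keyed by module name.
    for m in $modules; do
        printf '%s %s.gcm\n' "$m" "$m"
    done
}
```

For example, gen_module_map "$(pwd)/gcm.cache" /usr/include/c++/11 "iostream algorithm" "example" > modules.txt, run from the project root before tup, would reproduce the file above with the local path filled in.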

Converting modules.txt into the client-server program (and using the -fmodule-mapper=|programname style argument) might alleviate all of these problems since it could dynamically convert the module name into a path (and presumably work for all future modules, so it wouldn't require updating and recompiling world as well). If you (or anyone else interested in modules) are able to get such a program working, please post it here!

Hope that helps somewhat. Let me know if any of the examples aren't working for you, I may have missed a step.

typeless commented 1 year ago

Here is a very rudimentary Tupfile I just put up. It's conceptually the 'layers' approach. I have to manually add openssl.o to the linker rule, since the {gcm} bin contains both .o and .gcm files. But it seems fine for now.

include_rules

srcs-y += main.cpp
srcs-y += format.cc

mods-y += openssl.cpp ## C++ wrapper for OpenSSL

CXXFLAGS += -std=c++20
CXXFLAGS += -DOPENSSL_NO_1_1_0
CXXFLAGS += -Wall -Wextra
CXXFLAGS += -g
CXXFLAGS += -Wno-deprecated-declarations
CXXFLAGS += -Wconversion
CXXFLAGS += -Iinclude

CXXFLAGS += -fmodules-ts

LDFLAGS += -lssl -lcrypto

: foreach $(mods-y) |> $(CXX) $(CXXFLAGS) -fmodules-ts -c -o %o %f |> %B.o gcm.cache/%B.gcm {gcm}
: foreach $(srcs-y) | {gcm} |> $(CXX) $(CXXFLAGS) -c -o %o %f |> %B.o {objs}
: {objs} openssl.o |> $(CXX) -o %o %f $(LDFLAGS) |> a.out