SCons / scons

SCons - a software construction tool
http://scons.org
MIT License

Provide a way to automatically list all generated sources #3624

Open acmorrow opened 4 years ago

acmorrow commented 4 years ago

Describe the Feature

One of the MongoDB SCons tools is the compilation database tool, which builds a compilation database for a project without building the sources. The fact that we don't build the sources is important: it takes a long time to build all the sources, but engineers often want a compilation database right away so they can use it as part of an IDE, or so they can run tooling like clang-tidy on specific files without needing to wait for a full build first.

However, there are often a small number of generated source files, like a config.h header, that are required in order for a tool like clang-tidy to work. Without them, the compiler front end can't find necessary include files.

To solve this, MongoDB has an Alias called generated-sources, to which we manually add every generated source. This allows engineers to just build the set of generated sources required to make compilation database tooling work. However, remembering to add to the Alias is inconvenient and easily overlooked, so it often breaks.

We are wondering whether it may be possible for SCons itself to identify and automatically track generated sources, and then expose them via a method similar to the FindInstalledFiles method. Such a facility would also be useful in our Ninja backend for SCons, which also needs to know about generated sources.

There are definitely some open questions: What makes something a generated "source"? How can they be identified? Would SCons be able to deterministically identify such and expose them? How does the information get scoped? Per Environment? globally?


dmoody256 commented 4 years ago

What makes something a generated "source"? How can they be identified?

I think the most generic definition of a "generated source" is any node that is both a target itself and a source for some other target. Being a target means it was created or modified by the build; being a source means it was input to something else.

But I think that is too generic for what you want, and may include things you don't want, like input libraries. I assume what you are trying to get at is "generated source code"? That definition may be useful for a lot of people as well. I think what you want is some type of user-defined scanner that focuses on files with certain extensions and checks that files with those extensions were a target node for some other task, as well as being a source.
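The two definitions above can be compared in a minimal sketch. It doesn't use SCons at all: the build graph is a plain dictionary, and `find_generated_sources`, `SOURCE_EXTENSIONS`, and the file names are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical extension list; a real project would choose its own.
SOURCE_EXTENSIONS: Tuple[str, ...] = (".h", ".hpp", ".c", ".cpp")


def find_generated_sources(
    build_edges: Dict[str, List[str]],
    extensions: Optional[Tuple[str, ...]] = SOURCE_EXTENSIONS,
) -> List[str]:
    """Return nodes that are both a target and a source of another target.

    build_edges maps each target name to the list of its sources.
    Pass extensions=None to get the fully generic definition.
    """
    targets = set(build_edges)
    sources = {src for srcs in build_edges.values() for src in srcs}
    both = targets & sources
    if extensions is not None:
        # Narrow the generic definition down to "generated source code".
        both = {name for name in both if name.endswith(extensions)}
    return sorted(both)


# Toy build graph: config.h is generated, then compiled into main.o.
edges = {
    "config.h": ["config.h.in"],
    "main.o": ["main.c", "config.h"],
    "foo.o": ["foo.c"],
    "libfoo.a": ["foo.o"],
    "app": ["main.o", "libfoo.a"],
}

generated_code = find_generated_sources(edges)
generated_any = find_generated_sources(edges, extensions=None)
```

In this toy graph the generic definition also picks up object files and the static library, which is the "too generic" problem described above; the extension filter keeps only `config.h`.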

Would SCons be able to deterministically identify such and expose them?

I think that as SCons walks the DAG, it can simply set a flag on each node, such as self.is_target or self.is_source. I would be surprised if it's not already doing this in some capacity. Then, if both are true, it can add the node to a list, or otherwise track such nodes in a way the user can access later.

How does the information get scoped? Per Environment? globally?

I would say globally. The taskmaster walks the DAG and doesn't care about environments; the nodes keep track of their environments. The taskmaster can mark each node as a source or a target as it goes, check whether the node has become both, and if so add it to the "generated sources" list.
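A rough sketch of that flag-marking idea follows. It is a toy model, not the SCons taskmaster: `ToyNode`, `walk_and_mark`, and the `is_target` / `is_source` attributes are hypothetical names used only to show the bookkeeping, with one global list that is independent of any environment.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass(eq=False)  # eq=False keeps identity-based hashing for set membership
class ToyNode:
    name: str
    sources: List["ToyNode"] = field(default_factory=list)
    has_builder: bool = False
    is_target: bool = False
    is_source: bool = False


def walk_and_mark(roots: Iterable[ToyNode]) -> List[ToyNode]:
    """Walk the graph once, set both flags, and collect nodes that end up with both."""
    generated: List[ToyNode] = []
    visited = set()
    stack = list(roots)

    def note(node: ToyNode) -> None:
        # A node qualifies as soon as it has been seen in both roles.
        if node.is_target and node.is_source and node not in generated:
            generated.append(node)

    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        if node.has_builder:
            node.is_target = True
            note(node)
        for src in node.sources:
            src.is_source = True
            note(src)
            stack.append(src)
    return generated


config_in = ToyNode("config.h.in")
config_h = ToyNode("config.h", [config_in], has_builder=True)
main_c = ToyNode("main.c")
main_o = ToyNode("main.o", [main_c, config_h], has_builder=True)
app = ToyNode("app", [main_o], has_builder=True)

generated_names = sorted(node.name for node in walk_and_mark([app]))
```

Here `app` is a target but never a source, so it isn't collected, while `main.o` and `config.h` are.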

bdbaddog commented 4 years ago

This most likely calls for an API like "get me all the files with these extensions which have builders" or "get me all files which match these extensions", which would walk the DAG and return that info.

The other way to go about this is to provide a way to mark a node as generated (in whatever way matters to you), and then an API to walk the DAG and return all File() nodes which have that flag (or don't).
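The explicit-marking alternative might look roughly like the sketch below. It is a standalone toy, not an existing SCons API: `FileNode`, `mark_generated`, and `files_with_flag` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass(eq=False)  # identity-based hashing so nodes can go in a set
class FileNode:
    path: str
    children: List["FileNode"] = field(default_factory=list)
    generated: bool = False


def mark_generated(*nodes: FileNode) -> None:
    """Tag nodes as generated, in whatever sense the caller cares about."""
    for node in nodes:
        node.generated = True


def iter_dag(root: FileNode) -> Iterator[FileNode]:
    """Yield every node reachable from root exactly once."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        yield node
        stack.extend(node.children)


def files_with_flag(root: FileNode, generated: bool = True) -> List[str]:
    """Return paths of reachable nodes whose flag matches the request."""
    return sorted(n.path for n in iter_dag(root) if n.generated == generated)


version_h = FileNode("version.h")
main_c = FileNode("main.c")
main_o = FileNode("main.o", [main_c, version_h])
app = FileNode("app", [main_o])

mark_generated(version_h)
flagged = files_with_flag(app)
unflagged = files_with_flag(app, generated=False)
```

The trade-off is the one raised in the original report: an explicit flag is precise, but someone still has to remember to set it.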

dmoody256 commented 4 years ago

What about a combination of what you said, @bdbaddog: provide a way to get generated nodes via an input function, where the default is all nodes which have builders? Something like this, maybe:

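import SCons.Node
import SCons.Node.FS

def GetGeneratedNodes(generated_func=lambda node: node.has_builder()):
    # Walk the nodes reachable from the top-level SConstruct directory
    # and collect every node for which generated_func returns true.
    nw = SCons.Node.Walker(SCons.Node.FS.get_default_fs().SConstruct_dir)
    generated = []
    node = nw.get_next()
    while node is not None:
        if generated_func(node):
            generated.append(node)
        node = nw.get_next()
    return generated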

example use case:

for node in SCons.Node.GetGeneratedNodes(
    lambda node: node.has_builder() and str(node).endswith(
        ('.h', '.hpp', '.c', '.cpp'))):
    print(str(node))

bdbaddog commented 3 years ago

After pondering this a bit, would it make sense to be able to find all targets for a given builder?

The above example is fine, if it's only one step -> {.c,.cpp,.h,.hpp}, but if it's more than one step, you'd still miss out.

And given that this is about figuring out which non-command-line builders generate files that are compiled by an external tool (Ninja, for example), perhaps a per-builder query would be good?
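One way to read the per-builder idea is sketched below, under the assumption that each build rule records the builder that produced its target. The rule tuples, builder names, and `targets_by_builder` are hypothetical and don't correspond to any existing SCons interface.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each rule is (builder name, target, sources); a two-step generation chain
# feeds the compiler: parser.y.in -> parser.y -> parser.c -> parser.o.
Rule = Tuple[str, str, List[str]]

rules: List[Rule] = [
    ("Template", "parser.y", ["parser.y.in"]),
    ("Yacc", "parser.c", ["parser.y"]),
    ("Object", "parser.o", ["parser.c"]),
]


def targets_by_builder(build_rules: List[Rule]) -> Dict[str, List[str]]:
    """Index every target under the name of the builder that produces it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for builder, target, _sources in build_rules:
        index[builder].append(target)
    return dict(index)


index = targets_by_builder(rules)

# Extension-based query: only sees the last generation step.
by_extension = [
    target for _b, target, _s in rules
    if target.endswith((".h", ".hpp", ".c", ".cpp"))
]

# Builder-based query: every output of the chosen generator builders.
by_builder = index["Template"] + index["Yacc"]
```

In this toy chain the extension filter reports only `parser.c`, while asking for the outputs of the two generator builders also reports the intermediate `parser.y`.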

bdbaddog commented 3 years ago

Added myself as assignee for now. Just to simplify finding this.