gittup / tup

Tup is a file-based build system.
http://gittup.org/tup/
GNU General Public License v2.0

Global {bin} support #85

Closed ppannuto closed 11 years ago

ppannuto commented 12 years ago

Tup's bins features are really useful, but bins only last the scope of a single Tupfile. I keep stumbling into cases where I'd like to reference global bins. I would propose a syntax extension such as:

: foreach *.c |> !cc |> {local_objs} {{global_objs}}

Where {local_objs} has local scope, but {{global_objs}} has global scope. This allows for things like this at the root:

# Final Link
: {{global_objs}} |> !ld |> my_executable

From what I can tell, current examples suggest things such as using ar to package static libraries, or having a final link step more like:

# Final Link
: polygons/*.o vectors/*.o |> !ld |> my_drawing_program

But I find this inelegant, as you have explicit paths in Tup rules that really don't belong. Furthermore, as the tup examples demonstrate, there are use cases where *.o breaks down.

Is there anything like this in the works? Is this a terrible idea for some reason not immediately obvious? If I wanted to start hacking this in, any direction for where to start? I understand tup's model at a high level, but (I believe) implementing something like this would require another type of symbolic entry in the DAG. Is there any kind of primer on tup's internals anywhere?

gittup commented 12 years ago

On Thu, Oct 4, 2012 at 4:03 PM, ppannuto notifications@github.com wrote:


But I find this inelegant, as you have explicit paths in Tup rules that really don't belong. Furthermore, as the tup examples demonstrate, there are use cases where *.o breaks down.

What use cases are you referring to in the examples? It's been a while since I've looked at them :)

Is there anything like this in the works?

Probably the closest thing to what you want is the groups feature, where you can essentially put multiple outputs from disparate directories into the same group. Eg:

: foreach *.c |> !cc |> {local_objs}

(untested, but I think that's how it goes).

However, currently when you use a group as an input, it just means "this command can use any file from <group> as an input", not "use <group>'s list of outputs in %f", which I believe is what you want.

Is this a terrible idea for some reason not immediately obvious?

No, it's not terrible, but tup currently runs with the idea that command strings are static, until the directory contents or the Tupfile itself changes (at which point we re-parse the Tupfile to get the new static command strings). Tupfiles are also parsed relatively independently, so if we parse a Tupfile like:

: <group> |> gcc %f -o %o |> program

Then tup is going to try to build a string like "gcc lib/foo.o src/bar.o -o program" and store it in the database. However with a group tag as the input, we may not yet know what objects are going to be in the group at all, since we may not have parsed the lib/Tupfile and src/Tupfile yet. And given the information in this Tupfile, there's no way to know that we need to parse those Tupfiles first. In comparison, if we have:

: lib/*.o src/*.o |> gcc %f -o %o |> program

Then tup halts parsing this Tupfile, and parses lib/Tupfile so it knows what lib/*.o will resolve to (and then similarly for src/Tupfile and src/*.o).
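The ordering problem described above can be sketched with a toy model. This is plain Python, not tup's code; `expand_at_parse_time` and the file names are made up for illustration, and the only assumption is that %f and %o are substituted into the string while the Tupfile is being parsed.

```python
def expand_at_parse_time(command, inputs, output):
    """Bake %f and %o into a static command string (toy model)."""
    return command.replace("%f", " ".join(inputs)).replace("%o", output)

# Explicit globs: the parser resolves lib/*.o and src/*.o first,
# so the full input list is known when the string is built.
resolved = ["lib/foo.o", "src/bar.o"]
static_cmd = expand_at_parse_time("gcc %f -o %o", resolved, "program")
print(static_cmd)  # gcc lib/foo.o src/bar.o -o program

# Group as input: if lib/Tupfile and src/Tupfile have not been parsed yet,
# the group is still empty, so the stored string would be incomplete.
group_members_so_far = []
premature_cmd = expand_at_parse_time("gcc %f -o %o", group_members_so_far, "program")
print(premature_cmd)  # gcc  -o program
```

The second command is the failure mode: nothing in the top-level Tupfile says which other Tupfiles must be parsed before the string is frozen.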

If I wanted to start hacking this in, any direction for where to start? I understand tup's model at a high-level, but (I believe) implementing something like this would require another type of symbolic entry in the DAG, is there any kind of primer of tup's internals anywhere?

One method I tried previously for a similar problem is to allow the parser to store things like "gcc %f -o program" in the database, and then expand %f right before the sub-process is executed in the updater. I did not continue with that approach for other reasons (it was being worked in combination with an attempt to allow sub-processes to write to arbitrary files not listed in the Tupfile), but that may work here. The problem with that branch was not related to the delayed %f expansion, but with outputs stomping over existing things in the database. I can try to dig that up if you are interested.
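A rough sketch of the deferred-expansion idea, again as a toy Python model rather than anything from the tup source. `StoredCommand`, `run_time_command`, and the `objs` group name are hypothetical; the sketch assumes only that the unexpanded template is stored and that %f is filled in right before execution.

```python
class StoredCommand:
    """Toy database row: the unexpanded template plus the group it reads."""
    def __init__(self, template, group_name):
        self.template = template
        self.group_name = group_name

def run_time_command(stored, groups):
    # Expansion happens only when the command is about to run, after every
    # Tupfile has been parsed and the group is fully populated.
    members = sorted(groups[stored.group_name])
    return stored.template.replace("%f", " ".join(members))

groups = {}
stored = StoredCommand("gcc %f -o program", "objs")  # parse time: group may be empty
groups.setdefault("objs", set()).add("lib/foo.o")     # lib/Tupfile parsed later
groups.setdefault("objs", set()).add("src/bar.o")     # src/Tupfile parsed later

final_cmd = run_time_command(stored, groups)
print(final_cmd)  # gcc lib/foo.o src/bar.o -o program
```

Because the template is stored as-is, the order in which the Tupfiles are parsed no longer affects the final string.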

I should note, though, that I think it is manageable to have each directory pull in objects from its immediate subdirectories. This is how the Linux kernel Makefiles work - each directory partially links its objects (using ld -r) to create a super-object named built-in.o. So instead of a top-level Tupfile with:

: src/*.o src/foo/*.o src/bar/*.o |> ... |>

You have:

src/Tupfile:
: foreach *.c |> gcc -c ... |>
: *.o foo/built-in.o bar/built-in.o |> ld -r %f -o %o |> built-in.o

foo/Tupfile:
: foreach *.c |> gcc -c ... |>
: *.o |> ld -r %f -o %o |> built-in.o

top-level Tupfile:
: src/built-in.o |> ... |> program

This makes each directory easily maintainable, since it is dealing only with its own contents (ie: each c file in the current directory becomes a .o file, and we pull built-in.o from each immediate sub-directory). It also works well with tup's current model, so that if you add a new sub-directory under src/, it only needs to re-parse src/new/Tupfile and src/Tupfile.
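To make the "own objects plus each immediate subdirectory's built-in.o" rule concrete, here is a small Python sketch. The `tree` layout and the `link_inputs` helper are invented for illustration; the sketch only computes what each directory's `ld -r` line would list as inputs.

```python
def link_inputs(tree, directory):
    """Inputs for `ld -r` in one directory: its own objects plus the
    built-in.o of each immediate subdirectory (never deeper paths)."""
    node = tree[directory]
    own = [src[:-2] + ".o" for src in node["sources"]]          # foo.c -> foo.o
    children = [f"{sub}/built-in.o" for sub in node["subdirs"]]
    return own + children

# Hypothetical project layout.
tree = {
    "src": {"sources": ["main.c"], "subdirs": ["foo", "bar"]},
    "src/foo": {"sources": ["a.c", "b.c"], "subdirs": []},
    "src/bar": {"sources": ["c.c"], "subdirs": []},
}

for directory in tree:
    inputs = " ".join(link_inputs(tree, directory))
    print(f"{directory}: ld -r {inputs} -o built-in.o")
```

Adding a new subdirectory only changes the parent's `subdirs` list, which mirrors the point that only src/new/Tupfile and src/Tupfile need re-parsing.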

-Mike

ppannuto commented 12 years ago

On Fri, Oct 12, 2012 at 4:00 PM, gittup notifications@github.com wrote:


What use cases are you referring to in the examples? It's been a while since I've looked at them :)

That was in reference to "(such as if one directory creates multiple binaries, using *.o wouldn't be correct)" from the man page.


However, currently when you use a group as an input, it just means "this command can use any file from <group> as an input", not "use <group>'s list of outputs in %f", which I believe is what you want.

Yes... but this is very useful for a different problem (more below)

Is this a terrible idea for some reason not immediately obvious?

No, it's not terrible, but tup currently runs with the idea that command strings are static, until the directory contents or the Tupfile itself changes (at which point we re-parse the Tupfile to get the new static command strings). Tupfiles are also parsed relatively independently, so if we parse a Tupfile like:

: <group> |> gcc %f -o %o |> program

Then tup is going to try to build a string like "gcc lib/foo.o src/bar.o -o program" and store it in the database. However with a group tag as the input, we may not yet know what objects are going to be in the group at all, since we may not have parsed the lib/Tupfile and src/Tupfile yet. And given the information in this Tupfile, there's no way to know that we need to parse those Tupfiles first. In comparison, if we have:

: lib/*.o src/*.o |> gcc %f -o %o |> program

Then tup halts parsing this Tupfile, and parses lib/Tupfile so it knows what lib/*.o will resolve to (and then similarly for src/Tupfile and src/*.o).

This issue is what I was driving at by adding 'another type of symbolic entry in the DAG'. I had envisioned the output program depending on the logical global bin, which in turn would depend on lib/lib1.o and src/src1.o etc. However, with your explanation of the string database, this wouldn't work.


This makes each directory easily maintainable, since it is dealing only with its own contents (ie: each c file in the current directory becomes a .o file, and we pull built-in.o from each immediate sub-directory). It also works well with tup's current model, so that if you add a new sub-directory under src/, it only needs to re-parse src/new/Tupfile and src/Tupfile.

Yes, this does largely work. I still find it inelegant that renames become an unnecessarily two-step process (e.g. after renaming src/wrench to src/monkey-wrench, I have to edit src/Tupfile to point at monkey-wrench instead of wrench). I think it comes down to the application domain, however: sometimes it is appropriate for a root src to explicitly include everything in the final product, and sometimes it makes more sense for children to dynamically declare themselves as part of the parent. The latter case is certainly less common, but it represents my current problem.

Ultimately this boils down to a stylistic nit, I think, and is not worth pursuing immediately. <group>s, however, sound very useful for a different problem we tried to use Tup for: large LaTeX projects.

A simplified idea here:

$ tree sample_paper
sample_paper
├── figs                 <<<<< Possibly complicated figures generated from raw data
│   ├── regulators
│   │   ├── regulators.dat
│   │   ├── regulators.plt
│   │   └── Tupfile
│   ├── sample
│   │   ├── am_high.log
│   │   ├── am_low.log
│   │   ├── am_med.log
│   │   ├── solariv_am.plt
│   │   └── Tupfile
│   └── Tuprules.tup
├── images               <<<< Any kind of image format here, get something Tex-usable
│   ├── build
│   │   └── Tupfile
│   ├── flower.gif
│   ├── placeholder.jpg
│   ├── sysarchnew.png
│   ├── sysarch.pdf
│   └── sysarchv2.eps
└── tex                  <<<< Actual paper sources (many .tex's omitted)
    ├── Tupfile          <<<< The troublesome Tupfile ****
    ├── bib.bib
    ├── intro.tex
    ├── paper.tex
    └── sig-alternate-10pt.cls

The troublesome Tupfile currently looks something like:

: paper.tex |\
    *.tex *.cls *.bib ../../images/build/* \
    ../../figs/regulators/regulators* \
    ../../figs/sample/solariv_am* \
    |> pdflatex %f; bibtex %B; pdflatex %f; pdflatex %f |> paper.pdf | \
    paper.aux paper.bbl paper.blg paper.ent paper.log paper.out

As the number, names, details, etc. of figures grow, this becomes pretty unusable quickly. It looks like a perfect use case for <group>s.

Is the feature anywhere near stable / usable?


Pat Pannuto Computer Engineering University of Michigan 248.990.4548

bradjc commented 11 years ago

Extending the {bin} mechanism to have the same global directory based semantics as <group>s would be a nice solution. This would be a transparent change to existing Tupfiles, but would solve Pat's problem. This is attractive because it preserves the meaning that groups are order-only and bins are for typical inputs. Couple directory based bins with blank commands and all the existing functionality should remain. That is:

src/module1/Tupfile
: foreach *.c |> !cc %f |> %B.o {objs}
: foreach {objs} |> foo %f |>          # do whatever you want on the local bin
: foreach {objs} |> |> ../{all_objs}   # put them all in a global bin

src/Tupfile
: {all_objs} |> ld %f |> 

will work nicely.

Now as you mentioned, implementing this would be quite difficult at the moment. This would probably require either a new method of tracking commands and dependencies or some flag to force reparses of Tupfiles if the bins change. Maybe this could be incorporated into a large change in the future.

gittup commented 11 years ago


The "global bin" concept is currently in the different-dir2-plus-groups branch in github. Here's how your example would look (roughly) -

src/module1/Tupfile
: foreach *.c |> !cc %f |> %B.o | ../<objs>

src/Tupfile
: <objs> |> ld `cat %<objs>` |>

Please try it out and let me know if it meets your expectations.

-Mike

bradjc commented 11 years ago

I just tried this and it worked well for me. I just have a couple questions/comments:

gittup commented 11 years ago

On Tue, Jun 11, 2013 at 1:00 AM, Brad Campbell notifications@github.com wrote:

I just tried this and it worked well for me. I just have a couple questions/comments:

  • Why is cat %<group> needed? I can't tell what that is adding and why % is not enough.

For now I just implemented it as a resource file for compatibility with MSVC (which was another feature request). There, it is just 'cl @%<group>'. We should be able to do it automatically at the command-line as well so the cat isn't needed, but we still need to support resource files. Any thoughts on syntax to distinguish between the two? For simplicity in the code, they should probably both begin with '%' so the same function handles expansion.

  • It is unfortunate that cat %<group> shows up in the command output that is printed to standard out. So instead of ld main.o obj1.o obj2.o the command looks like ld cat %<objs>. I think I know why this is, but it makes it hard to debug what is going on.

Yeah, that is a good point. Putting them directly in the command-line shouldn't be too hard - the only annoying part is sizing or resizing the command string to fit the filenames.

Thanks for your feedback!

-Mike

bradjc commented 11 years ago


Ah I see. %<group> expands to a filename which explains the cat. When I wasn't looking at your examples I intuitively used %f:

<objs> |> !ld %f -o %o |> app

So I'd like to see %f continue to expand to the names of the input files. As for these resource file shenanigans, I'm not sure how extensively they will be used. It seems like picking another letter for the % might be the simplest solution, but I'm not sure what they are for.

gittup commented 11 years ago

On Wed, Jun 12, 2013 at 1:44 PM, Brad Campbell notifications@github.com wrote:


Ah I see. %<group> expands to a filename which explains the cat. When I wasn't looking at your examples I intuitively used %f:

: <objs> |> !ld %f -o %o |> app

So I'd like to see %f continue to expand to the names of the input files. As for this resource file shenanigans, I'm not sure how extensive they will be used. It seems like picking another letter for the % might be the simplest solution but I'm not sure what they are for.

I don't think %f can be expanded to handle this use-case without significantly changing tup. The %<group> syntax is deferred until runtime - at that point, the distinction between normal & order-only inputs is lost, so if %f were deferred as well, the distinction would have to be stored in the database somehow.

I think we can do something so that the group is expanded in-line, but I believe it will still need to be separate from the other %-flags that are expanded at parsing time.

-Mike

bradjc commented 11 years ago

I don't think %f can be expanded to handle this use-case without significantly changing tup. The %<group> syntax is deferred until runtime - at that point, the distinction between normal & order-only inputs is lost, so if %f were deferred as well, the distinction would have to be stored in the database somehow.

Ok I expected this to be a problem from the beginning and I see how you worked around it. In that case perhaps %<group name> is the most logical. Although that might have problems with paths to groups; I'm not sure how/if you are handling those now.

src/lib:
: *.c |> !cc |> %B.o ../<objs>

src/module:
: *.c |> !cc |> %B.o ../<objs>
: ../<objs> |> !ld %../<objs> |> a.out  # hard to parse?
  OR
: ../<objs> |> !ld %<../objs> |> a.out  # confusing/changing syntax

I also had a thought about the resource files: perhaps a special symbol (maybe "$") denotes the filename expansion of the %-flag, ie:

# equivalent lines
: *.c |> !cc `cat %$f` |> %B.o
: *.c |> !cc %f |> %B.o

gittup commented 11 years ago


The path only applies to the input side of things. Inside the command string, it just matches the name of the group to any of the inputs with the same name. So:

: ../<objs> |> !ld %<objs> |> a.out

This would hopefully be made clear with some documentation :)

I believe if you had multiple groups with the same names and different paths, it would expand all of them:

: foo/<objs> bar/<objs> |> !ld %<objs> |> a.out

The %<objs> would contain both foo/<objs> and bar/<objs>. I'm not sure if this is a valid use-case or not (I don't explicitly test for it), but that's the way the name-matching works.
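The name-matching behaviour described here can be illustrated with a toy Python model. It is not taken from the tup source: `expand_groups` and the sample file lists are invented, and the sketch assumes only that the %-reference is matched against the trailing group name of each input, ignoring the directory part.

```python
import re

def expand_groups(command, input_groups):
    """input_groups maps an input such as 'foo/<objs>' to the files in it."""
    def substitute(match):
        name = match.group(1)
        files = []
        for path, members in input_groups.items():
            # Compare only the trailing <name>; the directory part is ignored.
            if path.rsplit("/", 1)[-1] == f"<{name}>":
                files.extend(members)
        return " ".join(files)
    return re.sub(r"%<([^>]+)>", substitute, command)

inputs = {"foo/<objs>": ["foo/a.o"], "bar/<objs>": ["bar/b.o"]}
expanded = expand_groups("ld %<objs> -o a.out", inputs)
print(expanded)  # ld foo/a.o bar/b.o -o a.out
```

Under that assumption, two groups that share a name but live in different directories both end up in the same expansion.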

I also had a thought about the resource files: perhaps a special symbol (maybe "$") denotes the filename expansion of the %-flag, ie:

# equivalent lines
: *.c |> !cc `cat %$f` |> %B.o
: *.c |> !cc %f |> %B.o

Yeah, I think I'll likely change the syntax for resource files. Just a bit confusing with all the different symbols floating around...

-Mike

bradjc commented 11 years ago

The path only applies to the input side of things. Inside the command string, it just matches the name of the group to any of the inputs with the same name.

Awesome. If you add to the different-dir2-plus-groups branch, let me know and I'll keep testing it on my project.

gittup commented 11 years ago

On Wed, Jun 12, 2013 at 1:44 PM, Brad Campbell notifications@github.com wrote:


Ah I see. %<group> expands to a filename which explains the cat. When I wasn't looking at your examples I intuitively used %f:

: <objs> |> !ld %f -o %o |> app

So I'd like to see %f continue to expand to the names of the input files. As for this resource file shenanigans, I'm not sure how extensive they will be used. It seems like picking another letter for the % might be the simplest solution but I'm not sure what they are for.

I've changed the syntax/behavior slightly. Now %<group> expands inline to be the filenames, whereas %<group>.res expands to a resource file (the original behavior). So you could do:

: <objs> |> gcc %<objs> -o %o |> prog

or in VS with a resource file:

: <objs> |> cl @%<objs>.res /Fe%o |> prog.exe
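For readers who want the two forms side by side, here is a toy Python sketch of inline versus resource-file expansion. It is only an illustration: `expand_group_refs`, the temporary directory, and the one-filename-per-line layout of the `.res` file are assumptions, not tup's actual behaviour or file format.

```python
import os
import re
import tempfile

def expand_group_refs(command, groups, res_dir):
    """Expand %<name> inline and %<name>.res via a resource file (toy model)."""
    def substitute(match):
        name, wants_res = match.group(1), match.group(2) is not None
        files = groups[name]
        if not wants_res:
            return " ".join(files)
        # Assumed layout: one filename per line; the real format may differ.
        res_path = os.path.join(res_dir, name + ".res")
        with open(res_path, "w") as handle:
            handle.write("\n".join(files) + "\n")
        return res_path
    return re.sub(r"%<([^>]+)>(\.res)?", substitute, command)

groups = {"objs": ["src/foo.o", "src/bar.o"]}
with tempfile.TemporaryDirectory() as res_dir:
    inline_cmd = expand_group_refs("gcc %<objs> -o prog", groups, res_dir)
    res_cmd = expand_group_refs("cl @%<objs>.res /Feprog.exe", groups, res_dir)
    with open(os.path.join(res_dir, "objs.res")) as handle:
        res_contents = handle.read().split()

print(inline_cmd)  # gcc src/foo.o src/bar.o -o prog
```

The inline form keeps the command readable, while the resource-file form keeps the command line short when the group is large.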

There are a few other bug fixes on the branch as well. I think it's getting pretty close, but let me know if there are other things missing or cases where it doesn't seem to work (or more bugs :)

-Mike

bradjc commented 11 years ago

I updated to just using %<objs> and everything still works as expected. Are you going to try to find a way to get the output command string to expand %<objs> to the files it matches?

gittup commented 11 years ago


Oh, I forgot to mention that. It still just prints %<objs> in the command-line by default (as you've noticed), and it's a bit tricky to change that due to the way tup prints out nodes. I did however make it print out the expanded name beneath the command banner if the command fails (for debugging), or if you run with --verbose. In either of these cases you would see:

1) ld %<objs>
tup: Expanded command string: ld src/foo.o src/bar.o

If that isn't sufficient I can probably take another look at it - I just didn't want to have to re-work print_tup_entry() (which I think is what would be needed to get the expanded string in the banner itself).

-Mike

bradjc commented 11 years ago

I did however make it print out the expanded name beneath the command banner if the command fails (for debugging), or if you run with --verbose.

Ok neat. That is probably actually a better solution as the list of files in the group can make for an unwieldy command to display.

ppannuto commented 11 years ago

I finally got around to installing this in one of my bigger projects, works great. I was happily surprised / impressed to see some pretty complicated syntax work out-of-the-box:

: <objs> <@(PLATFORM)_objs> |> !ld %<objs> %<@(PLATFORM)_objs> |> exe

For what it's worth, the following ran, but didn't do what I expected, though honestly once I gave it a second look it really shouldn't have surprised me:

: <objs> @(PLATFORM)/<objs> |> !ld %<objs> |> exe

It's a minor nit of the cognitive dissonance introduced by the fact that groups of the same name in different paths are logically different groups, but once they're a part of the rule the path information is lost. In this case, only the objects in the first <objs> entry were passed to the linker. Perhaps tup should issue a warning / error when it encounters this, but that may not be worth the effort.

Thanks for making this feature work, it's super useful :+1:

gittup commented 11 years ago


Are you sure it only grabs the first <objs>? From the code it looks like it should grab all groups named <objs>. Here's a test I tried:

. ./tup.sh

tmkdir foo
cat > foo/Tupfile << HERE
: |> touch %o |> foo.txt | <objs>
HERE
cat > Tupfile << HERE
: |> touch %o |> main.txt | <objs>
: foo/<objs> <objs> |> echo %<objs> |>
HERE
update

eotup

Both foo/foo.txt and main.txt are listed in the echo command. Maybe the @(PLATFORM)/<objs> group is in a different directory from the platform-specific group that you are writing to? You could try to do a 'tup graph .' from the top and grep for objs to see how many actual groups you have of that name.

-Mike