gittup / tup

Tup is a file-based build system.
http://gittup.org/tup/
GNU General Public License v2.0
1.17k stars · 144 forks

calculate compilation time prediction based on file sizes #188

Open leaf-node opened 10 years ago

leaf-node commented 10 years ago

When compiling the tup repo using tup, I noticed that right before sqlite3 gets compiled, the remaining-time estimate is quite low, at about 1 second. However, the sqlite3.c file takes about 15 seconds to compile.

Given that the file is 5.0 MB, it would make sense to scale the remaining-time estimate based on the KB of files not yet compiled compared to the total KB of the files being compiled in this execution of tup.

Of course, this info could be gathered on the fly while the first files are being compiled, etc.
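The idea above could be sketched roughly as follows; the function name and inputs are illustrative, not anything in tup itself:

```python
# Hypothetical sketch of the proposed size-weighted estimate: scale the
# remaining-time prediction by the bytes left to compile rather than by
# the number of jobs left.

def size_weighted_eta(done_sizes, done_times, pending_sizes):
    """Estimate remaining seconds from per-file sizes (bytes) and the
    observed per-file runtimes (seconds) so far in this run."""
    compiled_bytes = sum(done_sizes)
    if compiled_bytes == 0:
        return None  # no throughput data gathered yet
    secs_per_byte = sum(done_times) / compiled_bytes
    return secs_per_byte * sum(pending_sizes)

# Nine ~50 KB files took ~1s each; a 5 MB file (like sqlite3.c) remains.
# A per-job average would predict ~1s; the size-weighted estimate scales up.
eta = size_weighted_eta([50_000] * 9, [1.0] * 9, [5_000_000])
```

Whether the size-to-time ratio observed on small files extrapolates well to a large one is exactly the open question discussed below.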

gittup commented 10 years ago

On Sun, Jun 29, 2014 at 6:41 PM, Andrew Engelbrecht <notifications@github.com> wrote:


While this might work somewhat in the tup tree for the specific instance you cite, I think it is unlikely to work in general for a few reasons:

A) Tup doesn't have any domain-specific knowledge, so it doesn't know that one command is a "compiler" and should use the file-size as a rough approximation of build time. We'd have to determine if a particular command is a call to a compiler, and if so decide which file to size-check. (ie: if we run python instead of gcc, it doesn't make sense to check the size of the .py file to approximate the runtime).

B) File-size of the main .c/.cc file is likely only roughly correlated to the actual compile time. You would also need to factor in the size of included files, and preprocessing files before you actually compile them just to get a time estimate would be a performance hit.
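To make point B concrete: `gcc -MM file.c` runs only the preprocessor and emits a make-style rule listing the source file plus every non-system header it includes, so summing the sizes of those files would approximate the true input size, at the cost of an extra preprocessor pass per file. A sketch of parsing that output (assuming gcc's make-rule format, with the sample line standing in for real gcc output):

```python
# Parse a make-style dependency rule as emitted by `gcc -MM`, e.g.
#   sqlite3.o: sqlite3.c sqlite3.h \
#    sqlite3ext.h
# Long rules are continued with a trailing backslash.

def parse_deps(make_rule):
    """Extract the dependency file names from a `gcc -MM` make rule."""
    joined = make_rule.replace("\\\n", " ")  # join continued lines
    _, _, deps = joined.partition(":")       # drop the "target:" prefix
    return deps.split()

sample = "sqlite3.o: sqlite3.c sqlite3.h \\\n sqlite3ext.h"
files = parse_deps(sample)
# Summing os.path.getsize() over `files` would give the size estimate.
```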

I think you would have to spend a lot of time tweaking this estimation logic to really get something better. Also note that tup uses two different job times for its estimates:

1) The first time a job is run, it uses the average job time (ie: if compiling 10 files, the estimate for the 10th will be the average runtime of the first 9, or something like that.)

2) The 2nd and all future times the job is run, it uses the previous runtime as the estimate (since adding or removing a few lines from a .c file is unlikely to significantly change the compilation runtime).
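The two estimate modes above can be sketched as follows; the dict-based storage stands in for tup's database and is purely illustrative:

```python
# Minimal sketch of tup's two job-time estimate modes, as described above.

class JobTimes:
    def __init__(self):
        self.prev = {}       # job name -> last observed runtime (seconds)
        self.this_run = []   # runtimes of jobs finished so far this run

    def estimate(self, job):
        if job in self.prev:
            # Mode 2: job has run before -> reuse its previous runtime.
            return self.prev[job]
        if self.this_run:
            # Mode 1: first time ever -> average of jobs finished so far.
            return sum(self.this_run) / len(self.this_run)
        return 0.0  # nothing to base an estimate on yet

    def record(self, job, seconds):
        self.this_run.append(seconds)
        self.prev[job] = seconds
```

With this model, nine 1-second compiles predict 1 second for sqlite3.c the first time through (the case in this issue), but once the real 15-second runtime is recorded, later builds estimate it accurately.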

Factoring in the file-size only improves the first case for compilation rules, which again is only used the first time you compile. I don't think it would help much in the 2nd case when we already have an accurate runtime estimate, and it also doesn't help in general for non-compilation commands.

-Mike