ContinuumIO / ProtoCI

Prototype of CI for conda recipes
BSD 3-Clause "New" or "Revised" License

Psteinberg/split builds #10

Closed PeterDSteinberg closed 8 years ago

PeterDSteinberg commented 8 years ago

Adds two separate subparsers: one for building packages ('build') and one for splitting packages into distinct trees of a target size ('split').

Here is an example of the split action that we would run before submitting builds to anaconda build:

$ python build2.py ./ split -t 5 -s somejs.js && cat somejs.js
{"libnetcdf": ["curl", "cmake", "hdf5", "zlib"], "pysam": ["python", "cython", "cmake", "zlib"]}

$ python build2.py ./ split -t 10 -s somejs.js && cat somejs.js
{"pysam": ["curl", "cmake", "hdf5", "zlib", "libnetcdf", "python", "cython"]}

-t is the target number of packages per group; -s is the name of the file in which the splits are saved as a JSON dict.

The other usage pattern is the build action, which actually performs the build (as it would be called from .binstar.yml):

# to build all files in dir ./
python build2.py ./ build -buildall
# to build only the hdf5 package
python build2.py ./ build -build hdf5
# to build all the packages in a key of a json created by the split method mentioned above
# libnetcdf must be a key in somejs.js
python build2.py ./ build -json-file-key somejs.js libnetcdf

msarahan commented 8 years ago

Thanks for working on this. A couple of comments:

PeterDSteinberg commented 8 years ago

The split aims to produce jobs that run for at most about 30 to 60 minutes, for the greatest stability of the build workers. The idea is to sort by the high-level nodes (those that require the most dependency builds) and to find their successors recursively. The split command produces a JSON file in which the list at each value gives the dependencies in topologically correct install order for that tree. For example:

To build / test libnetcdf, first install the list of dependencies from beginning to end, then install libnetcdf:

"libnetcdf": ["curl", "cmake", "hdf5", "zlib"]
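As a minimal sketch of how a consumer might read one entry of the split file: the helper name `install_order` is hypothetical and is not part of build2.py; only the JSON shape is taken from the example output above.

```python
import json

# JSON in the shape produced by the split action (copied from the example above).
SPLITS_JSON = (
    '{"libnetcdf": ["curl", "cmake", "hdf5", "zlib"], '
    '"pysam": ["python", "cython", "cmake", "zlib"]}'
)


def install_order(splits, key):
    """Hypothetical helper: dependencies in listed order, then the key itself."""
    return list(splits[key]) + [key]


splits = json.loads(SPLITS_JSON)
for package in install_order(splits, "libnetcdf"):
    print(package)
```

The key is appended last because each value lists only the dependencies, which must be installed before the package named by the key.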

Finally, smaller tree branches are combined into one job (see coalesce). This is controlled by setting the -targetnum per split.
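The approach described here can be illustrated roughly as follows. This is a sketch under stated assumptions, not the code in build2.py: the function names, the toy dependency graph, and the rule that a group's size counts the top-level package itself are all assumptions made for illustration.

```python
def dependency_order(graph, root, seen=None):
    """Post-order walk: each dependency is listed before whatever needs it.

    Assumes the graph is acyclic.
    """
    if seen is None:
        seen = set()
    order = []
    for dep in graph.get(root, []):
        if dep not in seen:
            seen.add(dep)
            order.extend(dependency_order(graph, dep, seen))
            order.append(dep)
    return order


def split(graph, target):
    """Group packages into install-ordered trees of roughly `target` packages."""
    all_deps = {dep for deps in graph.values() for dep in deps}
    # High-level nodes: packages that nothing else depends on.
    roots = [pkg for pkg in graph if pkg not in all_deps]
    trees = {pkg: dependency_order(graph, pkg) + [pkg] for pkg in roots}
    # Largest trees first (the sort is stable for equal sizes).
    ranked = sorted(roots, key=lambda pkg: len(trees[pkg]), reverse=True)

    # Coalesce: keep merging trees while the merged group fits the target.
    merged, current = [], []
    for pkg in ranked:
        combined = current + [p for p in trees[pkg] if p not in current]
        if current and len(combined) > target:
            merged.append(current)
            current = list(trees[pkg])
        else:
            current = combined
    if current:
        merged.append(current)
    # Key each group by its last package; the value is everything before it.
    return {group[-1]: group[:-1] for group in merged}


# Hypothetical dependency graph: package -> direct dependencies.
GRAPH = {
    "libnetcdf": ["curl", "cmake", "hdf5", "zlib"],
    "pysam": ["python", "cython", "cmake", "zlib"],
    "curl": [], "cmake": [], "hdf5": [], "zlib": [],
    "python": [], "cython": [],
}

print(split(GRAPH, 5))
print(split(GRAPH, 10))
```

With this toy graph, a target of 5 keeps the two trees separate, while a target of 10 merges them under a single key, which mirrors the shape of the two example outputs in the PR description.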

msarahan commented 8 years ago

I see. That makes good sense. I wonder if it is worthwhile to track build times for each package somewhere. Most packages are very quick, but some (Qt, for example) take 20-40 min. Knowing estimates of these might make your approach work better.

PeterDSteinberg commented 8 years ago

Yes, the build times are being logged, so after a few builds we can go through the logs where the times are printed and record them in the extra: dict of each meta.yaml. @groutr started work on that yesterday.
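Once per-package times are available, a job's duration could be estimated along these lines. This is only a sketch: the function name, the default value, and the timings below are made up for illustration, and how the times would actually be read from meta.yaml is not shown.

```python
def estimated_minutes(key, deps, build_minutes, default=1.0):
    """Sum known build times for a split group; unknown packages get `default`."""
    packages = list(deps) + [key]
    return sum(build_minutes.get(pkg, default) for pkg in packages)


# Hypothetical timings in minutes (not measured values).
BUILD_MINUTES = {"hdf5": 12.0, "libnetcdf": 6.0}

total = estimated_minutes(
    "libnetcdf", ["curl", "cmake", "hdf5", "zlib"], BUILD_MINUTES
)
print(total)
```

An estimate like this could then be compared against the 30 to 60 minute goal when deciding whether two groups should be coalesced.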

msarahan commented 8 years ago

Thanks!