anaconda-graveyard / conda-concourse-ci

Conda-driven Concourse CI for package building
BSD 3-Clause "New" or "Revised" License

Use a pool of locks to limit the number of concurrent builds on each platform #128

Closed jjhelmus closed 5 years ago

jjhelmus commented 5 years ago

Use a pool of locks backed by a Git repository to limit the number of concurrent build tasks run on each build platform. Once all locks are acquired, additional jobs wait until a lock in the appropriate pool becomes available and then start. By default, a delay of 10 seconds is used before retrying to acquire a lock, although this can be controlled by the retry_delay parameter of the lock resource.
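The waiting behaviour described above can be sketched in Python as a simple polling loop. This is an illustration only, not the code from this PR or from the Concourse lock resource: `acquire_with_retry` and `try_acquire` are hypothetical names, and the 10-second default simply mirrors the retry_delay default mentioned above.

```python
import time
from typing import Callable, Optional


def acquire_with_retry(
    try_acquire: Callable[[], Optional[str]],
    retry_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until try_acquire returns a lock name.

    try_acquire is a hypothetical callable that returns the name of a
    claimed lock, or None when every lock in the pool is already taken.
    """
    while True:
        lock = try_acquire()
        if lock is not None:
            return lock
        # All locks are in use: wait before the next attempt.
        sleep(retry_delay)
```

The `sleep` parameter is injectable only so the loop can be exercised without actually waiting.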

Locks are acquired immediately before the build step and released regardless of the success of the build step.
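The acquire/build/release ordering is the familiar try/finally pattern. The sketch below assumes hypothetical `acquire`, `build`, and `release` callables; it shows the ordering only and is not how the generated Concourse plan is written.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_lock(
    acquire: Callable[[], str],
    build: Callable[[], T],
    release: Callable[[str], None],
) -> T:
    """Acquire a lock, run the build, and always release the lock."""
    lock = acquire()  # taken immediately before the build step
    try:
        return build()
    finally:
        # Runs whether build() returned normally or raised.
        release(lock)
```

In a Concourse plan, the same effect is usually obtained with an ensure hook on the build step, although the exact plan generated by this PR is not shown in this thread.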

Locks are only used for the one-off sub-command when the --use_lock_pool argument is included.

This implementation requires the following additions to the concourse configuration files:

Note that this does not route jobs to particular workers. It only limits the number of concurrent build jobs on each platform.

soapy1 commented 5 years ago

This looks really awesome! How many concurrent builds can happen on a given worker?

msarahan commented 5 years ago

This is a critical foundation, but I don't think this is coupled with a per-worker job limit right now.

Note that this does not route jobs to particular workers. It only limits the number of concurrent build jobs on each platform.

We need to tie a given lock to a worker somehow. I'm not sure how easy that might be, since it's tied into concourse's dispatch mechanisms.

jjhelmus commented 5 years ago

Unfortunately, the concurrency limit is not per worker but rather per "type" of worker. So you can limit the number of concurrent jobs on all Windows workers to, say, 6 jobs, but it is possible that all of these jobs will be scheduled on a single worker rather than spread across all the available workers. This is not ideal, but it is much better than the current situation, which allows unlimited jobs to be started on arbitrary workers.

The concurrency per type of worker is set by the number of lock files available in the associated pool in the backing Git repository. For example, in the repository I have been testing with, the concurrency for the win and linux pools is 3 (there are 3 files in the sub-directories of the linux folder) and 2 for the linux_ppc64le and osx pools. Currently all the lock files in this test repository are unclaimed, but these files will move to the claimed sub-directory when jobs acquire a lock and back to the unclaimed sub-directory when they are released. Lock files can be added to or removed from the associated pool manually using standard git commands, or an automated method could be implemented that adds and removes lock files as workers become available or leave.
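To make the pool layout concrete, here is a minimal Python sketch that models a pool as a folder with unclaimed and claimed sub-directories on the local filesystem. It is an assumption-laden illustration: the function names are hypothetical, dot-files such as placeholders are assumed not to count as locks, and the git commit and push that the real lock resource performs to make a claim atomic are left out entirely.

```python
from pathlib import Path
from typing import List, Optional


def _lock_files(directory: Path) -> List[Path]:
    """Lock files in a directory, skipping dot-files (assumed placeholders)."""
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and not p.name.startswith(".")
    )


def pool_capacity(repo: Path, pool: str) -> int:
    """Concurrency limit of a pool: total lock files, claimed or not."""
    return sum(
        len(_lock_files(repo / pool / state))
        for state in ("unclaimed", "claimed")
    )


def claim(repo: Path, pool: str) -> Optional[str]:
    """Move one lock from unclaimed to claimed; None if none are free."""
    free = _lock_files(repo / pool / "unclaimed")
    if not free:
        return None
    lock = free[0]
    lock.rename(repo / pool / "claimed" / lock.name)
    return lock.name


def release(repo: Path, pool: str, name: str) -> None:
    """Move a lock back from claimed to unclaimed."""
    (repo / pool / "claimed" / name).rename(repo / pool / "unclaimed" / name)
```

Adding or deleting a file under a pool's unclaimed directory changes what `pool_capacity` reports, which is the manual tuning described above.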

I would really like to have a method where the lock files could be used to direct a particular build to an available worker, but I cannot figure out how to do this without selecting a particular worker dynamically at run time. It would be possible to select a particular worker when the concourse plan is generated and use a tags step modifier to direct the build task to that worker. The issue with this setup is that if a worker becomes unavailable or is broken, none of the jobs directed to that worker can be completed.
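The plan-generation-time alternative could look roughly like the Python sketch below. The function name, the round-robin policy, and the simplified step dictionaries are all assumptions made for illustration; only the idea of attaching a tags entry to each build task comes from the comment above.

```python
from itertools import cycle
from typing import Dict, List


def assign_worker_tags(task_names: List[str], worker_tags: List[str]) -> List[Dict]:
    """Pin each task to one worker tag, round-robin, when the plan is generated.

    Returns simplified step dictionaries; a real Concourse task step
    needs more fields than the two shown here.
    """
    if not worker_tags:
        raise ValueError("at least one worker tag is required")
    workers = cycle(worker_tags)
    return [{"task": name, "tags": [next(workers)]} for name in task_names]
```

Because the tag is fixed when the plan is written, a task pinned to a worker that later disappears has nowhere else to run, which is the drawback noted above.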

jjhelmus commented 5 years ago

We need to tie a given lock to a worker somehow

Dynamic build plans would allow a worker tag to be loaded from the lock file, but this feature does not yet exist in Concourse; see concourse/concourse#684.

mingwandroid commented 5 years ago

Super cool!