berkus / muddle

Automatically exported from code.google.com/p/muddle

Allow muddle to build things in parallel #259

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What we want is to be able to do "muddle -j 4" to mean "build using 4 
subprocesses/threads/CPUs", just as one can with GNU make.

This will involve some work on the muddle infrastructure...

There are sub-proposals, as follows:

1. Make it possible to run multiple instances of muddle in the same directory.

   This is, in some sense, the "lazy" solution to the problem, as "muddle -j 4"
   would then be able to just spawn 4 shells running "muddle" without the "-j"
   switch.

   To do this, we probably need to make muddle save its state in a database
   (sqlite comes to mind, since it handles locking well, and is already in
   the Python standard library). This should also allow caching dependency
   information (NB: we would need to recalculate if any of the build
   description source files change - note the way "weld" doesn't generate
   .pyc files when it runs its .py files), which would hopefully (a) speed up
   dependency usage by reducing recalculation each time muddle is run,
   (b) allow us to simply ask "how many other packages will building this
   label make available next?" and (c) allow more sophisticated "muddle
   query" equivalents.

   This is one of the key "muddle 3" concepts, so is something we've wanted to
   do for a while. This issue seems like a sensible reason for doing it.

   Clearly we would want to find all the labels that can currently be built.
   I think we would also want to note, for each, how many other labels will
   be made buildable (or "closer to being buildable") if this label is built.
   The labels with a higher score for that would be the ones we would want to
   build earliest.

   If we've got a database, it may also be worth noting how long it took to
   build a label in the past, and feed that into our planning (this may be a bit
   too sophisticated, though).

2. Whilst the default should continue to be writing all the muddle output to the
   (same) standard output, this can become very confusing when several builds
   are running at once. I'd thus propose adding a '-logto' switch, which names
   a directory to which the logs for individual label builds will be written
   (named per the label and/or tag).

   This would also satisfy the grumble we get that muddle is too verbose, because
   it shows all the output at the terminal - this switch would avoid that cleanly.

   Doing this may mean moving to use of the Python logging module.
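
For the ordering idea in item 1, here is a minimal sketch of ranking the currently buildable labels by how many other labels are directly waiting on them. The dependency map, the label names and the function names are all made up for illustration; none of this is existing muddle code. It only counts direct dependents, so it is the simplest reading of the "score", not the "closer to being buildable" variant.

```python
# Hypothetical dependency map: label -> set of labels it depends on.
# The short names stand in for real muddle labels.
DEPENDS_ON = {
    "zlib": set(),
    "openssl": {"zlib"},
    "curl": {"zlib", "openssl"},
    "busybox": set(),
}


def ready_labels(depends_on, built):
    """Return the unbuilt labels whose dependencies have all been built."""
    return {
        label
        for label, deps in depends_on.items()
        if label not in built and deps <= built
    }


def unlock_score(label, depends_on, built):
    """Count the unbuilt labels that depend directly on `label`."""
    return sum(
        1
        for other, deps in depends_on.items()
        if other not in built and label in deps
    )


def rank_ready(depends_on, built):
    """List the ready labels, highest unlock score first (ties by name)."""
    built = set(built)
    ready = ready_labels(depends_on, built)
    return sorted(
        ready,
        key=lambda label: (-unlock_score(label, depends_on, built), label),
    )


if __name__ == "__main__":
    print(rank_ready(DEPENDS_ON, set()))
    print(rank_ready(DEPENDS_ON, {"zlib"}))
```

With nothing built, "zlib" ranks ahead of "busybox" because two other labels are waiting on it. If build times were recorded in the database, they could be folded into the same sort key.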

Original issue reported on code.google.com by t...@kynesim.co.uk on 21 May 2014 at 1:20

GoogleCodeExporter commented 9 years ago
And remember, multiprocessing may be my friend (for Pool handling)
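
A rough sketch of what that might look like, assuming Python 3 and that each label can be built by running a separate command. It uses `multiprocessing.pool.ThreadPool`, which offers the same Pool interface but with threads, since the real work happens in the child processes anyway; `multiprocessing.Pool` could be swapped in. The function names, the log file naming and the stand-in build commands are all hypothetical, and the per-label log directory just mimics the '-logto' idea above.

```python
import os
import subprocess
import sys
import tempfile
from multiprocessing.pool import ThreadPool


def build_label(job):
    """Run one build command, sending its output to a per-label log file."""
    label, command, log_dir = job
    # Hypothetical naming scheme: one file per label, "/" made filename-safe.
    log_path = os.path.join(log_dir, label.replace("/", "_") + ".log")
    with open(log_path, "w") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return label, result.returncode, log_path


def build_in_parallel(jobs, log_dir, workers=4):
    """Build (label, command) pairs using at most `workers` at once."""
    os.makedirs(log_dir, exist_ok=True)
    with ThreadPool(workers) as pool:
        # map() returns results in the same order as the input jobs.
        return pool.map(
            build_label,
            [(label, command, log_dir) for label, command in jobs],
        )


if __name__ == "__main__":
    # Stand-in commands; a real build would invoke the actual build step.
    demo_jobs = [
        ("zlib", [sys.executable, "-c", "print('building zlib')"]),
        ("busybox", [sys.executable, "-c", "print('building busybox')"]),
    ]
    for label, code, path in build_in_parallel(demo_jobs, tempfile.mkdtemp()):
        print(label, code, path)
```

This only covers labels that are already independent of each other; working out which labels are safe to run together is the dependency question from item 1.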

Original comment by t...@kynesim.co.uk on 21 May 2014 at 2:59