Currently there is no explicit support for multiple invocations of stack. I can think of two different approaches we can take:
I can imagine (2) becoming very frustrating for users, whereas (1) won't prevent some breakage from occurring. Is there a more invasive (3) that you were thinking of?
Would it make sense to lock a whole snapshot (e.g., only one stack invocation can be operating in ~/.stack/snapshots/x86_64-osx/lts-2.17/7.8.4/ at a time)? The "second" invocation could print a warning to say it's waiting for the lock, and then just wait until the original invocation is done before continuing. If stack doesn't have to write to the snapshot, then no locking happens.
And we can just say that the behaviour of running two stack invocations in the same project is undefined.
Clarification: I'm assuming it's safe for stack to read from a snapshot that another invocation is writing to, so I'm talking about a shared (non-exclusive) lock.
@manny-fp I don't know of any locking model that allows readers to read while the data is in flux. RW/shared locks allow multiple readers but writers are always exclusive, IIRC.
Regardless, my gut sense is that package-level locks would be best. That would allow two diverging instances to eventually build different packages in the same snapshot. But I'm not sure if that's easy/possible with how it works.
FWIW, I've had great success with the filelock package on Windows (it's cross-platform). It uses native Windows APIs to avoid the need for polling (like some alternatives do), and the lock is always unlocked on process death. I can't say how well it works on other systems.
I imagine that you'd protect any write operation with a file lock.
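For example, something like this (just a sketch; the per-snapshot lock path and the function names are made up, not anything stack does today) could guard snapshot writes with an exclusive lock, with a shared lock available for readers:

```haskell
import System.FileLock (SharedExclusive (Exclusive, Shared), withFileLock)

-- Hypothetical: one lock file per snapshot package db, e.g. somewhere under
-- ~/.stack/snapshots/x86_64-osx/lts-2.17/7.8.4/.  Writers take it exclusively;
-- readers could take it shared.
installIntoSnapshot :: FilePath -> IO () -> IO ()
installIntoSnapshot snapshotDir doInstall =
  withFileLock (snapshotDir ++ "/.stack-lock") Exclusive $ \_lock ->
    doInstall

readFromSnapshot :: FilePath -> IO a -> IO a
readFromSnapshot snapshotDir doRead =
  withFileLock (snapshotDir ++ "/.stack-lock") Shared $ \_lock ->
    doRead
```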
I guess my main frustration is with command line tools (like cabal) that simply don't explain to users what their concurrent behavior is. Thus users can get into the bad habit of using them concurrently (in scripts, in CI, etc.) and it "seems to work", but they're actually doing something incorrect which will fail nondeterministically. For example, is "cabal update" transactional? I don't know. We have a CI job that runs cabal update periodically, but it doesn't ensure that cabal installs (in sandboxes) aren't happening on the same machine.
So coming from that background, I would love it if tools which don't support concurrency would just grab a global lock (@snoyberg's option 2) and bail out saying "you can't do this".
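Something along these lines (a rough sketch only; the lock path is made up) would implement that "grab a lock or bail out" behaviour using the filelock package's tryLockFile:

```haskell
import System.Exit (exitFailure)
import System.FileLock (SharedExclusive (Exclusive), tryLockFile)

main :: IO ()
main = do
  -- Made-up location for a single global lock file.
  let globalLock = "/home/user/.stack/global.lock"
  mlock <- tryLockFile globalLock Exclusive
  case mlock of
    Nothing -> do
      putStrLn "Another stack process appears to be running; aborting."
      exitFailure
    Just _lock -> do
      -- The lock is released automatically when this process exits.
      putStrLn "Got the global lock, proceeding with the build."
```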
Improving on that, I was thinking of exactly what @manny-fp said about locking the (shared) snapshots. Ideally this would use an RW lock so multiple readers can run concurrently. But @snoyberg suggests that there would still be "some breakage". What's an example of that breakage?
I guess one question is whether a set of concurrent stack install operations can have serializable semantics. I need to learn more about the package db data structures on disk, but it seems like this may in fact be possible? In particular, stack doesn't expose a delete operation, so all operations on shared snapshots are monotonic and should commute, right? (That is, the snapshot monotonically gets closer to, e.g., "all of LTS-2.19 installed".) So even if the locking is "medium grained" -- around each snapshot, but grabbing the lock once per package install rather than once for the entire stack execution -- linearizability should still be possible. That is, if one stack instance installs {A,B,C} in a snapshot and another installs {A,X,Y}, any interleaving of lock acquisitions and installs by the two instances should result in the same final state on disk. And that final state should be equivalent to what we'd get if each stack instance held the lock for its entire execution, forcing one to run completely before the other.
Things are far uglier than they seem at first, which is why anything except "enforce that only a single stack build can occur at once" will likely break. For example: building the exact same package with the exact same dependencies twice may result in different package IDs, and then, when registering, Cabal will implicitly unregister whatever already exists, causing previous install plans to break. In other words: if stack process A comes up with a build plan based on state S1 and process B comes up with another build plan based on S1, then A takes a lock and builds, and afterwards B takes a lock and builds, B may end up breaking things that A did.
Ah yes, the (terribly embarrassing) nondeterministic, irreproducible builds...
OK, that would seem to argue for coarser-grained locking. Each stack process could:
Any problems with that approach?
(There are ways to make it work better under contention -- like, using the same principle as "elimination trees", blocked processes could communicate the packages they want installed to the privileged process that holds the write lock -- but the simple approach should be good enough.)
Ouch...I didn't realize package hashes were not 100% deterministic (at least, at this level). I'm glad this is being dealt with: anything is better than brokenness.
@rrnewton I don't see a problem with that approach. I'd recommend we start off with this kind of locking as an optional feature, and decide later if it should be made the default. Does that make sense? And do you have enough information to take a stab at it?
Well, I'll probably end up asking a ton of questions somewhere ;-) (here or mailing list). To start, is there a phase distinction between installs into snapshot/shared vs. local?
I'll start by testing out the existing filelock packages; posix-filelock seemed broken right off the bat (it let me grab TWO write locks at the same time... hmm).
@rrnewton I can vouch for filelock on Windows.
filelock seems to work fine on Mac OS with some cursory testing.
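For anyone who wants to reproduce that kind of check, something like this works (lock path made up): run it in two terminals at once, and the second should block until the first finishes.

```haskell
import Control.Concurrent (threadDelay)
import System.FileLock (SharedExclusive (Exclusive), withFileLock)

-- Run two copies of this at the same time: the second should block at
-- withFileLock until the first one releases the lock.
main :: IO ()
main = withFileLock "/tmp/filelock-test" Exclusive $ \_lock -> do
  putStrLn "acquired the exclusive lock"
  threadDelay (10 * 1000 * 1000) -- hold it for ten seconds
  putStrLn "done, releasing"
```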
On the filelock branch on my fork I started marking some places where the locks could happen.
It looks like this may be really easy if the modifications only occur in the build function in Build.hs. How much finer-grained it could go without substantial refactoring, I'm not yet sure. (Reading top-down, it's a little hard to tell where all the file system effects are, because lots of code lives in the IO monad here.)
First, on the reader lock: there are a whole bunch of places that call taggedDecodeOrLoad and read data. It's probably better to lock at least the whole loadSourceMap call. If you look at the guts of build below, I'm not sure whether the actions between loadSourceMap and preFetch perform any reads (or writes) to disk.
```haskell
    menv <- getMinimalEnvOverride

    -- FILELOCK: a coarse-grained option would be to lock around this whole action:
    (mbp, locals, extraToBuild, sourceMap) <- loadSourceMap bopts

    -- Set local files, necessary for file watching
    stackYaml <- asks $ bcStackYaml . getBuildConfig
    liftIO $ setLocalFiles
           $ Set.insert stackYaml
           $ Set.unions
           $ map lpFiles locals

    (installedMap, locallyRegistered) <-
        getInstalled menv
                     GetInstalledOpts
                         { getInstalledProfiling = profiling
                         , getInstalledHaddock = shouldHaddockDeps bopts }
                     sourceMap

    baseConfigOpts <- mkBaseConfigOpts bopts
    plan <- withLoadPackage menv $ \loadPackage ->
        constructPlan mbp baseConfigOpts locals extraToBuild locallyRegistered loadPackage sourceMap installedMap

    -- FILELOCK: does prefetch store in shared space?
    when (boptsPreFetch bopts) $
        preFetch plan

    -- FILELOCK: here we could release the reader lock, and test if we need the writer lock.
    if boptsDryrun bopts
        then printPlan (boptsFinalAction bopts) plan
        else executePlan menv bopts baseConfigOpts locals sourceMap plan
```
Currently executePlan interleaves local and snapshot installs, so there doesn't seem to be a good way to "finish all snapshot installs, then release the lock". I was hoping it would be possible to unlock during most of the local work.
Further, the caching scheme would seem to interfere with our attempt to use reader/writer locks. taggedDecodeOrLoad in turn calls taggedEncodeFile, which writes to disk and is non-atomic. It could be made atomic by writing into a temporary file and moving it into place, but that would require that all the cached stuff (mini build plans, etc.) has deterministic contents and that races between concurrent processes are ok.
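The temp-file-plus-rename idea would look roughly like this (a sketch only; atomicWriteCache is a made-up name, and the rename is only atomic within a single filesystem on POSIX):

```haskell
import qualified Data.ByteString as B
import System.Directory (renameFile)
import System.FilePath (takeDirectory)
import System.IO (hClose, openTempFile)

-- Write to a fresh temp file in the destination directory, then move it into
-- place.  A concurrent reader then sees either the old cache file or the
-- complete new one, never a torn write.
atomicWriteCache :: FilePath -> B.ByteString -> IO ()
atomicWriteCache dest bytes = do
  (tmp, h) <- openTempFile (takeDirectory dest) "cache.tmp"
  B.hPut h bytes
  hClose h
  renameFile tmp dest
```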
So... the short-term solution looks to be to just take a write/exclusive lock around the entire build action. But the other actions would need to be considered as well: certainly "update" and "upgrade" write to shared areas of disk.
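In sketch form, that short-term approach is just the following (lockedBuild, the lock path, and the build action are placeholders, not stack's real API):

```haskell
import System.FileLock (SharedExclusive (Exclusive), withFileLock)

-- The entire build runs under a single exclusive lock, so concurrent
-- invocations serialize instead of interleaving.
lockedBuild :: FilePath -> IO () -> IO ()
lockedBuild lockPath runBuild =
  withFileLock lockPath Exclusive $ \_lock -> runBuild
```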
Update: this commit contains a simple version as a starting point.
It seems to work. I've been testing it by concurrently performing stack actions, e.g. (stack foo &); stack foo, and making sure things work ok. (And tests pass, of course.)
Michael, how would you like it parameterized? It doesn't seem worth a command line flag if it will ultimately be always on. (That is, I don't think it's a desired feature to allow unsafe concurrent commands with dubious properties.) Maybe a config option in the global yaml file?
Yes, I'm not sure stack --allow-breakage would be an attractive option, nor would stack --dont-explode solicit understanding from end users. :wink:
Moving discussion over to the pull request (#670) and closing this issue.
@rrnewton Always great to meet a fellow Hoosier. :)
I've searched for this with strings like "haskell stack tool" "concurrent invocation", but haven't found anything mentioning it yet (instead I get all the search results about parallel programming in Haskell).
My understanding is that cabal's behavior is currently undefined when multiple processes are invoked at the same time (no lock files; see haskell/cabal#2576). But it would be nice if we could fix this at the stack level.
The scenario is that a continuous integration system launches many stack build commands on the same user account. With cabal sandboxes this is ok, because they don't modify any global package dbs. However, with stack it seems like it may be unsafe, because two different projects using the same snapshot can both attempt to install into the user-wide snapshot database.
Nix, for example, uses lock files on disk to resolve races between concurrent invocations. With stack, would it be sufficient to simply put a lock around each snapshot package db?