Closed bitemyapp closed 7 years ago
I was able to reproduce both locally and on Homebrew's CI. (We use GHC 8.0.1.)
Maybe the title could be a little bit more informative. This is only a problem on macOS 10.12 Sierra as far as I can tell.
@cartazio any ideas how to fix the GHC problem with Sierra? macOS 10.12 has gone GM and Stack is dead in the water.
Is it stack specific or ghc 8.0 generally? Is there a minimal repro?
Since @bitemyapp's top post reports this for GHC 7.10.3, I suppose it's not specific to GHC 8.0.
It looks like it's related to split objects or sections?
It would help if there was a simpler application that reported this issue
@cartazio there's doubt that it's about split-sections: https://ghc.haskell.org/trac/ghc/ticket/12479#comment:4
@cartazio I think the minimal reproducer is brew install -s stack at this point, but it looks like it really doesn't get very far:
Building stack-1.1.2...
Preprocessing library stack-1.1.2...
[ 1 of 87] Compiling Stack.FileWatch ( src/Stack/FileWatch.hs, dist/dist-sandbox-68bf8d9c/build/Stack/FileWatch.o )
[ 2 of 87] Compiling System.Process.PagerEditor ( src/System/Process/PagerEditor.hs, dist/dist-sandbox-68bf8d9c/build/System/Process/PagerEditor.o )
[ 3 of 87] Compiling System.Process.Log ( src/System/Process/Log.hs, dist/dist-sandbox-68bf8d9c/build/System/Process/Log.o )
[ 4 of 87] Compiling System.Process.Read ( src/System/Process/Read.hs, dist/dist-sandbox-68bf8d9c/build/System/Process/Read.o )
ghc: panic! (the 'impossible' happened)
(GHC version 8.0.1 for x86_64-apple-darwin):
Loading temp shared object failed: dlopen(/var/folders/4d/rttdp_9d2s7f36zplgwnqrgr0000gn/T/ghc94633_0/libghc_44.dylib, 5): no suitable image found. Did find:
/var/folders/4d/rttdp_9d2s7f36zplgwnqrgr0000gn/T/ghc94633_0/libghc_44.dylib: malformed mach-o: load commands size (40192) > 32768
@borsboom any idea what the problem here is? The Sierra GM has escaped into the wild!
minimal code reproducer, stack is too much code
@ilovezfs No idea. It has to be a GHC bug (panics always are) but would be nice to find a workaround. I don't have an Apple developer account, so I don't have any access to Sierra betas. Once it's released I can upgrade my spare mac Mini and try to reproduce.
@borsboom The public beta is already on GM so you don't need to have a developer subscription to have access to pretty much the final version of 10.12.
If anyone else has some time, I'd probably start trying to make a smaller reproduction by extracting the failing System.Process.Read module to a separate project and minimizing the dependencies. It's an isolated module so that should be pretty easy. If it still fails, just start taking out pieces of code until you find what triggers the panic (if it doesn't fail... well then it's more complicated I guess).
@bitemyapp: where did you get your GHC? Was it a sandboxed one installed by Stack?
Also, from one of the GHC reports, looks like this isn't specific to Stack. yesod-auth also has a similar panic:
[ 4 of 11] Compiling Yesod.Auth ( Yesod/Auth.hs, .stack-work/dist/x86_64-osx/Cabal-1.22.4.0/build/Yesod/Auth.o )
ghc: panic! (the 'impossible' happened)
(GHC version 7.10.2 for x86_64-apple-darwin):
Loading temp shared object failed: dlopen(/var/folders/3k/ycnfbqgx33n7qdytkl9ryx7m0000gn/T/ghc64990_0/libghc_21.dylib, 5): no suitable image found. Did find:
/var/folders/3k/ycnfbqgx33n7qdytkl9ryx7m0000gn/T/ghc64990_0/libghc_21.dylib: malformed mach-o: load commands size (34176) > 32768
[ callen@cumae ~/work/codex travisci-build-matrix-stack ✔ ]
$ stack exec -- ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.10.3
[ callen@cumae ~/work/codex travisci-build-matrix-stack ✔ ]
$ stack exec -- which ghc
/Users/callen/.stack/programs/x86_64-osx/ghc-7.10.3/bin/ghc
I did say it was a GHC bug in the title, just trying to keep y'all apprised.
If you can do a small, single-module Haskell repro, then I and/or other GHC dev team folk can dig into resolving it. But without a clear minimal repro that 8.0 can trigger with a small module and clear deps, it's hard to do.
The moment it's an issue that can be demonstrated in a self-contained way, it'll be a GHC bug that gets full focus from applicable volunteers. GHC contributors don't have the bandwidth to boil down a repro from a really large application.
Also, if the bug is in 8.0 and someone can cook up a repro, I can see about sorting out a fix and making sure it's in an 8.0 bug-fix release. But I can't do that until there's a small, self-contained code base that has the problem.
Looks like misty has provided some debugging data upstream to GHC. I'll talk with bengamari. His current theory is that somehow the offending builds have split sections enabled and are thus drowning the linker.
His current theory is that somehow the offending builds have split sections enabled.
which would be quite weird since as misty also pointed out it seems to be a non-default option that we didn't pass in, but that doesn't mean it's impossible it built that way anyway for some reason ...
I'm pretty sure split objects is the default for the libraries. Could you try disabling that? I could be misremembering. Also I'm totally weak at linker stuff
--disable-split-objs and --disable-split-sections?
Could be / should be those
It's important to note that some builds of GHC's base and boot libs may also have those flags enabled.
In which case this could be a regression in the linker on the new OS X release. E.g., is it a format limit or a hard-coded "safety limit" in the Sierra linker? Are they still using the GNU linker, or did they move to LLVM? In the latter case, are there some hard-coded sanity limits?
This is probably easy to test by unpacking the Sierra CLI tools and using them on a previous OS release.
Looks like it's because they moved to LLVM's tool lld, and it hard-codes a limit on the number of sections it'll accept. So the near-term mitigation would be to disable split objects on a GHC Mac build, but the better medium/long-term fix is (A) we all file radars on this and (B) we make sure that upstream lld is fixed.
Other GHC devs and I are collaborating on IRC right now to confirm that we've isolated the problem. Affected users could also unpack a pre-Sierra CLI tools package and put dl in their path as an alternative mitigation (this is me speculating on an approach I have not tested, but it would probably work fine).
ld, not dl.
Is this a setting that is "baked into" the GHC binary when the bindist is built, or can it be controlled by an argument to the bindist's configure when installing the bindist (or, alternatively, by adjusting the lib/ghc-8.0.1/settings or other files after installation)?
It's a build-time thing for GHC and the boot libs. The default on most tier-one platforms is to enable split objects for the base and boot libs so that the end executable size is smaller.
It's also a flag for userland module builds.
Ah, unfortunate. Still, we can have Stack detect that it's running on macOS >=Sierra and install GHC using alternative bindists.
I'm not sure if that will suffice. We are still investigating
It's actually not a GHC bug and doesn't have to do with split objects at all. Sierra has a new limit on the "load commands size" of a shared library (that you can see mentioned in the error message) which includes the paths to its dependencies, and Stack's many dependencies and long directory paths exceed this limit.
There are possible workarounds, but I can't think of anything simple.
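To make the arithmetic behind that limit concrete, here is a rough Python sketch of how per-dependency load commands add up. It assumes the usual 64-bit Mach-O layout (a 12-byte fixed part for LC_RPATH, a 24-byte fixed part for LC_LOAD_DYLIB, a NUL-terminated path, and each command padded to a multiple of 8 bytes). The function names are made up for illustration, and the total is only a lower bound, since a real library also carries segment, symbol table, and other load commands.

```python
# Rough estimate of the load-command bytes contributed by rpaths and
# dylib references. Layout sizes are assumptions based on the 64-bit
# Mach-O format; other load commands are deliberately ignored.

SIERRA_LIMIT = 32768  # the limit quoted in the dyld error message


def _padded(fixed_bytes: int, path: str) -> int:
    """Fixed header plus NUL-terminated path, rounded up to 8 bytes."""
    raw = fixed_bytes + len(path.encode("utf-8")) + 1
    return (raw + 7) // 8 * 8


def rpath_cmd_size(path: str) -> int:
    """Approximate size of one LC_RPATH command."""
    return _padded(12, path)


def dylib_cmd_size(install_name: str) -> int:
    """Approximate size of one LC_LOAD_DYLIB command."""
    return _padded(24, install_name)


def estimate(rpaths, dylibs) -> int:
    """Lower bound on total load-command size for the given entries."""
    return (sum(rpath_cmd_size(p) for p in rpaths)
            + sum(dylib_cmd_size(d) for d in dylibs))
```

With, say, 200 dependencies that each contribute an rpath of about 150 characters, the rpaths alone already exceed the limit, before counting anything else in the file.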
@rwbarton @cartazio From one of our subject matter experts:
I guess this was the hidden message behind this year's WWDC session 406 where they talked about dynamic linker internals and optimizing app startup times.
The apparent new limit of 32K enforced by 10.12 feels very conservative, but then again GHC easily blowing that limit is also a bit scary.
I think their best bet is indeed to implement the idea from the brain-storming and make sure they have only one (or very few) LC_RPATH load commands that cover the majority of the path prefix(es) and the remaining LC_LOAD_DYLIB commands build on top of that to keep things small. Having a separate LC_RPATH command for every single library is a bit insane (but I can totally sympathize with the simplicity of such a solution).
I know basically nothing about GHC, but if the directory layout is somewhat predictable and fixed (aside from a changing prefix), i.e. it's easy to infer where a library should be located relative to the executable/library loading it, then another solution could be to rely on @loader_path/ and/or @executable_path/ to avoid the lengthy and repeated prefixes.
I think they will really need to optimize the load commands they are placing in every single binary to stay below the limit. The discussed common prefix idea should be fairly simple to implement. If not directly in the compiler, then at least as a post-processing step for the binaries.
The dyld man page in its entirety, but especially the bottom, was highly recommended reading.
For post-processing macho programmatically this has been suggested as a useful resource https://github.com/Homebrew/ruby-macho for various techniques.
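The common-prefix idea from the quoted comment can be sketched in a few lines of Python. This is only an illustration of the path arithmetic: the function name and the example paths are hypothetical, and it says nothing about how GHC or Cabal would actually emit the resulting load commands.

```python
import posixpath


def shared_rpath_plan(lib_paths):
    """Return (rpath, install_names) that share one common prefix.

    Instead of one LC_RPATH per library directory, a single rpath covers
    the longest common directory, and each library is referenced as
    @rpath/<path relative to that directory>.
    """
    dirs = [posixpath.dirname(p) for p in lib_paths]
    prefix = posixpath.commonpath(dirs)
    names = ["@rpath/" + posixpath.relpath(p, prefix) for p in lib_paths]
    return prefix, names
```

The saving comes from storing the long prefix once rather than once per dependency; the relative parts still cost bytes, so very deep per-package layouts would benefit less.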
Apparently I'm a “subject matter expert” and I'm the one being quoted above. I'm a contributor to the above-mentioned Ruby library for dissecting and modifying Mach-O files, and I'm decently knowledgeable about loaders and linkers in general and on macOS in particular.
I'm also completely clueless about GHC or Haskell in general, but if that's not a problem, I'm happy to help with some of my “wisdom”. (Disclaimer: I've also got a lot of other stuff on my plate at the moment, so I can't really promise super timely responses, but I'll try.)
So it sounds like this is probably something that should be properly fixed in Cabal and GHC, but it may be possible to somewhat work around it in Stack by shortening the paths it uses? We already do some tricks like that on Windows because of the path length limits there, by putting the STACK_ROOT in C:\sr and using a hash instead of some of the longer subdirectory chains.
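For readers unfamiliar with the hashing trick, a minimal Python sketch of the general idea follows. It is not Stack's actual scheme: the function name, the choice of SHA-256, the 8-digit truncation, and the example paths are all assumptions made for illustration.

```python
import hashlib
import posixpath


def short_dir(root: str, long_subpath: str, digits: int = 8) -> str:
    """Replace a long subdirectory chain with a short, stable hash.

    The same long_subpath always maps to the same short directory, so
    builds stay reproducible while every stored path gets shorter.
    """
    digest = hashlib.sha256(long_subpath.encode("utf-8")).hexdigest()
    return posixpath.join(root, digest[:digits])
```

A truncated hash can collide in principle, so a real implementation would need to decide how many digits are enough for the number of directories involved.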
@borsboom definitely.
macOS Sierra has now shipped https://itunes.apple.com/us/app/macos-sierra/id1127487414?mt=12
The same issue is being discussed over at the GHC trac.
In the meantime, what do you guys suggest that those of us who need stack on a Mac do? Is there a way to get homebrew to install prebuilt binaries of stack?
I'd install El Capitan in a virtual machine until this is resolved.
I'd suggest following the manual download instructions for Mac OS X. Those will get you a binary that reportedly runs on Sierra and works for smaller projects. For projects with a large number of dependencies, stick with El Capitan (e.g. in a VM like @ilovezfs suggested).
Thanks @borsboom. I just updated to macOS Sierra, and wanted to confirm that the manual download binary works for me.
My existing brew install of stack failed to work, which I had expected from reading this issue, and replacing the stack binary with the manual download version got me up and running.
To test, I just built a small lts-7.0 Spock application (103 dependencies), having removed my global .stack & project work directories first, and there are seemingly no problems with my existing brew installed ghc-8.0.1.
@borsboom Manual download binary worked for me as well, thank you for that.
I've updated get.haskellstack.org and the manual installation instructions to prefer the binary download over Homebrew, and added some Sierra warnings, for now.
@borsboom tyvm :+1:
I think we'll probably just boneyard it for now.
But I NEEEED Siri! LOL FML
I did some experimentation with shortening the paths like we do on Windows, but no luck: still getting the GHC panic. I can't think of any other workarounds, so I guess we'll just have to wait until it's fixed upstream in GHC/Cabal.
OK, then I shall reveal my workaround :)
We can simply strip the useless rpaths from the El Capitan bottle using install_name_tool and ship a Sierra bottle for this.
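As a rough illustration of that kind of post-processing, the Python sketch below pulls LC_RPATH entries out of the text printed by otool -l and builds the matching install_name_tool -delete_rpath invocations. The helper names are hypothetical, and deciding which rpaths are actually useless is left to the caller through the keep argument; this is not the script Homebrew used.

```python
import re


def rpaths_from_otool(otool_output: str):
    """Collect the path of every LC_RPATH command in `otool -l` text."""
    paths = []
    in_rpath = False
    for line in otool_output.splitlines():
        fields = line.split()
        if fields[:2] == ["cmd", "LC_RPATH"]:
            in_rpath = True
        elif fields[:1] == ["cmd"]:
            in_rpath = False
        elif in_rpath and fields[:1] == ["path"]:
            # Lines look like "path <value> (offset 12)"; keep <value>.
            match = re.match(r"\s*path (.*) \(offset \d+\)$", line)
            if match:
                paths.append(match.group(1))
            in_rpath = False
    return paths


def delete_rpath_commands(binary: str, rpaths, keep=()):
    """Build one install_name_tool argv per rpath that is not kept."""
    return [["install_name_tool", "-delete_rpath", path, binary]
            for path in rpaths if path not in keep]
```

Each returned list can be passed to subprocess.run on a Mac. Running the commands is kept separate from building them so the plan can be inspected first.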
I'm not using brew-installed stack, but binary manually-installed one, and I'm still having the problem (on a rather big project).
Correct, a working stack binary will only work on projects that don't themselves hit this same bug that occurs when building stack.
GHC 7.10.3
Xcode 8.0 GM, macOS Sierra