commercialhaskell / stack

The Haskell Tool Stack
http://haskellstack.org
BSD 3-Clause "New" or "Revised" License
3.98k stars 845 forks source link

Remove the need for `hi-file-parser` #5134

Open bgamari opened 4 years ago

bgamari commented 4 years ago

After an extended debugging session with @simonmichael it came to my attention that Stack directly parses GHC's interface files via the hi-file-parser library. This is an unfortunate state of affairs. Interface files are interface to GHC and consequently come with no particular stability guarantees.

To make everyone's life easier, it would be better if we can find another way to expose the information that you need from interface files. Can you describe what information Stack needs that isn't currently available from other means? Also note that we have an open GHC proposal, the extended dependency information proposal, which may be helpful. We would love to have feedback on this.

I have opened GHC #17620 to track the GHC side of this.

simonmichael commented 4 years ago

The symptom that I was seeing is this: when building with a GHC 8.10 alpha, eg using a stack file like this, you see a lot of "Failed to decode" warnings, one per .hi file. Eg:

~/src/hledger$ stack --stack-yaml=stack-ghc8.10.yaml build --fast hledger-lib
Stack has not been tested with GHC versions above 8.6, and using 8.10.0.20191210, this may fail
Stack has not been tested with Cabal versions above 2.4, but version 3.1.0.0 was found, this may fail

Warning: Failed to decode module interface: /Users/simon/src/PLAINTEXTACCOUNTING/hledger/hledger-lib/.stack-work/dist/x86_64-osx/Cabal-3.1.0.0/build/Text/Megaparsec/Custom.hi Decoding failure:
         Invalid magic: e49ceb0f

Warning: Failed to decode module interface: /Users/simon/src/PLAINTEXTACCOUNTING/hledger/hledger-lib/.stack-work/dist/x86_64-osx/Cabal-3.1.0.0/build/Hledger.hi Decoding failure:
         Invalid magic: e49ceb0f
...
Warning: Failed to decode module interface: /Users/simon/src/PLAINTEXTACCOUNTING/hledger/hledger-lib/.stack-work/dist/x86_64-osx/Cabal-3.1.0.0/build/Text/Tabular/AsciiWide.hi Decoding failure:
         Invalid magic: e49ceb0f
WARNING: Ignoring hledger-lib's bounds on base (>=4.9 && <4.14); using base-4.14.0.0.
Reason: allow-newer enabled.
~/src/hledger$

Observed on mac catalina and on ubuntu 18.04.3, using stack 2.1.3. I wrongly assumed these were coming from the new element in the toolchain (GHC), so I went and bothered #ghc.

qrilka commented 4 years ago

@haitlahcen probably you could add your 5 cents here as the author of #4545 ?

hussein-aitlahcen commented 4 years ago

@qrilka

The problem was that Stack was trying to extract modules dependencies from the dumped .txt hi file (which was obviously not a machine readable format, causing various issues). We reversed the interface format (not documented by GHC...) to directly read the binary file.

I believe that the proposal @bgamari linked is pretty much what was missing from GHC side to avoid doing what we have done here. Thanks for having noticed the link. I guess that we will be able to extract everything for the generated format as it seems to contain all the informations we need (@snoyberg confirmation would be great here).

Finally, removing the dependency to hi-file-parser is (probably) just a matter of updating this function to directly extract the dependencies from the GHC's generated machine readable file.

bgamari commented 4 years ago

If all you need is intra-package module dependencies then it's also possible that GHC's -M mode will suffice. This produces Makefile-style output which encodes the dependency graph. The extended dependency proposal is intended to provide a more build-system agnostic format with information that the -M output currently lacks (e.g. inter-module dependencies, required language extensions, etc.)

@hussein-aitlahcen, could you describe precisely which information you extract from the interface file? I see mentions of dependency file names in the implementation but it's hard to say precisely what kind of dependencies we are talking about here (e.g. native CPP dependencies, Haskell modules within a package, inter-package Haskell dependencies, etc.).

hussein-aitlahcen commented 4 years ago

@bgamari I believe it is intra-package module deps with native TH/CPP dependencies, see the previous version of the parsing where we extract addDependentFile and the module dependencies. I've an example of the dumped hi file we were parsing here given this kind of source.

bgamari commented 4 years ago

Alright, it would be appreciated if you could engage on the GHC proposal, raising these points so we have a record of precisely what we are designing towards.

DestyNova commented 4 years ago

Is there a workaround for this? I'm getting errors like this when trying build Liquid Haskell:

Warning: Failed to decode module interface:
         /home/omf/code/haskell/liquid/liquidhaskell/.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.0.0/build/Language/Haskell/Liquid/UX/Tidy.hi Decoding failure:
         Invalid magic: e49ceb0f

Warning: Failed to decode module interface:
         /home/omf/code/haskell/liquid/liquidhaskell/.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.0.0/build/Language/Haskell/Liquid/WiredIn.hi Decoding failure:
         Invalid magic: e49ceb0f

Warning: Failed to decode module interface: /home/omf/code/haskell/liquid/liquidhaskell/.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.0.0/build/LiquidHaskell.hi
         Decoding failure: Invalid magic: e49ceb0f

--  While building package liquidhaskell-0.8.6.0 using:
      /home/omf/.stack/setup-exe-cache/x86_64-linux-tinfo6/Cabal-simple_mPHDZzAJ_3.2.0.0_ghc-8.10.1 --builddir=.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.0.0 build lib:liquidhaskell exe:liquid --ghc-options " -fdiagnostics-color=always"
    Process exited with code: ExitFailure 1

I tried specifying a different resolver but it wasn't happy (I'm not sure which one to pick).

snoyberg commented 4 years ago

There is no workaround right now. The best solution is to improve hi-file-parser to support the new binary format.

hsyl20 commented 4 years ago

The interface magic change was a mistake (see https://gitlab.haskell.org/ghc/ghc/issues/18180) but hi-file-parser needs to support S/ULEB128 decoding anyway.

AndreasPK commented 4 years ago

I was pointed to this ticket recently. To summarize the current state of affairs from the GHC side:

Going forward the plan for GHC itself (starting with 8.10.2) is to:

As a workaround for 8.10.2 hi-file-parser will need to:

I regret having broken this for stack. But ghc can't reasonably guarantee that the iface format won't change again in the future. There has been discussion about trying to apply compression to them as well as encoding certain values as sub-byte sequences of bits. If any of those turn out to be beneficial the format might change again radically in 8.14.

bgamari commented 4 years ago

I would reiterate that it would be helpful if stack could comment on the extended dependency generation proposal so we can confirm that it is indeed sufficient to solve the underlying issue here. It would be great to move stack off of hi-file-parser.

snoyberg commented 4 years ago

One of the reasons we have hi-file-parser is because processing the text based hi-dump files caused massive slowdown both from GHC and Stack. I don't know if that problem would reemerge with JSON. We would also still need hi-file-parser for earlier versions of GHC. Finally, I don't know how Cabal would expose access to these files.

Most likely we'd be able to move over to parsing the new format, but I'm not certain.

snoyberg commented 4 years ago

@AndreasPK no concerns from my side. Changes in the hi binary file format is absolutely expected, we just need to update hi-file-parser to handle it. My only request would be if you know the format is changing and you remember to give a ping on that repo. But as you can see, we'll notice pretty quickly due to the warnings.

We had these problems before with the hi-dump format too, and moved to explicit and scary-looking warnings just to make sure it didn't silently get ignored.

That library supports different parsing for different GHC versions, so updating as you mention ideally won't be too difficult.

snoyberg commented 4 years ago

Looking at the code on this just a bit: it doesn't seem like it's really possible to continue using the hi-file-parser approach based on the changes you're planning. Ideally there would be a reliable way to tell which version of the format is being parsed immediately. However, if the magic number is changing back, we have no way of telling whether it's the new variable length encoding or fixed length encoding in the future.

@haitlahcen do you have any thoughts on all of this?

For context of the original change to use hi-file-parser, see #4545.

minad commented 4 years ago

What is the status on this? I just upgraded to stackage nightly with GHC 8.10 and seeing a lot of "invalid magic" warnings. Why has stackage moved to 8.10 if it cannot cope with the 8.10 hi files yet - or maybe this is not a real problem besides printing a warning?

simonmichael commented 4 years ago

It's harmless, for now at least; but annoying/alarming for users. From snoyberg's last comment it sounds like the version of hi files format needs to be managed and exposed a bit more clearly on the GHC side.

tfausak commented 4 years ago

Is there a way to ignore just this warning? Using GHC 8.10 with Stack outputs a lot of noise to my terminal. I want all of the warnings from Stack and GHC in general, but I'd like to disable this one because it's not actionable for me and everything seems to work regardless.

etorreborre commented 4 years ago

Same concern here :-)

snoyberg commented 4 years ago

I think I have a good solution to this warning issue. We want the warnings to appear to Stack devs, but no one else. What I think I'm going to do is set up a new config flag for "developer-mode" or something like that. The default for the official binaries will be to set it to off. The default when you build from Git will be for it to be on. When the flag is on, warnings like this will be set to WARN level. When the flag is off, these messages will be DEBUG level. That will allow Stack developers to hear about these issues, and users to ignore them.

@bgamari I don't know what other comments I can provide to you on the GHC proposal. From what I can see, it's not addressing the concerns that originally led to hi-file-parser to be written. I'd also rather not have this separate library, but it doesn't seem like we have any choice given the lack of a consistent, multi-GHC-version interface for determining TH dependent files.

tfausak commented 4 years ago

Sounds good to me @snoyberg! I'm not a Stack developer, but the code in #5325 looks good to me.

tonyday567 commented 4 years ago

In the meantime, for those going blind with ghc-8.10 development:

stack build --test --file-watch 2>&1 | sed '/^Warning:/,/Invalid magic: e49ceb0f$/d'
Dessix commented 4 years ago

If you like a bit of color in your terminal, Tony's command can be done this way: stack build --test --file-watch --color=always 2>&1 | sed -r '/^(\x1b\[[0-9;]*m)?Warning:(\x1b\[[0-9;]*m)? Failed to decode/,/e49ceb0f.*?$/d' 2>&1

ProofOfKeags commented 3 years ago

@snoyberg With a few months of reflection, is this still an approach you consider good? I noticed that this is still an issue in stack 2.5.1. If the approach is one you'd still feel comfortable with, it sounds like it is a reasonably low touch change, that I could take a stab at. It sounds simple from your description, but this will be my first foray into the stack codebase, so I'm not sure what I'm in for exactly. I'm guessing the only reason it hasn't been done is due to not being prioritized given that 8.10.x isn't in the stackage LTS set.

snoyberg commented 3 years ago

Sorry, a few different things have come up in this thread. Which approach are you talking about here?

Overall, I'd support any implementation for this that lets Stack continue to parse the .hi files. If that comes from an upstream GHC proposal, that's great, but I don't think the one mentioned above will help.

ProofOfKeags commented 3 years ago

For specificity, this one: https://github.com/commercialhaskell/stack/issues/5134#issuecomment-647091312

But your response just now indicates that this falls under the generalization you're amenable to.

stackptr commented 3 years ago

Is this issue still valid? I encountered this today when building a project on ghc 8.10 but upon upgrading stack (from 2.3.1 to 2.5.1) it seems like these warnings are no longer appearing.

tfausak commented 3 years ago

The change made in #5325 hides this warning from typical users. The problem still exists behind the scenes. You should be able to set stack-developer-mode: true in your stack.yaml to see the warnings.

tonyday567 commented 3 years ago

Can be closed from a users perspective.

andreasabel commented 3 years ago

I didn't observe the warnings with stack 2.5.1, but they appeared after a stack upgrade that got me to 2.5.1.1:

$ stack --version
Version 2.5.1.1 x86_64 hpack-0.33.0
gergoerdi commented 3 years ago

This now blocks Stack from being used with GHC 9.0.1: #5486 #5445

konn commented 3 years ago

I'd never encountered this issue in GHC <9, but today I encountered this with GHC 8.10.4 on macOS, in which stack silently dies and --verbose flag reveals it. Is there any workaround, or plan for the fix in the near future?

snoyberg commented 3 years ago

The short term issue for GHC 9.0 should be addressed by #5508. I don't believe we have a long-term solution for getting the dirty file information out of GHC effectively yet, but with this workaround in place, it simply means that dirty file checking will not work correctly with addDependentFile (the current status quo).

bgamari commented 3 years ago

@bgamari I don't know what other comments I can provide to you on the GHC proposal. From what I can see, it's not addressing the concerns that originally led to hi-file-parser to be written. I'd also rather not have this separate library, but it doesn't seem like we have any choice given the lack of a consistent, multi-GHC-version interface for determining TH dependent files.

@snoyberg, I'm afraid I don't follow. The proposal does provide information about TH dependent files (via the dependencies key). Can you clarify?

snoyberg commented 3 years ago

I think we're saying the same thing. Stack has a need here: learn about TH dependent files. That information is available in .hi files. Stack needs to get that information. GHC currently doesn't provide a method for doing that directly. Previously, we use .hi-dump files, but that caused issues because in some cases those files were massive. So someone contributed a binary parser, which is now broken due to changes in GHC's file format. You mentioned a GHC proposal, which I think we now both agree is irrelevant to the issue at hand. We're left with the original challenge: how do we efficiently determine dependent files inside of Stack?

For now, the "solution" is to simply fail on newer GHC versions, and hope that someone cares enough about this to add additional parsing capabilities to hi-file-parser.

bgamari commented 3 years ago

You mentioned a GHC proposal, which I think we now both agree is irrelevant to the issue at hand. We're left with the original challenge: how do we efficiently determine dependent files inside of Stack?

I'm trying to suggest that the proposal is relevant: it provides precisely the information that you are currently parsing out of the .hi file.

snoyberg commented 3 years ago

You're right. I completely misread your previous line. Apologies. Yes, this should address the needs hi-file-parser is filling going forward. We'd still need hi-file-parser for existing GHC versions, but that shouldn't be an issue.

hsyl20 commented 3 years ago

I've fixed hi-file-parser (cf https://github.com/commercialhaskell/hi-file-parser/pull/2) so that it supports 8.10 and 9.0

snoyberg commented 3 years ago

Thank you @hsyl20! I've merged and released your PR, and will open up a PR against Stack shortly to use it.

fosskers commented 3 years ago

Seems to work like a charm, I no longer see Invalid magic messages! (stack 2.6.0 built from master via stack upgrade --git).

ProofOfKeags commented 3 years ago

Seems like this can be closed now.

AndreasPK commented 3 years ago

Seems like this can be closed now.

Sadly the underlying issue remains so that seem premature.

mpilgrem commented 2 years ago

@hsyl20, would you be able to do what you did for GHC 9.0, but for GHC 9.4? See https://github.com/commercialhaskell/hi-file-parser/issues/5. I think the .hi format may have changed.

hsyl20 commented 2 years ago

@mpilgrem Sure. I've responded in the ticket.