DanielG / ghc-mod

Happy Haskell Hacking for editors. DEPRECATED
Other
677 stars 175 forks source link

Make map-file less of a hack by using semi-canonical file paths everywhere #653

Open DanielG opened 8 years ago

DanielG commented 8 years ago

Most of the complexity of map-file seems to come from the fact that we need to splice the original file path into places where the temporary file name used to implement redirection will show up in the output.

This problem stems from the fact that right now we always canonicalizePath any file paths we handle internally. This in turn is because of how we deal with module equality in HomeModuleGraph: ModulePath is just a pair of a ModuleName and a FilePath. I belive that checking the equality of modules by comparing only the ModuleName could be a major mistake for ghc-mod since at some of the higher levels we have to deal with multiple modules having the same module name but differing paths (when dealing with multiple cabal components simultaneously).

Therefore I had to define equality on ModulePath just by equality on both the module name and the canonicalized path of the module (could probably be reduced to just equality on the cannonicalized file name but then I can't deriving the instance ;)

The path needs to be canonicalized as this ensures that two file paths are considered equal if and only if the two files are actually the same file.

Now if we makeRelative all the paths we get out of canonicalizePath we get 'portability' as in the project directry won't show up in the ghc(-mod) output which in turn means we could when using map-file just copy the project directory to a temporary directory and replace the files we need to with the redirected version.

This way we don't need to hack around with the logging output, replacing paths everywhere, since all the paths are relative and we still satisfy a relaxed variant of the requiremnet that paths are only equal if the files are the same (relaxed to only the file contents being the same, not the referenced file) which should be fine for our purposes.

This also ties in nicely with @alanz's idea of allowing fully transparent operation over a optionally networked channel. Since ghc-mod managing a project directory's file contents is a precondition for that anyways.

alanz commented 8 years ago

This assumes that for a given project everything is defined as being relative to the root of the project, as in ghc-mod root?

Can there ever be more than one cabal file in play at the same time, as in a complex project?

If so, there needs to be a clearly defined top level root

DanielG commented 8 years ago

I don't think that's a problem really, just replace 'project directory' in the above with 'super-project directory' and it should still work ;)

alanz commented 8 years ago

I agree, we just need to be clear what is being used as the root, at all times.

On Mon, Oct 12, 2015 at 2:31 PM, Daniel Gröber notifications@github.com wrote:

I don't think that's a problem really, just replace 'project directory' in the above with 'super-project directory' and it should still work ;)

— Reply to this email directly or view it on GitHub https://github.com/kazu-yamamoto/ghc-mod/issues/653#issuecomment-147382924 .

lierdakil commented 8 years ago

@DanielG, won't copying whole project directory incur a noticeable performance penalty though? Or am I misreading that? While intended use of map-file assumes only a couple files are mapped (and copied), larger projects can have several hundred of source files total.

alanz commented 8 years ago

I understand the copy to be a once off thing, then the mapped file replaces the original in the copy, when it happens. So there is still the same performance as now, once the initial copy is done

DanielG commented 8 years ago

Not the whole project directory just the skeleton, (i.e. the directories) and not even that you only need the directories leading up to the file(s) you want to redirect! So if the file is A/B/C.hs you only need to do the equivalent of mkdir -p A/B and copy C.hs there.

alanz commented 8 years ago

ok, like the autogen stuff

DanielG commented 8 years ago

yeah exactly like that. Given an appropriate setup for the search path that will override the files we want to override while keeping the paths relative.

lierdakil commented 8 years ago

Okay, sounds reasonable.

alanz commented 8 years ago

And eventually we should let map-file send a diff

DanielG commented 8 years ago

Bad news everyone :/

I just tried this out and GHC prefixes the search path item it found a module in to log messages about it.

alanz commented 8 years ago

This is where this kind of approach starts getting useful: https://github.com/haskell/haskell-ide-engine/blob/master/hie-plugin-api/Haskell/Ide/Engine/SemanticTypes.hs#L23

DanielG commented 8 years ago

Yeah I wish ghc exposed its log messages as data structures but alas it doesn't and it would be an enormous amount of work to change that AFAICT.

alanz commented 8 years ago

There is some effort starting to happen around this. See https://ghc.haskell.org/trac/ghc/ticket/8809

DanielG commented 8 years ago

Holy shit that's awesome. Adding this to the pretty printer seems like a really good idea.

alanz commented 8 years ago

We are slowly getting the IDE story in hand, from GHC all the way through

DanielG commented 8 years ago

Okay so I have another idea on how to make this less messy (but equally hacky).

1) Let's use a subdirectory in dist/ for the redirected temp files instead of /tmp this should make the paths a lot less ugly (See https://github.com/kazu-yamamoto/ghc-mod/issues/692, this is just unfashionable) 2) Don't try to rewrite the paths in the error messages we spit out, just leave them relative to project root i.e. something like dist/gm-tmp/Foo/Bar/Baz.hs 3) Before spitting out any log message make sure dist/gm-tmp/Foo/Bar/Baz.hs is replaced by a symlink to the original file. This way editors that implement file path equality right shouldn't get confused.

DanielG commented 8 years ago

Symlinks on windows might be a bit tricky but AFAIK NTFS supports them just fine but I'm not sure.

lierdakil commented 8 years ago

File path equality in itself is usually implemented via pure functions, not with IO actions. So this would be extremely brittle, I think.

DanielG commented 8 years ago

That's almost never the right thing to do though, I'd consider any IDE that doesn't do that right to have a bug. We had this problem in the Emacs frontend before and then switched to file-equal-p which takes symlinks into account.

Can you see what happens when you have a file and a symlink to that file with a different name and open them in atom?

DanielG commented 8 years ago

If you want to use pure comparison you have to do something equivalent to canonicalizePath or readlink -f on the path first btw.

lierdakil commented 8 years ago

Atom itself follows symlinks, that much I can say without looking. But with packages, this depends For example, ide-haskell uses pure functions, since both ghc-mod and Atom canionicalize paths internally. Doesn't make much sense to slow down things with IO when it's not needed...

lierdakil commented 8 years ago

In any case, even if equality is implemented with symlink resolution, it doesn't change the fact that ghc-mod will prefix paths with dist/gm-tmp/ in its output, and editor will have to do additional work to fix those. This doesn't seem good.

DanielG commented 8 years ago

My assumptions here are that there are three cases in which the paths in ghc-mod's output are relevant:

1) When users have to look at them (References to where a binder was defined in type errors, see https://github.com/kazu-yamamoto/ghc-mod/issues/692 for example) 2) Editors parsing the paths 2.1) .. to highlight stuff in the current file (like the native ghc.el) 2.2) .. to jump to a mentioned file (like compilation-mode in Emacs)

So:

By using a relatively short relative path users can skip the prefix since it's visually less noisy than the absolute path.

By using the symlink trick we take care of (2.*) since editors can identify what file a message belongs to by cannonicalizing it. It doesn't have to do any extra work apart from that (which it should already be doing anyways).