Clean up system-filepath

snoyberg commented 9 years ago

We know that the library has plenty of warts. One possibility is to make the main data type a newtype wrapper around String.

mboes commented 9 years ago

I think that would be simplest, yes. not sure how much pre-digesting filepath's the way the library currently does really helps with anything (for many common cases if anything it should probably make performance worse).

The other idea here was to make FilePath a datatype of edits. This amounts to only lazily chopping FilePath into components:

data FilePath = FilePath ByteString [Edit]

data Edit = PopBack | PopFront | PushBack Component | PushFront Component |
...

data Component = Dot | DotDot | Node ByteString | ...

This works well for the common case where one is constructing many variations of one base filepath. For that use case, you don't want to chunk up the entire filepath just to e.g. modify the extension. Conversely, when the user really is constructing the filepath piecewise, reinterpreting a ByteString continuously to swap out the extension for another is a pessimization. I'm unconvinced whether this warrants the extra weight - haven't thought this through much.

Either way would constitute a pretty substantial rewrite of the internals, mind.

chrisdone commented 9 years ago

@mboes Sounds like a "path builder", if we take the current string (text/bytestring) vs construction of strings (builders) by analogy. Most uses-cases of path handling is probably trivial, but when handling hundreds of thousands of names and doing a bunch of edits I could see the advantage. I'd personally appreciate solid benchmarks to back up any performance-driven design decisions.

One possibility is to make the main data type a newtype wrapper around String.

Right, that shouldn't be so controversial, it's more of an "update".

The main problem I have with the library at present is, as evidenced by the latest merged PR, it's not very reasonable. It's too stringy. It's like writing SQL with string-literals. Certain equalities don't hold. It has a test suite and yet I don't trust it one bit. If I were to take up Andy Gill's proof assistant stuff a library like this would be the first thing I'd attack.

chrisdone commented 9 years ago

Examples of behaviour that bugs me:

λ> directory ("/foo/bar" <> "/") == directory ("/foo/bar/" <> "/")
True

Okay, maybe (<> "/") is idempotent.

λ> "/foo/bar" <> "/" == "/foo/bar/" <> "/"
False

Nope.

λ> "/foo/bar" <> "/" == "/foo/bar/"
True

Okay, if x and y are equal then f x == f y:

λ> parent ("/foo/bar" <> "/") == parent "/foo/bar/"
False

Nope. Same for collapse:

λ> "/foo//bar/" == "/foo/" <> "/" <> "bar/"
True
λ> collapse ("/foo//bar/") == collapse ("/foo/" <> "/" <> "bar/")
False

String representation doesn't match internal behaviour:

λ> "/foo/" <> "/"
"/foo//"
λ> splitDirectories ("/foo/" <> "/")
[FilePath "/"]

More stuff that surprises me:

λ> concat (drop 1 (splitDirectories "/foo/bar/mu")) == "foo/bar/mu"
True
λ> concat (drop 1 (splitDirectories "/foo/bar/mu")) == parent "foo/bar/mu/bar"
False
λ> let root = concat . take 1 . splitDirectories
λ> root (parent "foo/bar/mu/bar") == root "foo/bar/mu/bar"
False

Generally icky.

snoyberg commented 9 years ago

I've spoken about this with @dysinger and @chrisdone over the past 24 hours. I'm going to try to get all of the ideas out there for discussion:

We almost certainly want to just go with newtype FilePath = FilePath String for simplicity, and possibly use newtype FilePath = FilePath ByteString on POSIX systems for efficiency (but at a lower priority)
In system-fileio, it would be nice to generalize the functions to MonadIO instead of just IO
The split between two packages doesn't make much sense. Having a single package, and ideally a single Filesystem module, is probably a good direction to head in.

Also, since it came up with Tim, I want to raise a fundamental question: is the library still a good idea. Let's good pros and cons:

Pros:

Fixes idiosyncrasies with GHC's FilePath implementation. Since GHC 7.4, this is no longer necessary
Type safety to prevent accidental swapping of String and FilePath
Nicer/more sane API can filepath/directory provide
Possibly more efficient (e.g., using ByteString)

Cons:

API duplication with standard libraries
Lots of annoying conversion between data types

So I'd like to raise the point for discussion: should we consider in the next release simply re-export System.IO.FilePath and essentially deprecating this library?

mboes commented 9 years ago

To me it comes down to whether it's useful to have FilePath be a distinct type from String. If that is dropped, some amount of merging with filepath/directory is conceivable. On that from, perhaps it's useful to make explicit what's nicer about this API?

However, I actually think this is one place where FilePath and String should be distinct. Not so much because of the type safety, which in any case for string literals is mitigated by the existance of an IsString instance. But because there are reasonable equations that can be defined on FilePath, which the Eq instance should be quotiented over. I.e. we should have "blah/./foo// == blah//foo/. The fact that the Eq instance on String doesn't work that way is a source of actual bugs in real programs.

snoyberg commented 9 years ago

That's a pretty strong reason in favor of keeping a separate datatype. I was already leaning towards keeping a separate datatype myself.

DanBurton commented 9 years ago

Although "foo/../foo/bar" and "foo/bar" lead to the same location, they are arguably different "paths" to get to that location. I'm not sure if it is ever useful to actually think of paths this way, but it's just a semantic point to consider.

On Tuesday, April 14, 2015, Michael Snoyman notifications@github.com wrote:

That's a pretty strong reason in favor of keeping a separate datatype. I was already leaning towards keeping a separate datatype myself.

— Reply to this email directly or view it on GitHub https://github.com/fpco/libraries/issues/31#issuecomment-92776235.

-- Dan Burton

chrisdone commented 9 years ago

The two in favour I see are:

Type safety to prevent accidental swapping of String and FilePath

Possibly more efficient

So I marginally favor a newtype to prevent mixing up.

But having operations pretend that representationally different things are equal seems like asking for trouble, it leads to weirdness that I wrote above… I'd prefer that everything is normalized into a consistent internal format at parsing stage or ./ and .. etc. be simply rejected as paths that programs don't need to deal with except rarely. E.g. "../foo" is something a user might type at a prompt and should be resolved there, but it seems for trouble to retain that representation inside your programs.

mboes commented 9 years ago

@danburton Yep, quite right. Which is why I equated foo/./bar and foo/bar, and not foo/../bar. Symlink normalization, à la readlink is something that can only be done in IO. Hence not something we want to include in the Eq instance.

mboes commented 9 years ago

@chrisdone I'm not sure what operations you're referring to. Earlier conversations from way back when or the equality I wrote today? There are two ways of implementing such an equality:

represent paths as they are typed but normalize during equality.
normalize paths at creation time and make Eq be structural.

With a newtype, both are equivalent, but the latter sounds much better to me. I think that's what you're saying. Essentially what we're talking about is internalizing filepath's normalise, but not any equivalent of readlink, into the equality.

For me, this is the killer feature of a newtype: let values be members of arbitrary equivalence classes. Personally, given the other costs of having a FilePath be its own type, rather than a type synonym, if it wasn't for the above then I wouldn't bother.

chrisdone commented 9 years ago

Yeah, the latter is what I'd prefer.

snoyberg commented 9 years ago

We all still want a distinct data type for file paths, so let's move ahead in that direction. @dysinger This item is now active for work. Ping me with any questions you have about how to get started.

dysinger commented 9 years ago

I asked @chrisdone if maybe we should start with the existing filepath package & add the newtype & refactor. If the goal is to have a drop in replacement for filepath with an improved type for FilePath then this makes sense to me.

#ifdef FILEPATH_REGULAR
type FilePath = String
unFilePath = id
#elseif
newtype FilePath = FilePath { unFilePath :: String }
#endif

chrisdone commented 9 years ago

Technically that would allow taking the filepath package and parametrizing it over the type (a good use-case for ML modules) without changing its implementation. Whether we want that is another question, but it's on the table as a nice way to avoid code duplication and re-using tests and such, though I'm not sure what Neil would think about it.

dysinger commented 9 years ago

https://github.com/dysinger/filepath/tree/newtype https://github.com/dysinger/filepath/commit/e0781bc51aefd7e505d618b9fbabced056a24f5e

I made this work (for discussion). The tests pass still.

dysinger commented 9 years ago

I'm still happy to do it the other way. I'm not trying to divert the plan. Happy to do either.

snoyberg commented 9 years ago

I'd like to point out another goal: backwards compatibility with the current API. We (and others) have lots of code using that API already, and I don't want to unnecessarily break it. I'm not opposed to multiple APIs, but we should keep something that supports current code with minimal changes.

dysinger commented 9 years ago

OK. I'll fix system-filepath ASAP

dysinger commented 9 years ago

I spent too much time on this. After cleaning up the old modules, consolidating & fixing all warnings, I started out trying to replace the old data type with newtype. This turned out to be a ball-of-yarn re-write. I didn't start out wanting to re-write everything. I stopped & tried a different track. This time I used a newtype around string & implementing functions to transform to/from the old data type. I converted some of the internal API to work on the newtype. This didn't turn out as well as I liked either.

I ended up pushing the old code pretty much intact into Filesystem.Path.Internal & providing a new Filesystem.Path module with a newtype FilePath around ByteString. I only implemented the currentOS default functions as a facade to the old code. The old tests can stay for now & still all pass. There is a bit of expense converting to/from bytestring/text/string in the short term.

dysinger commented 9 years ago

Next I need to audit where we are using this api & see if I can migrate some code in a branch on those projects. I'll need some feedback also.

dysinger commented 9 years ago

https://github.com/fpco/haskell-filesystem/compare/newtype%2Bfacade

dysinger commented 9 years ago

@chrisdone @mboes I'm now working through replacing the functions from system-filepath with filepath versions. I'm trying to keep the tests as a way to tell how bad I'm going to break code that needs this library.

I've run into inconsistent behavior which I've made note of in the code. For example: https://github.com/fpco/haskell-filesystem/blob/e0f2a807880256f8a9b7f9dc0cbb691d16637379/system-filepath/lib/Filesystem/Path.hs#L220

Should I work hard to reproduce everything system-filepath clients expected to see? Or should we just reset & say that, aside from a pre-normalized newtype FilePath, filepath behavior is what we want?

mboes commented 9 years ago

I think Chris demonstrated earlier in this ticket that the current equational theory (set of laws) that this package has is pretty broken and unintuitive. Therefore, I'd be up for either abandoning the API and creating a similar one with a different name but with a well behaved equational theory, or changing the current one in place (hence breaking back compat, which I personally think is fair here).

snoyberg commented 9 years ago

I'm OK with minor changes in functionality for corner cases where the new behavior is more intuitive, as Tim has demonstrated so far. I don't want to break the API for no reason.

On Thu, May 7, 2015 at 8:29 PM Mathieu Boespflug notifications@github.com wrote:

I think Chris demonstrated earlier in this ticket that the current equational theory (set of laws) that this package has is pretty broken and unintuitive. Therefore, I'd be up for either abandoning the API and creating a similar one with a different name but with a well behaved equational theory, or changing the current one in place (hence breaking back compat, which I personally think is fair here).

— Reply to this email directly or view it on GitHub https://github.com/fpco/libraries/issues/31#issuecomment-99946853.

chrisdone commented 9 years ago

I think the plan was just to keep the existing library behaviour in a backwards-compatible way but change the internal representation to just newtype Filepath = FilePath String and then later possibly optimize it.

snoyberg commented 9 years ago

I just touched base on this with @dysinger. Here's my assessment based on what he's shown me:

The internals and current behavior of system-filepath are far wonkier than we expected originally (or at least I expected)
Many functions behave differently than filepath
Other functions have no directly analogue to filepath

It seems to me that continuing with the current course is not worth the effort: none of us love the system-filepath semantics, and it's going to take significant effort. Instead, I'd like to propose a different course: a compatibility shim. Idea would be:

Release a new major version of this library
Any functions that exist in filepath are exported by the new version of this library, with naming/argument order tweaked to meet the current API
We ignore the current behavior completely, and take filepath's behavior
Other functions get dropped

And if we go this route, I'm going to push one step farther: I'd like to backtrack on making this a newtype wrapper, and instead simply reexport System.IO.FilePath. My motivation:

There's already a cognitive overhead that we've discussed here of two different datatypes
If we just reexport the type FilePath = String, we reduce our maintenance burden in the future by being able to deprecate these packages
It seems like one of the main use cases pushing this- having more well typed file paths via type parameters- isn't going to be met by this library anyway

@mboes As I see it, the main thing we lose is the normalization goals you had. But given how much effort is going on here, I don't think it's worth continuing for that goal alone. Instead, continuing with Chris's alternative library with type parameters may be the better objective.

If anyone opposes this route, speak up now. Otherwise: Tim, you have a green light to do this the easy way.

mboes commented 9 years ago

Let me make sure I understand the proposal: effectively, deprecate entirely system-filepath as we know it today, because the next version will just be a reexport of filepath, except that filepath's functions are aliased to the current names?

In that case, why not just let system-filepath be? Declare it deprecated and call it a day?

system-filepath was meant for those users that don't want to be bitten by the surprising fact that "blah//foo" isn't the same path as "blah/foo", yet are not interested in the cognitive overhead of a parameterized newtype or GADT and added verbosity in the type sigs. Maybe that set doesn't include FPCO, but in that case we should perhaps pass on maintainership to whoever else does fit in that category.

snoyberg commented 9 years ago

system-filepath was meant for those users that don't want to be bitten by the surprising fact that "blah//foo" isn't the same path as "blah/foo"

Except that wasn't the intention of the library at any point in time, it's why you liked it ;). The library was intended to:

Work around bugs in GHC <=7.2 regarding character encoding
Provide more type safety around file paths

What I'm proposing here is an easy migration path forward so that people aren't using a buggy library (see the issue tracker) but we're not stuck maintaining some arcane behavior. The things we lose with my proposal are:

type safety
normalization

Given that the first bullet isn't fully addressed by this library (since Chris wants type parameters), it's really about the second. We could maintain normalization if desired by keeping a newtype wrapper and otherwise rewriting the library. I guess I've just never been convinced that that's a good enough reason to have a completely separate datatype and set of libraries.

chrisdone commented 9 years ago

I think I'm onboard with this if I understand properly.

If we simply deprecate system-filepath then people's path forward is to go and rewrite all their codebase with the filepath package.

So essentially this proposal would end up with system-filepath as a set of functions with different names and arg orders to filepath, but otherwise can be considered compatible with filepath and written in terms of filepath, and then we would have system-fileio functions work on System.IO.FilePath too. This would allow our and others' existing codebases to migrate without having to rewrite a lot, it would leave you with the task of confirming test suites and that codebases still work. This is based on the assumption that just re-exporting wrappers around filepath will be less buggy and remove our maintenance overhead of this package.

snoyberg commented 9 years ago

Exactly

mboes commented 9 years ago

I find type safety alone a minor benefit - I'm not one to newtype the wazoo out of integer values, for instance. I like having a separate type pretty much only if it helps me manage value identity in a way that makes sense (or helps me enforce other invariants).

That's why as a user, my pragmatic choice today is filepath (no extra type safety that I don't care much for, though poor value identities), whose API I like better than system-filepath anyways. The optimal for me would be filepath's API + non-stringly typing, at least for new code. But that's more upheaving of system-filepath's interface than is reasonable.

I like the idea of implementing system-filepath in terms of filepath while still retaining the same facade, under the assumption that managing the possibility of silent breakage in legacy code due to the slightly different behaviour is less onerous than porting it to a new package.

AFAICS implementing system-filepath in terms of filepath while keeping the newtype wrapping is practically the same amount of work as doing so without the newtype. It has the benefit of being more in the spirit of the current library, while also breaking less behavioural compatibility (legacy code that assumes blah//foo == blah/foo can continue to make that assumption).

dysinger commented 9 years ago

I've pushed another branch where I roughed this out.

dysinger commented 9 years ago

I need to double-check / alias / fix:

... with naming/argument order tweaked to meet the current API

after lunch

dysinger commented 9 years ago

I'm still not happy with this. I feel defeated. I did exactly what @snoyberg said to do above (in a new branch). It's going to break everything. I even bumped into usage of the code in classy-prelude.

snoyberg commented 9 years ago

Chris and I discussed this. We're just going to deprecate these packages and stop using them. In the future, we may introduce a new package with some nicities (such as MonadBaseControl/MonadIO support, or abstracting over different representations), but that's going to happen now.

fpco / libraries

Clean up system-filepath #31