commercialhaskell / path

Typed filepath
Other
120 stars 45 forks source link

Support `AbstractFilePath` #189

Open hasufell opened 2 years ago

hasufell commented 2 years ago

Motivation

The next filepath release will add support for a new API, which fixes subtle encoding issues and improves cross platform code and memory residence.

You can read about the state of the newly added API here: https://github.com/haskellfoundation/tech-proposals/issues/35

Release candidate with haddock is here: https://hackage.haskell.org/package/filepath-2.0.0.3/candidate

Demonstration about unsoundness of base with exotic encodings and how the new API fixes them is here: https://gist.github.com/hasufell/c600d318bdbe010a7841cc351c835f92

Migrations

Migrations can happen once the filepath, unix and Win32 packages are updated and in sync. A migration would usually involve using the new types from System.AbstractFilePath and the new API variants from unix/Win32.

When writing low-level cross platform code manually (shouldn't generally be necessary), the usual strategy is this:

#if defined(mingw32_HOST_OS) || defined(__MINGW32__)
import qualified System.AbstractFilePath.Data.ByteString.Short.Word16 as SBS
import qualified System.AbstractFilePath.Windows as PFP
#else
import qualified System.AbstractFilePath.Data.ByteString.Short as SBS
import qualified System.AbstractFilePath.Posix as PFP
#endif

crossPlatformFunction :: AbstractFilePath -> IO ()
#if defined(mingw32_HOST_OS) || defined(__MINGW32__)
crossPlatformFunction (OsString pfp@(WS ba)) = do
    -- use filepath functions for windows specific operating system strings
    let ext = PFP.takeExtension pfp
    -- operate directly on the underlying bytestring (which is a wide character bytestring, so uses Word16)
    let foo = SBS.takeWhile
    ...
#else
crossPlatformFunction (OsString pfp@(PS ba)) = do
    -- use filepath functions for posix specific operating system strings
    let ext = PFP.takeExtension pfp
    -- operate directly on the underlying bytestring (which is just Word8 bytestring)
    let foo = SBS.takeWhile
    ...
#endif

Platform specific code can be written using PosixFilePath/WindowsFilePath types.

If you have further questions, please let me know. I'm going to write a blog post outlining the affairs and more in-depth intro and migration strategies close after the release. This is a heads up.

mpilgrem commented 1 year ago

@NorfairKing, I thought I might have a stab at implementing this but before I embark on it (if I succeed) would it be welcomed?

hasufell commented 1 year ago

The blog post explaining migration is here: https://hasufell.github.io/posts/2022-06-29-fixing-haskell-filepaths.html

NorfairKing commented 1 year ago

@mpilgrem I think we have no choice but to support the newer filepath, right? Would this involve API breakage for path users? I'd say go ahead!

hasufell commented 1 year ago

@NorfairKing the old filepath API will stay, though. But this is an exciting improvement. I'm happy to review/assist with necessary changes.

One choice you ought to make is if you go full OsPath or if you create a second set of API variants.

directory, unix and Win32 already support the new API, but packages like process might not. However, there's glue code one can use to access legacy API regardless (also described in the blog post).

mpilgrem commented 1 year ago

@hasufell, on migration, can I pick your brains on something? Stack currently has types like this one:

newtype BuildCache = BuildCache
    { buildCacheTimes :: Map FilePath FileCacheInfo
      -- ^ Modification times of files.
    }
    deriving (Generic, Eq, Show, Typeable, ToJSON, FromJSON)

and the aeson package has, as an instance of FromJSON:

(FromJSONKey k, Ord k, FromJSON v) => FromJSON (Map k v)

but (a) the instances for OsString in filepath are limited to Monoid, Semigroup, Generic, Show, NFData, Eq, Ord, Lift and Type; and (b) the aseon package does not provide instances for OsString. I am not sure what is the way forward. Is there some 'recommended source' of orphan instances, somewhere?

hasufell commented 1 year ago

JSON instance is one of those few cases where maybe String is a better filepath since it's "normalized".

I mean, if you send the raw bytes from a machine with encoding CP932 to a machine with encoding UTF-8... what result are you expecting? Even worse, if you send OsPath from windows to a linux machine, it'll be utter garbage.

But if it's converted to a list of unicode codepoints, you can encode them into whatever you want and don't need to know the original encoding. It's the "unified" representation (but it's not lossless).

Whether the filepath makes sense on "the other end" is still up to interpretation.

If the aeson instance is for local use only, you could write an instance for OsPath and convert to base64, see https://github.com/haskell/aeson/issues/187

hasufell commented 1 year ago

Oh, and, given the "over the wire" scenario, you can turn the filepath back into an OsPath via encodeFS.

This way you're possibly translating between different encodings and the semantics are "visible unicode characters", so to speak. That may not always be the semantics you want (e.g. for filepath whitelists you probably want the raw bytes).