Open hasufell opened 2 years ago
@NorfairKing, I thought I might have a stab at implementing this but before I embark on it (if I succeed) would it be welcomed?
The blog post explaining migration is here: https://hasufell.github.io/posts/2022-06-29-fixing-haskell-filepaths.html
@mpilgrem I think we have no choice but to support the newer filepath, right?
Would this involve API breakage for path users?
I'd say go ahead!
@NorfairKing the old filepath API will stay, though. But this is an exciting improvement. I'm happy to review/assist with necessary changes.
One choice you ought to make is whether you go full OsPath or create a second set of API variants.
directory, unix and Win32 already support the new API, but packages like process might not. However, there's glue code one can use to access legacy API regardless (also described in the blog post).
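A minimal sketch of what such glue might look like, assuming the released System.OsPath module and its decodeFS function. The helper name withLegacyPath is hypothetical, and Prelude's readFile/writeFile merely stand in for any FilePath-based legacy API (such as one from process); this is not the code from the blog post.

```haskell
import System.OsPath (OsPath, decodeFS)

-- | Hypothetical helper: run a legacy, FilePath-based action on an OsPath.
-- decodeFS converts using the current filesystem encoding, so the String
-- handed to the legacy API is what that API would have seen anyway.
withLegacyPath :: (FilePath -> IO a) -> OsPath -> IO a
withLegacyPath action path = decodeFS path >>= action

-- | Example: reading a file through the legacy String-based Prelude API.
readFileLegacy :: OsPath -> IO String
readFileLegacy = withLegacyPath readFile
```

The conversion stays at the boundary, so the rest of the code base can keep OsPath as its only path type.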
@hasufell, on migration, can I pick your brains on something? Stack currently has types like this one:
newtype BuildCache = BuildCache
  { buildCacheTimes :: Map FilePath FileCacheInfo
    -- ^ Modification times of files.
  }
  deriving (Generic, Eq, Show, Typeable, ToJSON, FromJSON)
and the aeson package has, as an instance of FromJSON:
(FromJSONKey k, Ord k, FromJSON v) => FromJSON (Map k v)
but (a) the instances for OsString in filepath are limited to Monoid, Semigroup, Generic, Show, NFData, Eq, Ord, Lift and Type; and (b) the aeson package does not provide instances for OsString. I am not sure what the way forward is. Is there some 'recommended source' of orphan instances somewhere?
A JSON instance is one of those few cases where maybe String is a better filepath since it's "normalized".
I mean, if you send the raw bytes from a machine with encoding CP932 to a machine with encoding UTF-8... what result are you expecting? Even worse, if you send OsPath from Windows to a Linux machine, it'll be utter garbage.
But if it's converted to a list of unicode codepoints, you can encode them into whatever you want and don't need to know the original encoding. It's the "unified" representation (but it's not lossless).
Whether the filepath makes sense on "the other end" is still up to interpretation.
If the aeson instance is for local use only, you could write an instance for OsPath and convert to base64, see https://github.com/haskell/aeson/issues/187
Oh, and, given the "over the wire" scenario, you can turn the filepath back into an OsPath via encodeFS.
This way you're possibly translating between different encodings and the semantics are "visible unicode characters", so to speak. That may not always be the semantics you want (e.g. for filepath whitelists you probably want the raw bytes).
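To make the "over the wire" idea concrete for a type like BuildCache, here is a rough sketch that converts the keys of a Map between OsPath and String at the serialisation boundary, so the existing aeson instances for Map FilePath v can still be used. It assumes decodeFS and encodeFS from System.OsPath; toWire and fromWire are hypothetical names, and the conversion is lossy in the way described above.

```haskell
import qualified Data.Map.Strict as Map
import System.OsPath (OsPath, decodeFS, encodeFS)

-- | Hypothetical helper: turn OsPath keys into Strings before serialising.
toWire :: Map.Map OsPath v -> IO (Map.Map FilePath v)
toWire m = Map.fromList <$> traverse decodeKey (Map.toList m)
  where
    decodeKey (k, v) = do
      s <- decodeFS k
      pure (s, v)

-- | Hypothetical helper: turn String keys back into OsPath after parsing,
-- using the encoding of the receiving machine.
fromWire :: Map.Map FilePath v -> IO (Map.Map OsPath v)
fromWire m = Map.fromList <$> traverse encodeKey (Map.toList m)
  where
    encodeKey (s, v) = do
      k <- encodeFS s
      pure (k, v)
```

If two distinct OsPath keys decode to the same String, Map.fromList keeps only one of them, which is another reason this approach suits the "visible characters" semantics rather than raw-byte whitelists.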
Motivation
The next filepath release will add support for a new API, which fixes subtle encoding issues and improves cross platform code and memory residence.
You can read about the state of the newly added API here: https://github.com/haskellfoundation/tech-proposals/issues/35
Release candidate with haddock is here: https://hackage.haskell.org/package/filepath-2.0.0.3/candidate
Demonstration about unsoundness of base with exotic encodings and how the new API fixes them is here: https://gist.github.com/hasufell/c600d318bdbe010a7841cc351c835f92
Migrations
Migrations can happen once the filepath, unix and Win32 packages are updated and in sync.
A migration would usually involve using the new types from System.AbstractFilePath and the new API variants from unix/Win32.
When writing low-level cross platform code manually (shouldn't generally be necessary), the usual strategy is this:
Platform specific code can be written using PosixFilePath/WindowsFilePath types.
If you have further questions, please let me know. I'm going to write a blog post with a more in-depth intro and migration strategies shortly after the release. This is a heads up.
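As an illustration of the platform-specific types mentioned above, here is a small sketch. It assumes the module and type names as I understand them from the released filepath package (System.OsPath.Posix with PosixPath, System.OsPath.Windows with WindowsPath) rather than the PosixFilePath/WindowsFilePath names used in this issue; posixJoined and windowsJoined are hypothetical names. Both modules can be imported on any platform.

```haskell
import qualified System.OsPath.Posix as P
import qualified System.OsPath.Windows as W

-- | Join two components using POSIX rules, regardless of the host platform.
-- encodeUtf/decodeUtf run in Maybe here, yielding Nothing on invalid input.
posixJoined :: Maybe FilePath
posixJoined = do
  a <- P.encodeUtf "usr"
  b <- P.encodeUtf "lib"
  P.decodeUtf (a P.</> b)

-- | Join two components using Windows rules, regardless of the host platform.
windowsJoined :: Maybe FilePath
windowsJoined = do
  a <- W.encodeUtf "C:\\foo"
  b <- W.encodeUtf "bar"
  W.decodeUtf (a W.</> b)
```

Code that only needs the host platform's behaviour can use the platform-agnostic OsPath type instead and avoid these modules entirely.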