babashka / fs

File system utility library for Clojure
Eclipse Public License 1.0
174 stars 46 forks source link

Paths can have FileSystem that have different separators than the OS-specific one #111

Open onetom opened 1 year ago

onetom commented 1 year ago

I see a couple of places in the code, where OS support is hardwired for babashka.fs/file-separator.

It could be made more flexible, if we would use a filesystem specific separator, since it's possible to use an S3 filesystem (via https://github.com/awslabs/aws-java-nio-spi-for-s3 for example) under Windows, which uses slash as separator, while local file paths would still use backslash.

java.nio.file.FileSystem - just like java.io.FileSystem - does have a getSeparator method, which we could use to be file-system agnostic. The java.nio.file.Path#getFileSystem method allows obtaining the separator on a per-Path basis.

It's also possible to test this more dynamic implementation, using the JimFS library (https://github.com/google/jimfs), which can simulate both unix and win filesystems on any OS. Here is a nice tutorial on this topic: https://www.baeldung.com/jimfs-file-system-mocking

As an aside, babashka.fs/path-separator is a separate question. That probably makes sense to keep it as-is, since the OS won't change under a JVM process, while its running.

borkdude commented 1 year ago

I see a couple of places in the code, where OS support is hardwired for babashka.fs/file-separator.

Can you be more specific, or just provide a PR to fix it?

borkdude commented 1 year ago

I've read the issue again and now understand it better, but I'd like to have very specific examples / test cases of what currently fails with the "hard-coded" approach, i.e. a repro. We can deprecate babashka.fs/file-separator in favor of a new function and use that instead to make the test(s) pass.

onetom commented 1 year ago

i can prepare a repo for you in the next few hours, to demonstrate my specific use-cases with both S3 filesystem provider and Jimfs in-memory provider.

i gave a lot of thoughts how would I address this problem, but I'm not sure.

we made a tiny wrapper in our project to explore different approaches, but none of them were very satisfactory, that's why I haven't followed up on the issue yet.

borkdude commented 1 year ago

ok sorry for sounding impatient. the jimfs thing sounds like it could work in the unit tests, s3 is a bit more complicated. just a zip filesystem could also work since the file separator is always "/" there, even on Windows

onetom commented 1 year ago

Made a repro repo: https://github.com/onetom/babashka-fs-issue-111

Here is one symptom of not taking into account of the file system of a Path:

  (-> s3-path (fs/path "test.txt"))
  ; Execution error (UnsupportedOperationException)
  ;   at software.amazon.nio.spi.s3.S3Path/toFile (S3Path.java:729).
  ; S3 Objects cannot be represented in the local (default) file system

For further context see the repl.s3 namespace.

I will commit some Jimfs examples too tomorrow.

onetom commented 1 year ago

Pushed some Jimfs examples too.

Regarding the implementation, here are some observations:

  1. The path & as-path functions are calling io/file, involves the legacy java.io.File concept, which doesn't support pluggable file systems, if i understand it correctly, so that could be the reason for some of the jimfs:// & s3:// errors I've experienced before.
  2. java.nio.file.Paths class is deprecated in favor of Path/of.
  3. appending to an existing Path can be done by Path#resolve & Path#resolveSibling, so we can avoid involving io/file there too.

In light of these, here are some proposed changes.

Use Path/of:

(defn- as-path
  ^Path [path]
  (cond
    (instance? Path path) path
    (instance? URI path) (Path/of path)
    (instance? File path) (.toPath path)
    :else (Path/of path (make-array String 0))))

this hasn't broken any existing tests.

Use Path#resolve:

(defn path
  "Coerces f into a Path. Multiple-arg versions treat the first argument as
  parent and subsequent args as children relative to the parent."
  (^Path [f]
   (as-path f))
  (^Path [parent child]
   (.resolve (as-path parent) (as-path child)))
  (^Path [parent child & more]
   (reduce path (path parent child) more)))

this broke 5 tests, so I'll look into those later.

borkdude commented 1 year ago

Sounds good

borkdude commented 8 hours ago

@onetom One of the changes was merged to master (the.resolveone in fs/path). However when I change as-path to:

(defn- as-path
  ^Path [path]
  (cond
    (instance? Path path) path
    (instance? URI path) (Path/of path)
    (instance? File path) (.toPath path)
    :else (Path/of path (make-array String 0))))

I get about 70 failing tests

borkdude commented 8 hours ago

The failing tests were coming from 1.9.0 and I'm fine with deprecating that clojure version