apple / pkl

A configuration as code language with rich validation and tooling.
https://pkl-lang.org
Apache License 2.0

Support cloning the project on Windows #99

Closed translatenix closed 4 months ago

translatenix commented 7 months ago

This is the first step in #20 and critical to unlock outside contributions.

translatenix commented 5 months ago

Is there a timeline to fix this? Developing on WSL is really painful. Gradle and esp. IntelliJ are extremely slow (30s pause is normal), and important IntelliJ features such as showing the JDK source code don't work. I think fixing this might require changing the package cache file layout (no : character in filenames).

translatenix commented 5 months ago

@bioball Windows support is very important to me. I'd be willing to work on this first step, but I'd need some guidance wrt. changing the cache file layout.

bioball commented 5 months ago

How about we percent-encode them?

According to their docs, these characters are reserved:

Forward slash is also reserved on macOS and Linux, so we don't need to worry about encoding it (Pkl filenames cannot contain forward slashes, and URI forward slashes are equated to path separators).

For Windows only, it probably makes sense to percent-encode all other characters. Or, are there other encodings that people use?

Here's some literature:

https://stackoverflow.com/questions/1184176/how-can-i-safely-encode-a-string-in-java-to-use-as-a-filename
https://stackoverflow.com/questions/1077935/will-urlencode-fix-this-problem-with-illegal-characters-in-file-names-c

I think we'd want to take the same approach for writing output files. For instance, what will be written when you pkl eval -m . foo.pkl here?

// foo.pkl
output {
  files {
    ["foo:bar.txt"] { text = "foo:bar" }
  }
}

translatenix commented 5 months ago

Don't we want the exact same layout across operating systems? Seems desirable for portability/tooling/etc. Gradle's dependency cache used to be non-portable, and it was causing a lot of pain. https://github.com/gradle/gradle/issues/1338

bioball commented 5 months ago

I can see wanting that, especially if you are using the cache dir as a way to vendor dependencies in a repo.

In that case, it might make more sense to think of this as a separate problem from writing file paths with -o or -m. In those cases, it'd be surprising if -o "foo:bar.txt" somehow mangled that output file on linux/macOS.

In that case, maybe it makes the most sense to always percent encode those characters when writing to the cache dir.

translatenix commented 5 months ago

Is mangling required for -o/-m? Can’t -o "foo:bar.txt" just fail on Windows? Another option would be to make it fail everywhere (“non-portable output path”). Anyway, I agree that this is a separate concern.
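The "fail everywhere" option could look roughly like the sketch below. It is only an illustration, not Pkl's implementation: the class and method names are hypothetical, and the reserved set is an assumption based on Microsoft's file naming documentation. Forward slash is left out because output paths use it as the separator.

```java
// Sketch only: hypothetical helper, not part of Pkl.
final class PortableOutputPaths {
  // Assumed set of characters Windows rejects in file names.
  // '/' is omitted on purpose: it separates path segments.
  private static final String WINDOWS_RESERVED = "<>:\"\\|?*";

  /** Returns true if the path contains no Windows-reserved or control character. */
  static boolean isPortable(String outputPath) {
    for (int i = 0; i < outputPath.length(); i++) {
      char c = outputPath.charAt(i);
      // Control characters (below 0x20) are rejected by Windows as well.
      if (c < 0x20 || WINDOWS_RESERVED.indexOf(c) >= 0) {
        return false;
      }
    }
    return true;
  }

  /** Throws on every OS, so the result does not depend on where the tool runs. */
  static void requirePortable(String outputPath) {
    if (!isPortable(outputPath)) {
      throw new IllegalArgumentException("non-portable output path: " + outputPath);
    }
  }
}
```

With this check, `foo:bar.txt` would be rejected on Linux and macOS too, which is the trade-off being discussed: consistent behavior at the cost of refusing names those systems could otherwise write.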

bioball commented 5 months ago

What does Windows do right now if you try to create a filename with a reserved character? E.g. if you do echo "hello" > foo:bar.txt?

Note: for percent-encoding, we will also need to percent-encode the literal %.
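A minimal sketch of this idea in Java, to make the role of the literal % concrete. The class name and the exact reserved set are assumptions for illustration (the set follows Microsoft's file naming documentation, minus the forward slash, which stays a path separator); this is not the code that ended up in Pkl.

```java
// Sketch only: hypothetical encoder for a single path segment.
final class CachePathPercentEncoder {
  // Assumed reserved set. '%' is included so that an encoded name can
  // never be confused with a name that already contained "%XX".
  private static final String MUST_ENCODE = "%<>:\"\\|?*";

  static String encode(String segment) {
    StringBuilder out = new StringBuilder(segment.length());
    for (int i = 0; i < segment.length(); i++) {
      char c = segment.charAt(i);
      if (MUST_ENCODE.indexOf(c) >= 0) {
        // All characters in MUST_ENCODE are ASCII, so two hex digits suffice.
        out.append('%').append(String.format("%02X", (int) c));
      } else {
        out.append(c);
      }
    }
    return out.toString();
  }
}
```

Under these assumptions, `localhost:0` becomes `localhost%3A0`, while a name that already contains `%3A` becomes `%253A`, so the two cannot collide.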

translatenix commented 5 months ago

> What does Windows do right now if you try to create a filename with a reserved character?

PowerShell:

echo "hello" > foo:bar.txt
Out-File: Cannot find drive. A drive with the name 'foo' does not exist.
echo "hello" > $home\foo:bar.txt   # creates a file named "foo", but it's empty

Java 21:

jshell> new File("foo:bar.txt").createNewFile()   // creates a file named "foo"
jshell> Files.createFile(Path.of("foo:bar.txt"))
|  Exception java.nio.file.InvalidPathException: Illegal char <:> at index 3: foo:bar.txt

File explorer: Can't create file with invalid path

translatenix commented 5 months ago

I assume the package URL -> file path conversion should be unique, to rule out name collisions. Should it also be reversible?

bioball commented 5 months ago

Yeah, ideally reversible, which is why percent-encoding seems like a good choice here.
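To illustrate the reversibility point, here is a hedged sketch of the decoding direction. It assumes an encoder that writes each reserved ASCII character, and % itself, as % followed by two hex digits; the class name is hypothetical and this is not Pkl's code.

```java
// Sketch only: hypothetical decoder for names escaped as %XX.
final class CachePathPercentDecoder {
  static String decode(String encoded) {
    StringBuilder out = new StringBuilder(encoded.length());
    int i = 0;
    while (i < encoded.length()) {
      char c = encoded.charAt(i);
      if (c != '%') {
        out.append(c);
        i++;
        continue;
      }
      // A '%' must be followed by exactly two hex digits.
      if (i + 2 >= encoded.length()) {
        throw new IllegalArgumentException("truncated escape in: " + encoded);
      }
      int hi = Character.digit(encoded.charAt(i + 1), 16);
      int lo = Character.digit(encoded.charAt(i + 2), 16);
      if (hi < 0 || lo < 0) {
        throw new IllegalArgumentException("bad escape in: " + encoded);
      }
      // Assumes only ASCII characters are ever escaped.
      out.append((char) (hi * 16 + lo));
      i += 3;
    }
    return out.toString();
  }
}
```

Because % is itself escaped on the way in, `foo%3Abar` and `foo%253Abar` decode to two different names (`foo:bar` and `foo%3Abar`), which is what makes the mapping both unique and reversible.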

mitchcapper commented 5 months ago

> echo "hello" > $home\foo:bar.txt
> jshell> new File("foo:bar.txt").createNewFile() // creates a file named "foo"

To note, the colon here is only kind of an invalid character. Really what you are doing is writing to alternate data streams. This is why the file can appear but is 'empty'. For details: https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/c54dec26-1551-4d3a-a0ea-4fa40f848eb3

This applies to just this one character, but it means the colon behaves slightly differently from path characters that are truly invalid (in terms of whether an error is thrown, and which one).

This is also why the following works from a command line:

> echo "hey" > "somefile.txt:alt_stream" && cat somefile.txt && echo "..." && cat "somefile.txt:alt_stream"
"..."
"hey"

bioball commented 4 months ago

Hm... this impacts pkldoc too. This means that generated pkldoc websites should also use percent encoding, otherwise we can't write directories or filenames onto Windows.

@mitchcapper good to know. I don't think that changes the fact that we should encode these characters.

mitchcapper commented 4 months ago

> I don't think that changes the fact that we should encode these characters.

Right, the colon should certainly not be on the whitelist: it is not valid in a path, and only some APIs accept it, for alternate data stream access.

bioball commented 4 months ago

WRT encoding:

Using percent-encoding is pretty ugly, because you get doubly-encoded URL paths.

Kotlin's encoding seems nicer. A nice benefit here is that the file paths match the URL paths. However, it works for them because square bracket chars aren't allowed as identifier names. In Pkl, the only character not allowed in an identifier is the backtick literal.

| Encoding | File path | URL path |
|---|---|---|
| Percent | localhost%3A0 | localhost%253A0 |
| Dokka | localhost[58]0 | localhost[58]0 |

bioball commented 4 months ago

Maybe we can copy Dokka. We can simply represent verbatim [ as [[. Multiple [ literals would just be double the amount of [.

I don't think we need a special way to represent ]; it only has meaning if it is preceded by a match for the regex \[\d{2}.

| Literal | Encoded |
|---|---|
| foo:bar | foo[58]bar |
| foo[58]bar | foo[[58]bar |
| foo[[ | foo[[[[ |
| foo[:bar | foo[[[58]bar |
| (illegal) | foo[bar |

One downside is that we now use four bytes to represent one byte, which might be a problem with URL addresses. But this is an edge case that maybe we can live with. Also, this is still better than percent-encoding, which uses five bytes for URL paths.
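The scheme sketched in the comment above (decimal character code in square brackets, with a literal [ doubled) can be written out as follows. This is only an illustration of the idea as described in this thread, not the encoding that was finally adopted; the class name and the reserved set are assumptions.

```java
// Sketch only: hypothetical codec for the Dokka-style bracket encoding.
final class BracketPathCodec {
  // Assumed reserved set (Windows file name rules, without '/').
  private static final String RESERVED = "<>:\"\\|?*";

  static String encode(String literal) {
    StringBuilder out = new StringBuilder(literal.length());
    for (char c : literal.toCharArray()) {
      if (c == '[') {
        out.append("[[");                               // literal '[' is doubled
      } else if (RESERVED.indexOf(c) >= 0) {
        out.append('[').append((int) c).append(']');    // e.g. ':' -> "[58]"
      } else {
        out.append(c);
      }
    }
    return out.toString();
  }

  static String decode(String encoded) {
    StringBuilder out = new StringBuilder(encoded.length());
    int i = 0;
    while (i < encoded.length()) {
      char c = encoded.charAt(i);
      if (c != '[') {
        out.append(c);                                  // ']' alone has no meaning
        i++;
      } else if (i + 1 < encoded.length() && encoded.charAt(i + 1) == '[') {
        out.append('[');
        i += 2;
      } else {
        // A single '[' must start a "[<decimal digits>]" escape.
        int close = encoded.indexOf(']', i + 1);
        String digits = close < 0 ? "" : encoded.substring(i + 1, close);
        if (digits.isEmpty() || !digits.chars().allMatch(d -> d >= '0' && d <= '9')) {
          throw new IllegalArgumentException("illegal encoded name: " + encoded);
        }
        out.appendCodePoint(Integer.parseInt(digits));
        i = close + 1;
      }
    }
    return out.toString();
  }
}
```

Under these assumptions the codec reproduces the rows of the table above, including rejecting `foo[bar` as illegal, and no separate escape for ] is needed.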

Another thought: I think this new encoding necessitates packages-2 (we use packages-1 as our cache dir right now).

bioball commented 4 months ago

Proposal here: https://github.com/apple/pkl-evolution/pull/3

bioball commented 4 months ago

This works now per https://github.com/apple/pkl/pull/489.

Might need --depth=1 in order for it to work.