apple / pkl

A configuration as code language with rich validation and tooling.
https://pkl-lang.org
Apache License 2.0
10.12k stars 265 forks source link

Glob reading with the `file:` schema fails on some paths #582

Open thomaspurchas opened 1 month ago

thomaspurchas commented 1 month ago

The handling of files reads using read* is a bit broken at the moment. Behaviour changes depending on whether or not the file: schema is used, and behaviour in a pkl vs the REPL doesn't always seem consistent. Additionally there are a number of simple glob patterns that cause NullPointerExceptionss or similar incorrect behaviour.

Pkl REPL:

An unexpected error has occurred. Would you mind filing a bug report?
Cmd+Double-click the link below to open an issue.
Please copy and paste the entire error output into the issue's description, provided you can share it.

https://github.com/apple/pkl/issues/new

java.lang.NullPointerException

–– Pkl Error ––
None (cause has no message)

1 | read*("file:**/*.txt")
    ^^^^^^^^^^^^^^^^^^^^^^
at  (repl:pkl0)

Pkl 0.27.0-dev+7917ddb0c (macOS 15.0, native)

java.lang.NullPointerException
    at org.pkl.core.util.GlobResolver.splitGlobPatternIntoBaseAndWildcards(GlobResolver.java:452)
    at org.pkl.core.util.GlobResolver.resolveGlob(GlobResolver.java:483)

This error is is caused by the unsafe assumption that .getPath always returns a value. We should probably be using . getSchemeSpecificPart instead.

Evaling the following will work

a = read*("**/*.txt")

but not (addition of the file: schema)

a = read*("file:**/*.txt")

and running read*("**/*.txt") in the REPL results in

read*("**/*.txt")
–– Pkl Error ––
No resource reader is registered for scheme null.

1 | read*("**/*.txt")
    ^^^^^^^^^^^^^^^^^
at  (repl:pkl0)
bioball commented 1 month ago

The underlying issue here is that file: URIs are hierarchical; it's invalid for a file URI to not have a hierarchical path part (a path that starts with /); see RFC-8089.

Right now, read("file:foo.txt") will cause Pkl to read foo.txt relative to the process CWD, but file:foo.txt is not a valid file URI. This might be changed to a throw in the future. For similar reasons, read*("file:*") is also not supported, and would also be changed to a throw.

When globbing using a file scheme, the path part must be a hierarchical path. For example, read*("file:/path/to/*.txt") (or read("file:///path/to/*.txt" for a more canonical representation; file URIs typically have a host section.)

thomaspurchas commented 1 month ago

In that situation, how are we meant to use read() with a relative path, in particular, how can we use read() with a relative in the REPL where the schema is mandatory?

Also these nuances aren't described in the docs. The docs mention the file: scheme, and under the Globbed Read section provides examples of using read without the file: scheme. But I can't see anywhere that explicitly states relative reads shouldn't use the file: scheme.

Also for the read API, do we have to stick so closely to the RFC? Obviously we should correctly handle any RFC-8089 valid URI. But there's no reason we can't also accept technically invalid URI, but where the intent is still obvious and useful (e.g. using the file: scheme with a relative path)

bioball commented 1 month ago

I don't see a strong need for relative paths with file URIs. If this is declared from within a file-based module, it should use a relative path without the scheme part. If declared within a non file-based module (e.g. from a package), what should the path be relative to? It can't be relative to the enclosing module, and also shouldn't be relative to the CWD, because we want imports to be statically analyzable.

In particular, how can we use read() with a relative in the REPL where the schema is mandatory

I think this is a tangential but definitely a problem. Ideally, you should be able to use relative file imports from within the REPL.

thomaspurchas commented 1 month ago

That sounds reasonable. I guess then my issue is that docs aren't very clear. It's not obvious that using read() with file: and without are different. It also creates a slightly odd situation where file reading has a "special" status, in that it supports relative reads is you don't use the file: scheme, but no other scheme has that capability. Although it doesn't really make sense for other schemes to support relative paths in any situation, it still creates a slightly inconsistent read() API, that's a little surprising.

bioball commented 1 month ago

Agree that we should improve our docs here. We have a section on relative paths, but it's easily missed and a common source of confusion.