dhall-lang / dhall-haskell

Maintainable configuration files
https://dhall-lang.org/
BSD 3-Clause "New" or "Revised" License

Problem with space character in URL import. #2467

Closed 1chb closed 1 year ago

1chb commented 1 year ago

A URL with a space encoded as %20 is changed to %2520, so the request misses its target. It is probably no coincidence that the % character has ASCII code 0x25. To reproduce:

> dhall repl
Welcome to the Dhall v1.41.2 REPL! Type :help for more information.
⊢ https://github.com/dhall-lang/dhall-haskell/blob/master/dhall/examples/fi%20le.dhall

Error: Remote file not found

HTTP status code: 404

URL: https://github.com/dhall-lang/dhall-haskell/blob/master/dhall/examples/fi%2520le.dhall

sjakobi commented 1 year ago

Can you provide a reproducer with an import that is actually supposed to work?

https://github.com/dhall-lang/dhall-haskell/blob/master/dhall/examples/fi%20le.dhall doesn't exist so it's no surprise the import fails.

1chb commented 1 year ago

Sorry for my bad example; it only shows how Dhall changes the URL in a bad way. Here are some existing files to import.

A file without space works: https://github.com/1chb/dhall/blob/main/file.dhall as Text

But not this one: https://github.com/1chb/dhall/blob/main/the%20file.dhall as Text

Note that %20 is changed to %2520. I think I understand why. When Dhall sees the % in the URL, which is a special token that needs to be escaped, it changes it to %25 (hex 25 is the ASCII code for %).
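The double encoding described above can be reproduced outside Dhall. The following is a minimal Python sketch, not the Haskell implementation's code; it assumes the importer percent-encodes the path unconditionally, which is what the URL in the error message suggests.

```python
from urllib.parse import quote

# Path component as written in the Dhall import (already percent-encoded).
written = "the%20file.dhall"

# Encoding it a second time escapes the '%' itself as '%25'.
requested = quote(written)

print(requested)  # the%2520file.dhall
```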

I was thinking that maybe this is not a bug but a feature, so that I don't have to escape my special tokens myself. However, it is not very usable, because if I have a file named the%file.dhall, I can't import it like so:

https://github.com/1chb/dhall/blob/main/the%file.dhall as Text

Error: Invalid input

(input):1:7:
  |
1 | https://github.com/1chb/dhall/blob/main/the%file.dhall as Text
  |       ^^
unexpected "//"
expecting whitespace

Dhall's parser doesn't recognize a % that is not followed by two hex digits, and consequently doesn't recognize the input as a URL. The same problem applies to every other special character that the parser doesn't expect to see in a URL, e.g. space or comma.

However, if I have a file the%20file.dhall, it can actually be imported with: https://github.com/1chb/dhall/blob/main/the%20file.dhall as Text

This works because Dhall changes the file name to the%2520file.dhall and the server decodes that back to the%20file.dhall.
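The round trip can be sketched in Python. This assumes, hypothetically, that the server percent-decodes the request path exactly once; the variable names are illustrative only.

```python
from urllib.parse import quote, unquote

# A file whose literal name contains the characters '%20'.
name_on_server = "the%20file.dhall"

# What the user writes in the import, and what gets requested
# if the client percent-encodes the path a second time.
written = "the%20file.dhall"
requested = quote(written)

# The server decodes the request path once and looks up that name.
resolved = unquote(requested)

print(requested)
print(resolved == name_on_server)
```

Under the same assumption, a file actually named "the file.dhall" (with a space) is never reached, because one round of decoding stops at the%20file.dhall.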

A surprising consequence:

https://github.com/1chb/dhall/blob/main/the%file.dhall as Text -- Invalid input
https://github.com/1chb/dhall/blob/main/the%feli.dhall as Text -- Works
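The difference between the two imports comes down to whether the two characters after % are hex digits: "fe" qualifies, "fi" does not. A minimal Python sketch of that check, assuming the rule is exactly "every % must be followed by two hex digits" as described above (the function name is hypothetical and this is not Dhall's actual parser):

```python
import re

# Matches a '%' that is NOT followed by two hexadecimal digits.
BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")

def has_only_valid_escapes(path: str) -> bool:
    """Return True if every '%' in path starts a two-hex-digit escape."""
    return BAD_PERCENT.search(path) is None

for name in ["the%file.dhall", "the%feli.dhall", "the%20file.dhall"]:
    print(name, has_only_valid_escapes(name))
```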

I think Dhall should not %-escape the URL at all, but leave that to the programmer.

Optional: if quoted path components were allowed for URLs as well, Dhall could remove the quotes and %-escape their content, e.g. http.../"the file.dhall" would become http.../the%20file.dhall.
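A rough Python sketch of this optional proposal follows. Quoted URL path components are not part of Dhall today, and the function name and behavior here are hypothetical: quoted components are unquoted and percent-encoded, while unquoted components pass through untouched.

```python
from urllib.parse import quote

def encode_component(component: str) -> str:
    """Percent-encode a path component only if it was written in quotes."""
    is_quoted = (
        len(component) >= 2
        and component.startswith('"')
        and component.endswith('"')
    )
    if is_quoted:
        # Strip the quotes and escape everything that is not unreserved.
        return quote(component[1:-1], safe="")
    # Unquoted components are assumed to be encoded by the programmer.
    return component

print(encode_component('"the file.dhall"'))
print(encode_component("the%20file.dhall"))
```

With this split, a literal % in a file name could be written as "the%file.dhall" in quotes, while an already-encoded component such as the%20file.dhall is sent as-is.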

Gabriella439 commented 1 year ago

Yeah, so according to the language standard, the Haskell implementation is not standard-conforming. An import of the form https://github.com/1chb/dhall/blob/main/the%20file.dhall should have been resolved as-is, without percent-encoding the %20.

See also:

So the fix here is to change the Haskell implementation to conform to the standard.

Gabriella439 commented 1 year ago

The fix is up here: https://github.com/dhall-lang/dhall-haskell/pull/2505