UniFormal / MMT

The MMT Language and System
https://uniformal.github.io/
Other
68 stars 23 forks source link

File.unapply for URIs buggy #448

Closed ComFreek closed 5 years ago

ComFreek commented 5 years ago

Consider:

https://github.com/UniFormal/MMT/blob/418486100afc8827029c656c69e7d7eb9d49e564/src/mmt-api/src/main/info/kwarc/mmt/api/utils/File.scala#L170-L175

Apparently, file URIs like file:///C:/Users/..., file://localhost/C:/Users/... are both equally valid (see this comment: "Opera for Windows expands it to file://localhost/D:/Desktop/ automatically.")

My gut feeling says that we should only do if (u.scheme.contains("file")) and consider URIs with an empty scheme as invalid. Opinions?

ComFreek commented 5 years ago

This is probably more complicated than assumed. The authoritative reference on file URIs is RFC 8089, which allows among others:

Indeed it says in section 2 that the current behavior of MMT is incorrect:

As a special case, the "file-auth" rule can match the string "localhost" that is interpreted as "the machine from which the URI is being interpreted," exactly as if no authority were present. Some current usages of the scheme incorrectly interpret all values in the authority of a file URI, including "localhost", as non-local. Yet others interpret any value as local, even if the "host" does not resolve to the local machine. To maximize compatibility with previous specifications, users MAY choose to include an "auth-path" with no "file-auth" when creating a URI.

See also my comments at

lambdaTotoro commented 5 years ago

Yes, the current behaviour is definitely incorrect (where "incorrect" means "not compliant with the standard") and should be changed, imo. From what I can tell, the most glaring inconsistencies between behaviour and standard are the following:

  1. URI objects with empty schemes should not be treated as valid (see the BNF in Section 2 of RFC 8089)
  2. URI objects should not require the authority to be empty or "", since the string localhost is explicitly allowed here.

Compliance with every variation encountered in the wild (see Appendix F of 8089) or even just the normal standard will be difficult since you could have arbitrary authorities once you allow for network drives, but I think we could do a much more accurate job with just the small changes mentioned above.

My question: Did I miss anything major in the list?

P.S.: I've now read the word "file" so often that it looks like a nonsense-word.

lambdaTotoro commented 5 years ago

I've created a pull request that fixes the above two problems. I don't want to merge it myself so short before crucial deadlines but whenever @florian-rabe has a minute, this could be easily merged.

florian-rabe commented 5 years ago

Empty scheme must be allowed to handle relative file URIs (independent of what RFC 8089 says).

Treating localhost the same as absent authority is fine.

lambdaTotoro commented 5 years ago

Could you elaborate on this a bit? A Uniform Resource Identifier that is also relative only type-errors my brain. If you have a relative path to a file, you can make an URI from it by converting it to an absolute path, but from all I understand you can't have relative paths in URIs.

lambdaTotoro commented 5 years ago

Ping @florian-rabe

lambdaTotoro commented 5 years ago

Florian insists on allowing empty schemes on account of URIs allowing to omit prefixes from the left. I have changed the pull request accordingly. Closing.