eclipse-lsp4j / lsp4j

A Java implementation of the language server protocol intended to be consumed by tools and language servers implemented in Java.
https://eclipse.org/lsp4j
Other
599 stars 143 forks source link

Normalize URI #692

Open BoykoAlex opened 1 year ago

BoykoAlex commented 1 year ago

The common case of setting URI on lsp4j message object is URI#toString() call. For example LSP4E sets uri on the TextDocumentIdentifier via URI#toString() call rather than URI#toASCIIString(). Therefore if the file name is ClassWithSpécialCharacter.java then only URI#toASCIIString() would produce the valid URI with escaped non-ASCII char é

It'd be great if TextDocumentIdentifier (and perhaps other classes) taking string URIs would normalize the input URI to guard against the issue described above.

nixel2007 commented 1 year ago

In bsl language server we've created a helper class for handling non-canonical uri: https://github.com/1c-syntax/utils/blob/master/src/main/java/com/github/_1c_syntax/utils/Absolute.java

BoykoAlex commented 1 year ago

Yes, we could do this as well (thanks for the sharing your solution!) but this would require a sweep kind of change, ie. URI.create(strUri) vs Absolute.uri(strUri). The point is to try to avoid a code sweep change if possible... sort of to guard against further use of URI#toString() (or URI.create(String) vs Absolute.uri(String))

jonahgraham commented 1 year ago

@BoykoAlex Are you recommending adding a constructor to TextDocumentIdentifier that takes a java.net.URI (in addition to a String)? Or do you want TextDocumentIdentifier to escape the input string? Can the latter part be done reliably?

BoykoAlex commented 1 year ago

@jonahgraham I'd add a constructor that takes java.net.URI which would call uri.toASCIIString() to set the string uri field. In addition I'd make the constructor taking String creating a java.net.URI from string and then delegate to the constructor taking java.net.URI. It'd be great to do the same to setUri(...) unless this seems risky...

jonahgraham commented 1 year ago

@jonahgraham I'd add a constructor that takes java.net.URI which would call uri.toASCIIString() to set the string uri field.

This part seems fine.

In addition I'd make the constructor taking String creating a java.net.URI from string and then delegate to the constructor taking java.net.URI.

This part worries me because of double-escaping. But then looking at your above comment that would still need a sweeping change if we didn't adopt this part.

It'd be great to do the same to setUri(...) unless this seems risky...

I don't see any issue with adding a new setUri(java.net.URI) method.

However having a getUri... that returns a java.net.URI seems problematic because its constructor throws URISyntaxException.

I guess the real question is why not have TextDocumentIdentifier store a java.net.URI all the time? Is there some things that LSP URIs can store (as a string) that can't be represented properly by java.net.URI?

BoykoAlex commented 1 year ago

I think having java.net.URI rather than String would be ideal... Somehow, I suspect it is String due to perhaps some JSON serialization/deserialization limitations?