gluon-lang / lsp-types

Types for communicating with a language server
MIT License

The new `Uri` type in 0.96.0 is hard to use and has almost no interoperability #284

Open oxalica opened 5 months ago

oxalica commented 5 months ago

In #282, we changed the default URI type from url::Url to fluent_uri::Uri. Besides being several orders of magnitude less used (and maybe less tested?) than url, fluent_uri is also hard to use in the context of LSP servers and clients.

  1. We lack conversion methods between lsp_types::Uri and fluent_uri::Uri (and also the popular url::Url), making lsp_types::Uri a standalone type which can only be constructed by parsing a literal string through FromStr, which is costly and suffers from escaping traps (also see (2)).
  2. Most LSP URIs are file:// paths, plus untitled: dangling files. url::Url has the convenient Url::to_file_path and Url::from_file_path for error-resistant conversions. But fluent_uri does not provide any comparable methods, so we can only serialize and parse the path again. Non-UTF-8 bytes and URL-special characters (#, ?, etc.) must also be handled carefully by the user.
  3. I'm not convinced by the reasoning in #261 for this big change. It complains that .join needs some extra steps to work properly, but the crate migration makes it worse: fluent_uri does not provide join at all. The same use case in that issue will now need to serialize, concatenate and parse again, plus all the escaping work mentioned in (2).
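To make the escaping work in (2) concrete, here is a minimal std-only sketch of what a downstream user ends up writing by hand when no from_file_path equivalent is available. The helper name file_uri_from_unix_path is hypothetical and is not part of lsp-types, fluent_uri, or url; it assumes an absolute Unix-style path that is valid UTF-8, so it does not cover Windows drive letters or non-UTF-8 paths.

```rust
/// Percent-encode an absolute Unix path into a `file://` URI.
/// Hypothetical helper for illustration; not an API of any crate in this thread.
fn file_uri_from_unix_path(path: &str) -> Option<String> {
    if !path.starts_with('/') {
        return None; // only absolute paths map onto file:///...
    }
    let mut uri = String::from("file://");
    for &byte in path.as_bytes() {
        match byte {
            // RFC 3986 unreserved characters, plus '/' as the segment separator.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' | b'/' => {
                uri.push(byte as char)
            }
            // Everything else ('#', '?', spaces, non-ASCII bytes, ...) is escaped.
            _ => uri.push_str(&format!("%{:02X}", byte)),
        }
    }
    Some(uri)
}
```

With this sketch, file_uri_from_unix_path("/tmp/a b#c?.rs") gives Some("file:///tmp/a%20b%23c%3F.rs"), while a relative path gives None. The resulting string would still have to be parsed again through FromStr to obtain an lsp_types::Uri.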

I suggest either:

  1. Revert to Url and warn users that joining directory URLs requires extra care. There are only a few places where URLs refer to directories rather than files.
  2. Change them to String, fully transferring the URL handling logic to downstream users.

Either approach would give better usability and interoperability for URLs going into and out of the LSP protocol.

archseer commented 5 months ago

Agree, this is a major regression for Helix. We're going to stick to the older package version for the time being.

tooltitude-support commented 4 months ago

Are there any updates on this? It's understandable that the gluon lang is the main motivation for the existence of this crate. The change broke compatibility and for us it's hard to quickly fix it.

P.S. Using String IMO is the best solution.

Myriad-Dreamin commented 4 months ago

I encountered a subtle issue when playing with url::Url, so I think url::Url is also not good enough for all cases.

To ensure both extensibility and type safety, perhaps a struct Uri(String) can be provided. It might look like:

#[derive(Serialize, Deserialize)]
pub struct Uri(String);

#[cfg(feature = "rust-url")]
impl From<Uri> for url::Url {
    // Conversion body intentionally left unimplemented in this sketch.
    fn from(uri: Uri) -> Self { todo!() }
}

Marwes commented 4 months ago

Yes, because url::Url does the wrong thing in some cases, we did need to replace it with something. I haven't seen any better candidates than the fluent_uri crate, so that really only leaves that crate or a plain String type as the alternative. Since fluent_uri can be easily converted to and from String (as_str/FromStr), it seemed best to have at least some helper methods and validation by using that crate.

To ensure both extensibility and type safety, perhaps a struct Uri(String) can be provided. It might look like:

Would that really be better than the current situation? There would still not be any of the helper methods that the old type provided, and the type would still be different than before so it would still be a breaking change?

Marwes commented 4 months ago

We lack conversion methods between lsp_types::Uri and fluent_uri::Uri (and also the popular url::Url), making lsp_types::Uri a standalone type which can only be constructed by parsing a literal string through FromStr, which is costly and suffers from escaping traps (also see (2)).

The newtype is unfortunate, but it can hopefully be removed eventually with updates to fluent_uri. If there are missing conversions, please file a PR.

Most LSP URIs are file:// paths, plus untitled: dangling files. url::Url has the convenient Url::to_file_path and Url::from_file_path for error-resistant conversions. But fluent_uri does not provide any comparable methods, so we can only serialize and parse the path again. Non-UTF-8 bytes and URL-special characters (#, ?, etc.) must also be handled carefully by the user.

Most is not enough, unfortunately. url::Url has incompatibilities with the URIs used in the LSP spec, so after people found and reported them, it had to go. It is unfortunate that some methods you were relying on were lost in the process, but those can at least be restored, whereas the incompatibilities can't be fixed in the url::Url type.

I'm not convinced by the reasoning in https://github.com/gluon-lang/lsp-types/issues/261 for this big change. It complains that .join needs some extra steps to work properly, but the crate migration makes it worse: fluent_uri does not provide join at all. The same use case in that issue will now need to serialize, concatenate and parse again, plus all the escaping work mentioned in (2).

It is not just inconveniences: the url crate adds extra trailing slashes in some cases https://github.com/gluon-lang/lsp-types/commit/e3d0ed287e85bc453d7f77a7c9e6d39620b8d1ef#diff-b1a35a68f14e696205874893c07fd24fdb88882b47c23cc0e0c80a30c7d53759L2820-L2827 which causes changes in behaviour in the LSP.

Revert to Url and warn users that joining directory URLs requires extra care. There are only a few places where URLs refer to directories rather than files.

Not possible, url::Url does not match the LSP spec.

Change them to String, fully transferring the URL handling logic to downstream users.

That seems worse overall to me. With plain strings we get no validation and no convenience methods at all. The current type at least gets us validation and some convenience methods, and if the conveniences are not enough we are at least able to add them (either to this crate or to fluent_uri) to build toward something better. If that is not enough right now, you can always convert to and from strings and do your own handling.

oxalica commented 4 months ago

Yes, because url::Url does the wrong thing in some cases, we did need to replace it with something

url::Url has incompatibilities with the URIs used in the LSP spec

Could you elaborate so we can check them one by one? I assume you are just referring to #261, which I'll explain below. I've been using url for a long time and it seems fairly standards-compliant to me, if not more so than fluent_uri.

It turns out there's a disagreement about what counts as "wrong". According to RFC 3986 section 5.2.3, mentioned in that same issue:

return a string consisting of the reference's path component appended to all but the last segment of the base URI's path (i.e., excluding any characters after the right-most "/" in the base URI path, or excluding the entire base URI path if it does not contain any "/" characters).

You can also confirm the "correct" behavior from other languages' URI/URL implementations, e.g. in Node.js 20.12.2:

> new URL('foo', new URL('file:///no/trailing/slash')).toString()
"file:///no/trailing/foo"
> new URL('foo', new URL('file:///no/trailing/slash/')).toString()
"file:///no/trailing/slash/foo" 

Unfortunately, it seems that manually adding a trailing slash is necessary. The LSP spec's recommended implementation, vscode-uri, has a Utils.joinPath that converts the URI into a POSIX path, joins the segments, and converts the result back into a URI (just like Url::to_file_path, Path::join, then Url::from_file_path) instead of using the native URI join. I think they were also aware of this problem and chose not to change the language-provided URI join implementation.
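The merge rule quoted from RFC 3986 section 5.2.3 is small enough to sketch in std-only Rust, which shows why the trailing slash decides the result. The function name merge_paths is made up for this illustration; it assumes the reference is a relative path (no scheme, no leading "/"), and it leaves out the empty-base-path special case and the dot-segment removal that a full resolver would also perform.

```rust
/// Merge a relative-path reference into a base path, per the rule quoted
/// from RFC 3986 section 5.2.3. Illustrative only: the empty-base-path case
/// and dot-segment removal ("." and "..") are deliberately omitted.
fn merge_paths(base_path: &str, reference: &str) -> String {
    match base_path.rfind('/') {
        // Keep everything up to and including the right-most '/',
        // dropping the last segment of the base path.
        Some(idx) => format!("{}{}", &base_path[..=idx], reference),
        // No '/' in the base path: the entire base path is excluded.
        None => reference.to_string(),
    }
}
```

Under these assumptions, merge_paths("/no/trailing/slash", "foo") yields "/no/trailing/foo" and merge_paths("/no/trailing/slash/", "foo") yields "/no/trailing/slash/foo", matching the Node.js output above.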

To ensure both extensibility and type safety, perhaps a struct Uri(String) can be provided. It might look like:

Would that really be better than the current situation?

To me, letting users parse it directly with whatever crate they like is better than parsing to a predetermined type (which some users don't like), serializing to a string, and parsing to another type again. The current solution is only better if most users can directly use the fluent_uri type, which turns out not to be the case, based on the number of :+1: on this issue.

so after people found and reported it, it had to go.

Note that users like me, who did not encounter any issues previously, of course did not report any. I guess there's a bias here: we only report after the change, as with this issue.

It is not just inconveniences: the url crate adds extra trailing slashes in some cases https://github.com/gluon-lang/lsp-types/commit/e3d0ed287e85bc453d7f77a7c9e6d39620b8d1ef#diff-b1a35a68f14e696205874893c07fd24fdb88882b47c23cc0e0c80a30c7d53759L2820-L2827 which causes changes in behaviour in the LSP.

Url::parse("file://test").unwrap()

That's because one slash is missing after the scheme. The LSP spec already states the format explicitly:

  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment

file:///c:/project/readme.md
file:///C%3A/project/readme.md

For the local machine (empty authority), it should be file:///path, with three slashes.

println!("{}", url::Url::parse("file:///test").unwrap());
> file:///test

Not possible, url::Url does not match the LSP spec.

As I explained above, it seems to match it (and RFC3986) better than fluent_uri.

With plain strings we get no validations and no convenience methods at all.

Validation is a good point, though I don't think it's that important, considering we have to trust the peer (either server or client) anyway. Also, I don't see convenience methods on fluent_uri either. The only useful method is path(), but you also have to unescape its result manually.

// An example URI in the LSP Spec, see the link above.
let uri = fluent_uri::Uri::parse("file:///C%3A/project/readme.md").unwrap();
println!("{}", uri.path());
> /C%3A/project/readme.md

It does not look convenient to me.
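For reference, the manual unescaping step looks roughly like the std-only sketch below. The function percent_decode is a hypothetical helper written for this comment, not something provided by fluent_uri or lsp-types, and it assumes the decoded path should be valid UTF-8.

```rust
/// Decode `%XX` escapes in a URI path. Returns `None` on a malformed escape
/// or when the decoded bytes are not valid UTF-8.
/// Hypothetical helper for illustration only.
fn percent_decode(path: &str) -> Option<String> {
    let bytes = path.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // Exactly two hex digits must follow the '%'.
            let hi = (*bytes.get(i + 1)? as char).to_digit(16)?;
            let lo = (*bytes.get(i + 2)? as char).to_digit(16)?;
            out.push((hi * 16 + lo) as u8);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}
```

Applied to the path printed above, percent_decode("/C%3A/project/readme.md") gives Some("/C:/project/readme.md"). Turning that into a platform path (for example stripping the leading slash before a Windows drive letter) is yet another step left to the user.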

I agree fluent_uri is a good crate for fast parsing and component extraction, but it's hard to use when you want to construct, store, and access regularly as a generic container.

tooltitude-support commented 4 months ago

@Marwes Maybe it's possible to parameterize the module with the URL type, i.e. have a feature flag in the module. You could define a type alias and fill it in depending on the feature values.

It would at least allow your users to postpone migrating to a different URL type.
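A rough sketch of what such a feature-selected alias could look like follows. The feature name "url" is hypothetical (lsp-types does not define it today), and the String fallback is only one possible default; the struct mirrors the shape of TextDocumentIdentifier purely for illustration.

```rust
// Hypothetical feature name; chosen for this sketch only.
#[cfg(feature = "url")]
pub type Uri = url::Url;

// Fallback when the feature is off: hand the raw string to downstream users.
#[cfg(not(feature = "url"))]
pub type Uri = String;

/// Protocol types would then refer to the alias instead of a concrete type.
pub struct TextDocumentIdentifier {
    pub uri: Uri,
}
```

One caveat with this design: Cargo unifies features across the dependency graph, so if two crates in the same build expect different settings of the flag, one of them will see a Uri type it was not written against.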

Timmmm commented 1 month ago

Yeah, I'm also finding this very painful. I don't have a high opinion of the url crate (I've barely even used it and I ran into an extremely trivial crash bug that still hasn't been fixed), so I'm definitely sympathetic to using something better, but I would say there's no way fluent_uri is better overall yet. It's very immature, there are zero convenience methods, the version used by lsp-types doesn't have any way of mutating or constructing URIs, and the API changed substantially in the newest version.

lpil commented 1 week ago

Hello all! We have also been unable to upgrade in the Gleam compiler due to lost functionality. Thank you.

ogoffart commented 1 week ago

(We're also holding up the update in Slint)

mkaput commented 1 week ago

Same in the LS for Cairo language.

archseer commented 1 week ago

We've ended up forking the crate (https://github.com/helix-editor/helix/tree/master/helix-lsp-types) but we also plan to make some changes to the fork to better work with the editor internals.