adamreichold / zeptohttpc

Proving ground for changes to attohttpc
Apache License 2.0

fails with "HTTP invalid URI: invalid format" error on some websites #6

Open Shnatsel opened 3 years ago

Shnatsel commented 3 years ago

On some websites, e.g. http://volvoforums.com, zeptohttpc v0.2.1 fails with the following error:

HTTP invalid URI: invalid format

Firefox and curl work fine.

1575 websites out of the top million from the Feb 3 Tranco list are affected.

Tested using this code. Test tool output from all affected websites: zepto-invalid-uri.tar.gz

adamreichold commented 3 years ago

This seems to be caused by http::uri::Uri being unable to parse relative paths with a trailing slash, cf. https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=79f28833ba458372096f670fd43dfba9

As shown in the playground, url::Url does parse this successfully, but not using that dependency was a deliberate decision that I would not like to reverse.

To be honest, I am not well versed enough in the URL spec to determine whether this is actually invalid ATM. But maybe this should be reported against the http crate?

adamreichold commented 3 years ago

It seems that "forum".parse::<Uri>() does not parse this as a URI without scheme or authority either:

[tests/path.rs:5] "forum".parse::<Uri>().unwrap().into_parts() = Parts {
    scheme: None,
    authority: Some(
        forum,
    ),
    path_and_query: None,
    _priv: (),
}

adamreichold commented 3 years ago

This might be related to https://github.com/hyperium/http/issues/469

adamreichold commented 2 years ago

I noticed that reqwest is able to handle these, so I checked the code. It seems they do a join, instead of trying to parse the header value directly:

Indeed. The problem, if one wants to call it that, is that this would require adding the url crate as a dependency here, or at least reproducing the relevant parts of Url::join.
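
For illustration only, here is a minimal std-only sketch of what "reproducing the relevant parts" of a join could look like. The function name resolve_location is hypothetical and is not part of zeptohttpc, attohttpc, http or url. It assumes an http(s) base URL and handles only absolute URLs, scheme-relative references, absolute paths and plain relative paths such as forum/. It does not normalise . or .. segments and does not handle query-only or fragment-only references, both of which a full join covers.

```rust
/// Hypothetical helper (an assumption for this sketch, not an existing API):
/// resolve a `Location` header value against the URL of the request that
/// produced it. Returns `None` if the base has no `scheme://` prefix.
fn resolve_location(base: &str, location: &str) -> Option<String> {
    // An absolute http(s) URL is used as-is.
    if location.starts_with("http://") || location.starts_with("https://") {
        return Some(location.to_owned());
    }

    // Split the base into scheme and the remainder after "://".
    let (scheme, rest) = base.split_once("://")?;
    // Drop query and fragment from the base before looking at its path.
    let rest = rest.split(|c: char| c == '?' || c == '#').next()?;
    // Separate authority and path; an empty path is treated as "/".
    let (authority, path) = match rest.find('/') {
        Some(idx) => (&rest[..idx], &rest[idx..]),
        None => (rest, "/"),
    };

    if let Some(stripped) = location.strip_prefix("//") {
        // Scheme-relative reference: keep only the scheme of the base.
        return Some(format!("{}://{}", scheme, stripped));
    }
    if location.starts_with('/') {
        // Absolute path: keep scheme and authority of the base.
        return Some(format!("{}://{}{}", scheme, authority, location));
    }

    // Relative path such as "forum/": replace everything after the last
    // '/' of the base path with the location.
    let dir = &path[..=path.rfind('/')?];
    Some(format!("{}://{}{}{}", scheme, authority, dir, location))
}
```

With this sketch, a base of http://volvoforums.com and a relative value of forum/ resolve to http://volvoforums.com/forum/, which could then be parsed as an absolute URI instead of parsing the header value on its own.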