Try replacing httptools.parse_url with Rust URL extension

RobertoPrevato commented 2 years ago

Try replacing httptools.parse_url with Rust extension.

Why?

Because if we removed the HTTP Client implementation (I plan to do so in version 2), httptools would only be used to parse URLs. It could then be replaced by built-in URL parsing or, as a learning opportunity, by a small wrapper over a Rust library that can parse URLs.

tyzhnenko commented 1 year ago

I found that the built-in urlsplit looks faster than httptools, but it doesn't provide error checking and doesn't split netloc into userinfo, domain, and port; neither does urlparse. So using urlparse seems to make all requests slower, and using urlsplit without additional parsing and format checking could introduce other side effects.

Here are some measurements:

❯ python -m timeit -u usec -s "import httptools" -- "httptools.parse_url(b'https://user:password@example.com/users?filter=1#a')"
1000000 loops, best of 5: 0.339 usec per loop

❯ python -m timeit -u usec -s "import urllib.parse" -- "urllib.parse.urlsplit(b'https://user:password@example.com/users?filter=1#a')"
5000000 loops, best of 5: 0.0814 usec per loop

❯ python -m timeit -u usec -s "import urllib.parse" -- "urllib.parse.urlparse(b'https://user:password@example.com/users?filter=1#a')"
200000 loops, best of 5: 1.83 usec per loop
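
One caveat when reading these numbers: urllib.parse.urlsplit caches its results, so calling it repeatedly with the same literal, as the timeit commands above do, mostly measures cache hits. A minimal sketch of a fairer measurement, assuming a distinct URL per call (the helper name `time_urlsplit_uncached` and the URL pattern are made up for illustration, not taken from the thread):

```python
import timeit
import urllib.parse


def time_urlsplit_uncached(n: int = 10_000) -> float:
    """Return the average time per urlsplit call, in microseconds.

    Every URL is distinct, so results cannot be served from the
    internal cache of urllib.parse.
    """
    # Build the inputs up front so URL construction is not timed.
    urls = [
        b"https://user:password@example.com/users?filter=%d#a" % i
        for i in range(n)
    ]
    # Start from an empty cache.
    urllib.parse.clear_cache()
    start = timeit.default_timer()
    for url in urls:
        urllib.parse.urlsplit(url)
    elapsed = timeit.default_timer() - start
    return elapsed / n * 1_000_000


if __name__ == "__main__":
    print(f"{time_urlsplit_uncached():.3f} usec per call")
```

The absolute figure depends on the machine and Python version, so it is only meaningful next to an httptools.parse_url run fed the same list of URLs.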

In [6]: httptools.parse_url(b'https://user:password@example.com/users?filter=1#a')
Out[6]: <URL schema: b'https', host: b'example.com', port: None, path: b'/users', query: b'filter=1', fragment: b'a', userinfo: b'user:password'>

In [2]: urllib.parse.urlsplit(b'https://user:password@example.com/users?filter=1#a')
Out[2]: SplitResultBytes(scheme=b'https', netloc=b'user:password@example.com', path=b'/users', query=b'filter=1', fragment=b'a')

In [2]: urllib.parse.urlparse(b'https://user:password@example.com/users?filter=1#a')
Out[2]: ParseResultBytes(scheme=b'https', netloc=b'user:password@example.com', path=b'/users', params=b'', query=b'filter=1', fragment=b'a')
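
For what it's worth, the repr only shows netloc, but the objects returned by urlsplit and urlparse also expose username, password, hostname, and port as attributes. A rough sketch of a helper that mirrors the fields printed by httptools.parse_url on top of urlsplit; the name `split_url` and the dict layout are hypothetical, and it does not replicate the validation httptools performs:

```python
from typing import Any, Dict
from urllib.parse import urlsplit


def split_url(raw: bytes) -> Dict[str, Any]:
    """Split a URL into fields named like the httptools URL repr."""
    parts = urlsplit(raw)
    # Everything before the last "@" in netloc is the userinfo;
    # an empty value means the URL carried none.
    userinfo = parts.netloc.rpartition(b"@")[0] or None
    return {
        "schema": parts.scheme,
        "host": parts.hostname,  # lower-cased by urllib
        "port": parts.port,  # int or None; ValueError if invalid
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
        "userinfo": userinfo,
    }


if __name__ == "__main__":
    print(split_url(b"https://user:password@example.com/users?filter=1#a"))
```

Note that hostname is lower-cased and port is only validated when the attribute is accessed, so the extra property lookups add some cost on top of the bare urlsplit call.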

RobertoPrevato commented 1 year ago

Thanks for the insight on this! By the way, I forgot to write an update here: the information I wrote above is no longer current.

I was planning to remove the HTTP Client implementation in BlackSheep to have fewer things to maintain and possibly simplify the server implementation (as the Request and Response classes wouldn't have to be bi-directional anymore), but I changed my mind when I measured the performance difference between my HTTP Client implementation and httpx.

https://twitter.com/RobertoPrevato/status/1596212734575747073

https://github.com/RobertoPrevato/temp-httpx-tests

Therefore, the HTTP Client is staying in BlackSheep, and httptools won't be easily replaced, since it's used so much in the client. Moreover, I have added several improvements and fixed bugs lately. I recently wrote a crawler with it, and it works very well.

Neoteroi / BlackSheep

Try replacing httptools.parse_url with Rust URL extension #281