I found that the built-in urlsplit looks faster than httptools, but it doesn't provide error checking and, like urlparse, it returns the netloc whole instead of splitting it into userinfo, host, and port. So using urlparse would seem to make all requests slower, and using urlsplit without additional parsing and format checking could bring other side effects.
Here are some measurements:
```
❯ python -m timeit -u usec -s "import httptools" -- "httptools.parse_url(b'https://user:password@example.com/users?filter=1#a')"
1000000 loops, best of 5: 0.339 usec per loop
❯ python -m timeit -u usec -s "import urllib.parse" -- "urllib.parse.urlsplit(b'https://user:password@example.com/users?filter=1#a')"
5000000 loops, best of 5: 0.0814 usec per loop
❯ python -m timeit -u usec -s "import urllib.parse" -- "urllib.parse.urlparse(b'https://user:password@example.com/users?filter=1#a')"
200000 loops, best of 5: 1.83 usec per loop
```
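Presumably part of httptools' extra cost compared to urlsplit is that it validates its input, while urlsplit accepts almost anything. A small sketch of that error-checking gap (assuming I'm reading httptools' errors module right; it should raise HttpParserInvalidURLError for malformed URLs):

```python
import httptools
from urllib.parse import urlsplit

bad = b"http://exa mple.com/"  # a space is not valid inside a host

# httptools validates the URL and raises for malformed input
try:
    httptools.parse_url(bad)
except httptools.HttpParserInvalidURLError as exc:
    print("httptools rejected it:", exc)

# urlsplit performs no such validation and returns a result anyway
print(urlsplit(bad))
```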
```
In [1]: httptools.parse_url(b'https://user:password@example.com/users?filter=1#a')
Out[1]: <URL schema: b'https', host: b'example.com', port: None, path: b'/users', query: b'filter=1', fragment: b'a', userinfo: b'user:password'>

In [2]: urllib.parse.urlsplit(b'https://user:password@example.com/users?filter=1#a')
Out[2]: SplitResultBytes(scheme=b'https', netloc=b'user:password@example.com', path=b'/users', query=b'filter=1', fragment=b'a')

In [3]: urllib.parse.urlparse(b'https://user:password@example.com/users?filter=1#a')
Out[3]: ParseResultBytes(scheme=b'https', netloc=b'user:password@example.com', path=b'/users', params=b'', query=b'filter=1', fragment=b'a')
```
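One note on the netloc: although the result tuples keep it whole, both SplitResult and ParseResult expose userinfo, host, and port as computed properties (still without validation). A quick sketch:

```python
from urllib.parse import urlsplit

# The netloc stays whole in the tuple, but the result object
# exposes its pieces as computed properties.
parts = urlsplit(b"https://user:password@example.com:8443/users?filter=1#a")
print(parts.netloc)    # b'user:password@example.com:8443'
print(parts.username)  # b'user'
print(parts.password)  # b'password'
print(parts.hostname)  # b'example.com'
print(parts.port)      # 8443
```

Accessing these properties does the extra splitting work lazily, so the raw urlsplit timing above understates the cost when userinfo, host, and port are actually needed.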
Thanks for the insight on this! By the way, I forgot to write an update here: the information I wrote above is no longer current.
I was planning to remove the HTTP Client implementation in version 2 (among other things, the Request and Response classes wouldn't have to be bi-directional anymore), but I changed my mind when I measured the performance difference between my HTTP Client implementation and httpx:

https://twitter.com/RobertoPrevato/status/1596212734575747073
https://github.com/RobertoPrevato/temp-httpx-tests
Therefore the HTTP Client is staying in BlackSheep, and httptools won't be easily replaced, since it's used so much in the client. Moreover, I added several improvements lately and corrected bugs. I recently wrote a crawler with it and it works very well.
Try replacing httptools.parse_url with a Rust extension.

Why? Because if we removed the HTTP Client implementation (I plan to do so in version 2), then httptools would only be used to parse URLs. It could be replaced by built-in URL parsing or, as a learning opportunity, by a small wrapper over a Rust library that can parse URLs.
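For the built-in option, a minimal sketch of what a drop-in replacement could look like; ParsedURL and its field names are hypothetical (they mirror the fields httptools exposes), and unlike httptools.parse_url this performs no validation:

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


# Hypothetical result type mirroring the fields of httptools' URL object.
class ParsedURL(NamedTuple):
    schema: Optional[bytes]
    host: Optional[bytes]
    port: Optional[int]
    path: Optional[bytes]
    query: Optional[bytes]
    fragment: Optional[bytes]
    userinfo: Optional[bytes]


def parse_url(value: bytes) -> ParsedURL:
    # urlsplit does the raw splitting; its properties split the netloc.
    parts = urlsplit(value)
    user, password = parts.username, parts.password
    userinfo = None
    if user is not None:
        userinfo = user if password is None else user + b":" + password
    return ParsedURL(
        parts.scheme or None,
        parts.hostname,
        parts.port,
        parts.path or None,
        parts.query or None,
        parts.fragment or None,
        userinfo,
    )


print(parse_url(b"https://user:password@example.com/users?filter=1#a"))
# ParsedURL(schema=b'https', host=b'example.com', port=None, path=b'/users',
#           query=b'filter=1', fragment=b'a', userinfo=b'user:password')
```

A Rust-based alternative could wrap a crate such as url behind the same interface, which would bring back the validation the standard library doesn't do.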