Neoteroi / BlackSheep

Fast ASGI web framework for Python
https://www.neoteroi.dev/blacksheep/
MIT License

Try replacing httptools.parse_url with Rust URL extension #281

Closed RobertoPrevato closed 1 year ago

RobertoPrevato commented 2 years ago

Try replacing httptools.parse_url with Rust extension.

Why?

Because if we removed the HTTP Client implementation (I plan to do so in version 2), httptools would only be used to parse URLs. It could then be replaced by built-in URL parsing or, as a learning opportunity, by a small wrapper over a Rust library that can parse URLs.

tyzhnenko commented 1 year ago

I found that the built-in urlsplit looks faster than httptools, but it doesn't provide error checking and doesn't split netloc into userinfo, domain, and port; neither does urlparse. So using urlparse seems to make all requests slower, and using urlsplit without additional parsing and format checking can bring other side effects.

Here are some measurements:

❯ python -m timeit -u usec -s "import httptools" -- "httptools.parse_url(b'https://user:password@example.com/users?filter=1#a')"
1000000 loops, best of 5: 0.339 usec per loop

❯ python -m timeit -u usec -s "import urllib.parse" -- "urllib.parse.urlsplit(b'https://user:password@example.com/users?filter=1#a')"
5000000 loops, best of 5: 0.0814 usec per loop

❯ python -m timeit -u usec -s "import urllib.parse" -- "urllib.parse.urlparse(b'https://user:password@example.com/users?filter=1#a')"
200000 loops, best of 5: 1.83 usec per loop

In [6]: httptools.parse_url(b'https://user:password@example.com/users?filter=1#a')
Out[6]: <URL schema: b'https', host: b'example.com', port: None, path: b'/users', query: b'filter=1', fragment: b'a', userinfo: b'user:password'>

In [2]: urllib.parse.urlsplit(b'https://user:password@example.com/users?filter=1#a')
Out[2]: SplitResultBytes(scheme=b'https', netloc=b'user:password@example.com', path=b'/users', query=b'filter=1', fragment=b'a')

In [2]: urllib.parse.urlparse(b'https://user:password@example.com/users?filter=1#a')
Out[2]: ParseResultBytes(scheme=b'https', netloc=b'user:password@example.com', path=b'/users', params=b'', query=b'filter=1', fragment=b'a')
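
For reference, here is a minimal sketch of how the pieces that urlsplit leaves inside netloc could be recovered using only the standard library. The helper name `split_url` and the dictionary layout are assumptions made for illustration (they are not part of BlackSheep or httptools), and the checks are deliberately basic, so this is not a drop-in replacement for httptools.parse_url.

```python
from urllib.parse import urlsplit


def split_url(raw: bytes) -> dict:
    """Split an absolute URL into fields similar to those shown by httptools.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(raw)  # SplitResultBytes when given bytes

    # urlsplit accepts almost any input, so do minimal checks ourselves.
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"Not an absolute URL: {raw!r}")

    # Everything before the last "@" in netloc is the userinfo, if present.
    userinfo = parts.netloc.rpartition(b"@")[0] or None

    return {
        "schema": parts.scheme,
        "host": parts.hostname,  # lowercased by urllib.parse
        "port": parts.port,      # int or None; raises ValueError if invalid
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
        "userinfo": userinfo,
    }


if __name__ == "__main__":
    print(split_url(b"https://user:password@example.com/users?filter=1#a"))
```

The extra attribute lookups and checks add work on top of the bare urlsplit call, so the timings above would not carry over to this helper unchanged.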

RobertoPrevato commented 1 year ago

Thanks for the insight on this! By the way, I forgot to post an update here: the information I wrote above is no longer current.

https://twitter.com/RobertoPrevato/status/1596212734575747073

https://github.com/RobertoPrevato/temp-httpx-tests

Therefore the HTTP Client is staying in BlackSheep, and httptools won't be easily replaced since it's used so much in the client. Moreover, I have lately added several improvements and fixed bugs. I recently wrote a crawler with it and it works very well.