Closed sethmlarson closed 2 years ago
So the issue with this is that there's no sans-I/O implementation of the SOCKSv4 and SOCKSv5 protocols that doesn't require us to add 3 dependencies. The protocols are so simple that I'm actually in favor of writing our own library that has no dependencies.
I'm happy to lend a hand on this @sethmlarson. Does it depend on #259?
Thanks @yeraydiazdiaz! :heart:
It'll definitely intersect on the configuration stage on the client but the dispatcher implementation is separate, let's start by getting a sans-I/O implementation of SOCKSv4 and v5 w/o dependencies and go from there.
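To give a feel for how small the protocol is, here is a minimal sans-I/O sketch of the two client messages needed for a SOCKS5 CONNECT, following the wire format in RFC 1928. The function names (`build_greeting`, `build_connect_request`) are hypothetical and are not taken from any library mentioned in this thread; the sketch only produces bytes and never touches a socket.

```python
import ipaddress
import struct

SOCKS5_VERSION = 0x05
NO_AUTH = 0x00          # "no authentication required" method
CMD_CONNECT = 0x01
ATYP_IPV4, ATYP_DOMAIN, ATYP_IPV6 = 0x01, 0x03, 0x04


def build_greeting(methods=(NO_AUTH,)) -> bytes:
    """Client greeting: VER, NMETHODS, METHODS (hypothetical helper)."""
    return bytes([SOCKS5_VERSION, len(methods), *methods])


def build_connect_request(host: str, port: int) -> bytes:
    """CONNECT request: VER, CMD, RSV, ATYP, DST.ADDR, DST.PORT (hypothetical helper)."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: send the name and let the proxy resolve it.
        name = host.encode("idna")
        if len(name) > 255:
            raise ValueError("domain name too long for SOCKS5")
        addr = bytes([ATYP_DOMAIN, len(name)]) + name
    else:
        atyp = ATYP_IPV4 if ip.version == 4 else ATYP_IPV6
        addr = bytes([atyp]) + ip.packed
    # The port is a 2-byte big-endian integer.
    return bytes([SOCKS5_VERSION, CMD_CONNECT, 0x00]) + addr + struct.pack("!H", port)
```

Because the functions return plain bytes, the caller decides how to send them, whether through a blocking socket, asyncio, or trio.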
I actually think that the sans-I/O implementation should be its own library, maybe on the python-http org, but it can start on one of our personal accounts. Would you like to be the originator or should I create a repo and add you and you can take it from there?
I'll definitely need some help so it might be easier if it's all set up in a non-personal repo from the start 🙂
I pushed the initial commit: https://github.com/sethmlarson/socks Feel free to make massive changes as nothing currently works E2E, it's just the result of me programming for a few hours. I've sent collaborator requests to everyone interested. :)
The repo will live under python-http once we release for the first time!
While waiting for the implementation, here is a temporary workaround for people who want to use SOCKS4/5 with httpx via PySocks:
# pip install PySocks
import socket

import httpx
import socks

# Route every new socket in this process through the local SOCKS5 proxy.
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
# Note: this monkey-patches the socket module globally, not just for httpx.
socket.socket = socks.socksocket

URL = 'http://ifconfig.me/ip'
with httpx.Client() as client:
    resp = client.get(URL)
    print(resp.text)
If it's relevant, there is a third-party SOCKS implementation: httpx-socks
It looks like that one doesn't support trio (my favorite async backend).
It looks like that one doesn't support trio (my favorite async backend).
Trio support added in version 0.2.0
@tomchristie
I wrote a small PoC with httpcore + PySocks that requests example.com through a local socks5 proxy.
While working on the PoC, I added a new method to AsyncioBackend named open_socks_stream (I decided that socks4/5 is a transport for us, like tcp, ssl or uds).
If this idea (PySocks and a new method in the backends) works for you, I can start adding SOCKS proxy support to httpcore.
@cdeler From what I understand, SOCKS is an application-level protocol that sits on top of TCP (well, there's UDP in SOCKS5, but that's not something we should be thinking about for now), so in theory we shouldn't really need a new type of open_* method on concurrency backends.
Also, since I don't think it's been linked to here yet: @yeraydiazdiaz had started a lovely piece of work on HTTPCore already a few months back, based on the socksio library: https://github.com/encode/httpcore/pull/51. The benefit of socksio is that it's sans-I/O, meaning that we can use it with either sync or async code, just like h11 and h2 for HTTP/1.1 and HTTP/2.
So perhaps, if anyone's interested, there'd be room for getting that work up to date. I personally think socksio and the sans-I/O approach are our safest bet there if we want to have an as-simple-and-straightforward-as-possible implementation. :-)
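To illustrate why a sans-I/O design works with both sync and async code, here is a rough sketch of a parser for the SOCKS5 server reply (layout per RFC 1928) together with two thin drivers. This is not socksio's API; `Socks5ReplyParser`, `read_reply_sync` and `read_reply_async` are hypothetical names made up for this example.

```python
import asyncio
import socket


class Socks5ReplyParser:
    """Sans-I/O parser for a reply: VER, REP, RSV, ATYP, BND.ADDR, BND.PORT."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, data: bytes):
        """Buffer bytes; return the REP code once the reply is complete, else None."""
        self._buffer += data
        buf = self._buffer
        if len(buf) < 5:
            return None
        if buf[0] != 0x05:
            raise ValueError("not a SOCKS5 reply")
        atyp = buf[3]
        if atyp == 0x01:        # IPv4 address
            addr_len = 4
        elif atyp == 0x04:      # IPv6 address
            addr_len = 16
        elif atyp == 0x03:      # length-prefixed domain name
            addr_len = 1 + buf[4]
        else:
            raise ValueError("unknown address type")
        if len(buf) < 4 + addr_len + 2:
            return None
        return buf[1]           # 0x00 means the request succeeded


def read_reply_sync(sock: socket.socket) -> int:
    """Blocking driver. Reads one byte at a time so nothing past the reply is consumed."""
    parser = Socks5ReplyParser()
    while True:
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("proxy closed the connection")
        code = parser.feed(chunk)
        if code is not None:
            return code


async def read_reply_async(reader: asyncio.StreamReader) -> int:
    """asyncio driver around the very same parser."""
    parser = Socks5ReplyParser()
    while True:
        chunk = await reader.read(1)
        if not chunk:
            raise ConnectionError("proxy closed the connection")
        code = parser.feed(chunk)
        if code is not None:
            return code
```

All protocol knowledge lives in `feed`; the drivers differ only in how they obtain bytes, which is the property that makes the approach attractive for a library with several concurrency backends.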
@florimondmanca Whoops, I missed that there is a PR in progress...
On one hand you are right, the proxy works at L7, but it is a transport for us...
Well, in terms of pysocks, this library provides us with a socket-like interface which wraps a connection.
You can check what I've done here: https://github.com/encode/httpcore/pull/186 (I've created a PR just to show an idea)
I'd like to have a chance to check https://github.com/encode/httpcore/pull/51 :-) Thank you for the advice
Update: looks like @yeraydiazdiaz hit the same problem as I did with HTTPS connections
@florimondmanca you are right, socksio is really better, since it allows us to implement the code at the connection level. I closed https://github.com/encode/httpcore/pull/186 (with PySocks) and opened https://github.com/encode/httpcore/pull/187 (socksio) as a draft
I wonder what we should do with socks4. It forces us to do the nslookup on our side (as a socks4 connect does not accept domain names).
Let's imagine that someone wants to access "google.com" through a socks4 proxy. What should we do there? Raise a ProxyError with "SOCKS4 protocol does not support domain names as a host address", or do the nslookup on our side (does anyone know good nslookup libraries for async?)?
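The two options can be sketched at the byte level as follows. Plain SOCKS4 carries only a 4-byte IPv4 address, while the SOCKS4a extension lets the client send a placeholder address of the form 0.0.0.x and append the hostname for the proxy to resolve. `build_socks4_connect` and its `allow_socks4a` flag are hypothetical names for illustration, and the plain `ValueError` stands in for whatever ProxyError a real implementation would raise.

```python
import ipaddress
import struct


def build_socks4_connect(host: str, port: int, user_id: bytes = b"",
                         allow_socks4a: bool = True) -> bytes:
    """Build a SOCKS4 CONNECT request (hypothetical helper, sans-I/O)."""
    # VN=4, CD=1 (CONNECT), DSTPORT as a 2-byte big-endian integer.
    header = struct.pack("!BBH", 0x04, 0x01, port)
    try:
        ip = ipaddress.IPv4Address(host)
    except ipaddress.AddressValueError:
        if not allow_socks4a:
            # Strict SOCKS4: refuse rather than resolve behind the caller's back.
            raise ValueError(
                "SOCKS4 protocol does not support domain names as a host address"
            )
        # SOCKS4a: invalid IP 0.0.0.1 signals "hostname follows the user id".
        return (header + b"\x00\x00\x00\x01" + user_id + b"\x00"
                + host.encode("idna") + b"\x00")
    # Plain SOCKS4: DSTIP, then the NUL-terminated user id.
    return header + ip.packed + user_id + b"\x00"
```

A third option, resolving the name on the client before building a plain SOCKS4 request, would keep strict servers working but moves DNS traffic outside the proxy.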
SOCKS4 protocol does not support domain names as a host address
Some socks5 servers also don't support DNS resolving. On the other hand, some socks4 servers support it.
does anyone know good nslookup libraries for async
See how it is implemented in python-socks, which httpx-socks is based on. For the asyncio backend you can also use aiodns.
https://github.com/python-hyper/hyper/pull/441 may be related (though it is for hyper)
Thank you, and I hope the implementation arrives soon
I wrote a socks library. It has no dependencies and supports sync, async and sans-I/O usage. I had to write it because I wanted to use SOCKS over Unix domain sockets, which none of the other packages supported. (I didn't know about socksio when I wrote it, but the usage is much simpler.) Feel free to depend on it or just copy the relevant parts of it.
Alrighty - closed via https://github.com/encode/httpcore/pull/478 and https://github.com/encode/httpx/pull/2034 using Seth's fantastic socksio package.
We've only got SOCKS5 in right now. Not obvs to me how much value there would be in 4/4a, or in SOCKS5 with the IP resolved by the client. Probably makes sense to leave a decision on those pending user feedback.
Wow! This is awesome!
socks5h would be very useful for me for Tor .onion support.
Edit: I might be misunderstanding. Is the IP resolved by the socks5 server? So SOCKS5 with client-side resolution is not supported?
Is the IP resolved by the socks5 server? So SOCKS5 with client-side resolution is not supported?
Correct. That is the current setup.
That seems like something that's a valid enough use-case to open an issue for. You'd be very welcome to do that, and reference this comment.
I had a quick read up about this to educate myself, and found this blog post to be pretty helpful.
That's perfect for me! I had some code using requests that I wanted to port over to httpx, but was holding off for SOCKS support. With requests, I seem to recall that this behavior was enabled by doing socks5h:// and not socks5://. socks5:// assumes the client resolves the IP. Should we mimic this in httpx?
That would probably be a nice feature / change in behaviour to have, yup, but it's a little bit involved to implement.
Sorry - slow me down I'm being a bit stupid.
Our current behaviour is that the proxy resolves the DNS (socks5h). What we don't have is support for the client resolving the DNS. (Which would be a bit of a pain to add, but doable. Though it's not obvious to me what use-case we'd want to support that for.)
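For reference, the scheme convention described above for requests could be sketched like this: socks5h:// hands the hostname to the proxy, while socks5:// resolves it on the client first. This is only an illustration of that convention, not httpx's behaviour (which, per this thread, always lets the proxy resolve), and `destination_for_proxy` is a hypothetical helper.

```python
import socket
from urllib.parse import urlsplit


def destination_for_proxy(proxy_url: str, host: str) -> str:
    """Return the address to put in the SOCKS5 request (hypothetical helper)."""
    scheme = urlsplit(proxy_url).scheme.lower()
    if scheme == "socks5h":
        # Remote DNS: pass the name through untouched (needed for .onion hosts).
        return host
    if scheme == "socks5":
        # Client-side DNS: resolve locally and send the first address found.
        infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
        return infos[0][4][0]
    raise ValueError(f"unsupported proxy scheme: {scheme!r}")
```

With client-side resolution the DNS query itself bypasses the proxy, which is the main reason Tor users ask for the socks5h behaviour.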
Related: #36