encode / httpx

A next generation HTTP client for Python. 🦋
https://www.python-httpx.org/
BSD 3-Clause "New" or "Revised" License

Implement SOCKS v4, v5 Proxy #203

Closed sethmlarson closed 2 years ago

sethmlarson commented 5 years ago

Related: #36

sethmlarson commented 5 years ago

So the issue with this is that there's no sans-I/O implementation of the SOCKSv4 and SOCKSv5 protocols that doesn't require us to add 3 dependencies. The protocols are so simple that I'm actually in favor of writing our own library that has no dependencies.
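To give a sense of how small the protocol is, here is a minimal dependency-free sketch of the client side of a SOCKS5 exchange, following the byte layout in RFC 1928. It is an illustration only, not the code of any library mentioned in this thread: the function names are hypothetical, only the "no authentication" method is offered by default, and no I/O is performed (the functions just build and parse bytes).

```python
import ipaddress
import struct

SOCKS_VERSION = 0x05
NO_AUTH = 0x00
NO_ACCEPTABLE_METHODS = 0xFF
CMD_CONNECT = 0x01
ATYP_IPV4, ATYP_DOMAIN, ATYP_IPV6 = 0x01, 0x03, 0x04


def build_greeting(methods=(NO_AUTH,)) -> bytes:
    """Client greeting: VER, NMETHODS, METHODS."""
    return bytes([SOCKS_VERSION, len(methods), *methods])


def parse_method_selection(data: bytes) -> int:
    """Server reply to the greeting: VER, METHOD. Returns the chosen method."""
    if len(data) != 2 or data[0] != SOCKS_VERSION:
        raise ValueError("malformed method selection reply")
    if data[1] == NO_ACCEPTABLE_METHODS:
        raise ValueError("proxy accepted none of the offered auth methods")
    return data[1]


def build_connect_request(host: str, port: int) -> bytes:
    """CONNECT request: VER, CMD, RSV, ATYP, DST.ADDR, DST.PORT."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: send the name and let the proxy resolve it.
        name = host.encode("idna")
        if len(name) > 255:
            raise ValueError("domain name too long for SOCKS5")
        address = bytes([ATYP_DOMAIN, len(name)]) + name
    else:
        atyp = ATYP_IPV4 if ip.version == 4 else ATYP_IPV6
        address = bytes([atyp]) + ip.packed
    header = bytes([SOCKS_VERSION, CMD_CONNECT, 0x00])
    return header + address + struct.pack("!H", port)
```

A caller would write `build_greeting()` to the proxy socket, feed the two-byte reply to `parse_method_selection()`, and then send `build_connect_request(host, port)`. Parsing the final reply and username/password authentication are left out to keep the sketch short.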

yeraydiazdiaz commented 5 years ago

I'm happy to lend a hand on this @sethmlarson. Does it depend on #259?

sethmlarson commented 5 years ago

Thanks @yeraydiazdiaz! :heart:

It'll definitely intersect on the configuration stage on the client but the dispatcher implementation is separate, let's start by getting a sans-I/O implementation of SOCKSv4 and v5 w/o dependencies and go from there.

I actually think that the sans-I/O implementation should be its own library, maybe on the python-http org, but it can start on one of our personal accounts. Would you like to be the originator or should I create a repo and add you and you can take it from there?

yeraydiazdiaz commented 5 years ago

I'll definitely need some help so it might be easier if it's all set up in a non-personal repo from the start 🙂

sethmlarson commented 5 years ago

I pushed the initial commit: https://github.com/sethmlarson/socks Feel free to make massive changes as nothing currently works E2E, it's just the result of me programming for a few hours. I've sent collaborator requests to everyone interested. :)

The repo will live under python-http once we release for the first time!

mIcHyAmRaNe commented 4 years ago

While waiting for the implementation, here is a temporary workaround for people who want to use SOCKS4/5 with httpx, using PySocks:

# pip install PySocks

import httpx
import socks
import socket

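# Route every new socket in this process through the SOCKS5 proxy.
# Note: this patches the socket module globally, not just for httpx.
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket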

URL = 'http://ifconfig.me/ip'

with httpx.Client() as client:
    resp = client.get(URL)
    print(resp.text)

romis2012 commented 4 years ago

If it's relevant, there is a third-party SOCKS implementation: httpx-socks

bbkane commented 4 years ago

It looks like that one doesn't support trio (my favorite async backend).

romis2012 commented 4 years ago

It looks like that one doesn't support trio (my favorite async backend).

Trio support added in version 0.2.0

cdeler commented 4 years ago

@tomchristie

I wrote a small PoC with httpcore + PySocks that requests example.com through a local socks5 proxy.

Working on the PoC, I added a new method to AsyncioBackend named open_socks_stream (I decided that socks4/5 is a transport for us, like tcp, ssl or uds).

If this idea (PySocks and a new method in the backends) works for you, I can start adding SOCKS proxy support to httpcore

florimondmanca commented 4 years ago

@cdeler From what I understand, SOCKS is an application-level protocol that sits on top of TCP (well, there's UDP in SOCKS5, but that's not something we should be thinking about for now), so in theory we shouldn't really need a new type of open_* method on concurrency backends.

Also, since I don't think it's been linked to here yet: @yeraydiazdiaz had started a lovely piece of work on HTTPCore already a few months back, based on the socksio library: https://github.com/encode/httpcore/pull/51. The benefit of socksio is that it's sans-I/O, meaning that we can use it either with sync or async, just like h11 and h2 for HTTP/1.1 and HTTP/2.

So perhaps, if anyone's interested, there'd be room for getting that work up to date. I personally think socksio and the sans-I/O approach is our safest bet there if we want to have an as-simple-and-straightforward-as-possible implementation. :-)
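To illustrate why sans-I/O matters here, the sketch below drives one tiny protocol object from both a sync and an async caller. It is a toy, not the socksio API: `Socks5Greeting`, `handshake_sync` and `handshake_async` are hypothetical names, only the SOCKS5 "no authentication" greeting is modelled, and the transports are in-memory fakes rather than real sockets.

```python
import asyncio


class Socks5Greeting:
    """Protocol state only: produces and consumes bytes, never touches a socket."""

    def __init__(self):
        # VER=5, NMETHODS=1, METHODS=[0x00 "no authentication required"]
        self._outgoing = bytearray(b"\x05\x01\x00")
        self.done = False

    def data_to_send(self) -> bytes:
        data = bytes(self._outgoing)
        self._outgoing.clear()
        return data

    def receive_data(self, data: bytes) -> None:
        if data != b"\x05\x00":
            raise ConnectionError("proxy rejected the 'no auth' method")
        self.done = True


def handshake_sync(send, recv) -> bool:
    """Blocking driver: send and recv are plain callables."""
    conn = Socks5Greeting()
    send(conn.data_to_send())
    conn.receive_data(recv())
    return conn.done


async def handshake_async(send, recv) -> bool:
    """Async driver: same protocol object, awaited I/O."""
    conn = Socks5Greeting()
    await send(conn.data_to_send())
    conn.receive_data(await recv())
    return conn.done


def demo():
    """Run both drivers against fake transports that always accept."""
    sent = []
    sync_ok = handshake_sync(sent.append, lambda: b"\x05\x00")

    async def send(data):
        sent.append(data)

    async def recv():
        return b"\x05\x00"

    async_ok = asyncio.run(handshake_async(send, recv))
    return sync_ok, async_ok, sent
```

The two drivers differ only in `await`; all protocol knowledge lives in the class, which is the property that lets one implementation serve every concurrency backend.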

cdeler commented 4 years ago

@florimondmanca Whoops, I missed that there is a PR in progress...

On one hand you are right, proxy works on L7, but it is a transport for us...

Well in terms of pysocks, this library provides us with a socket-like interface which wraps a connection

You can check what I've done here: https://github.com/encode/httpcore/pull/186 (I've created a PR just to show an idea)

I'd like to have a chance to check https://github.com/encode/httpcore/pull/51 :-) Thank you for the advice

Update: looks like @yeraydiazdiaz has the same problem as me with https connection

cdeler commented 4 years ago

@florimondmanca you are right, socksio is really better, since it allows us to implement the code at the connection level. I closed https://github.com/encode/httpcore/pull/186 (with PySocks) and opened https://github.com/encode/httpcore/pull/187 (socksio) as a draft

cdeler commented 4 years ago

I wonder what we should do with socks4. It forces us to do the DNS lookup on our side (as the socks4 connect request does not accept domain names).

Let's imagine that someone wants to access "google.com" through a socks4 proxy. What should we do there? Raise a ProxyError with "SOCKS4 protocol does not support domain names as a host address", or do the nslookup on our side (does anyone know good nslookup libraries for async?)?
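For reference, here is a rough sketch of what the request bytes look like in each case. Plain SOCKS4 carries a 4-byte IPv4 address, so a hostname has to be rejected or resolved first; the SOCKS4a extension instead sends the placeholder address 0.0.0.1 and appends the hostname after the user ID. The function name and the `allow_4a` flag are hypothetical, and a plain `ValueError` stands in for ProxyError to keep the example self-contained.

```python
import ipaddress
import struct


def build_socks4_connect(host: str, port: int, user_id: bytes = b"",
                         allow_4a: bool = False) -> bytes:
    """Build a SOCKS4 (or, optionally, SOCKS4a) CONNECT request."""
    # VN=4, CD=1 (CONNECT), DSTPORT in network byte order
    header = struct.pack("!BBH", 0x04, 0x01, port)
    try:
        ip = ipaddress.IPv4Address(host)
    except ipaddress.AddressValueError:
        if not allow_4a:
            raise ValueError(
                "SOCKS4 protocol does not support domain names as a host address"
            )
        # SOCKS4a: DSTIP is 0.0.0.x with x != 0, and the hostname follows
        # the NUL-terminated user ID, itself NUL-terminated.
        return (header + b"\x00\x00\x00\x01" + user_id + b"\x00"
                + host.encode("ascii") + b"\x00")
    # Plain SOCKS4: DSTIP followed by the NUL-terminated user ID.
    return header + ip.packed + user_id + b"\x00"
```

With `allow_4a=False`, passing "google.com" raises, which corresponds to the first option above; resolving the name on the client before calling the function corresponds to the second.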

romis2012 commented 4 years ago

SOCKS4 protocol does not support domain names as a host address

Some socks5 servers also don't support DNS resolving. On the other hand, some socks4 servers support it.

does anyone know good nslookup libraries for async

See how it is implemented in python-socks which httpx-socks is based on. For asyncio backend you can also use aiodns.
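If a third-party resolver is not wanted, the asyncio event loop itself offers `loop.getaddrinfo()`, which runs the system resolver without blocking the loop. A minimal sketch, assuming an asyncio backend and IPv4 only (the helper name `resolve_ipv4` is hypothetical, and this is not how python-socks or aiodns do it):

```python
import asyncio
import socket


async def resolve_ipv4(host: str, port: int) -> str:
    """Resolve host to one IPv4 address using the running event loop."""
    loop = asyncio.get_running_loop()
    infos = await loop.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is an (address, port) pair.
    return infos[0][4][0]
```

The resolved address could then be packed into a SOCKS4 request, or into a SOCKS5 request for a proxy that does not resolve names itself. Other backends such as trio would need their own equivalent.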

KOLANICH commented 3 years ago

https://github.com/python-hyper/hyper/pull/441 may be related (though it is for hyper)

jacklanda commented 3 years ago

While waiting for the implementation, here is a temporary workaround for people who want to use SOCKS4/5 with httpx, using PySocks:

# pip install PySocks

import httpx
import socks
import socket

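# Route every new socket in this process through the SOCKS5 proxy.
# Note: this patches the socket module globally, not just for httpx.
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket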

URL = 'http://ifconfig.me/ip'

with httpx.Client() as client:
    resp = client.get(URL)
    print(resp.text)

Thank you, and I hope the implementation arrives soon.

balki commented 3 years ago

I wrote a socks library. It has no dependencies and supports sync, async and sans-I/O usage. I had to write it because I wanted to use SOCKS over Unix domain sockets, which none of the other packages supported. (I didn't know about socksio when I wrote it, but the usage is much simpler.) Feel free to depend on it or just copy the relevant parts of it.

tomchristie commented 2 years ago

Alrighty - closed via https://github.com/encode/httpcore/pull/478 and https://github.com/encode/httpx/pull/2034 using Seth's fantastic socksio package.

We've only got SOCKS5 in right now. Not obvs to me how much value there would be in 4/4a, or in SOCKS5 with the IP resolved by the client. Probably makes sense to leave a decision on those pending user feedback.

ghost commented 2 years ago

Wow! This is awesome!

socks5h would be very useful for me for Tor .onion support.

Edit: I might be misunderstanding. Is the IP resolved by the socks5 server? So SOCKS5 with client-side resolution is not supported?

tomchristie commented 2 years ago

Is the IP resolved by the socks5 server? So SOCKS5 with client-side resolution is not supported?

Correct. That is the current setup.

tomchristie commented 2 years ago

That seems like something that's a valid enough use-case to open an issue for. You'd be very welcome to do that, and reference this comment.

I had a quick read up about this to educate myself, and found this blog post to be pretty helpful.

ghost commented 2 years ago

That's perfect for me! I had some code using requests that I wanted to port over to httpx, but was holding off for SOCKS support. With requests, I seem to recall that this behavior was enabled by doing socks5h:// and not socks5://. socks5:// assumes the client resolves the IP. Should we mimic this in httpx?

tomchristie commented 2 years ago

That would probably be a nice feature / change in behaviour to have, yup, but it's a little bit involved to implement.

tomchristie commented 2 years ago

Sorry - slow me down I'm being a bit stupid.

Our current behaviour is that the proxy resolves the DNS (socks5h). What we don't have is support for the client resolving the DNS. (That would be a bit of a pain to add, but do-able. Tho it's not obvious to me what use-case we'd want to support that for.)
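For anyone following along, the scheme convention mentioned above for requests (socks5h:// means the proxy resolves the name, socks5:// means the client does) could be expressed as the small sketch below. This only illustrates that convention; it is not httpx behaviour, where the proxy currently resolves DNS either way, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def proxy_resolves_dns(proxy_url: str) -> bool:
    """Return True if, under the requests-style convention, the proxy
    is expected to resolve hostnames (socks5h), False if the client
    is expected to resolve them before connecting (socks5)."""
    scheme = urlsplit(proxy_url).scheme.lower()
    if scheme == "socks5h":
        return True
    if scheme == "socks5":
        return False
    raise ValueError(f"unsupported proxy scheme: {scheme!r}")
```

Under this convention a Tor setup would use a `socks5h://` URL, since .onion names can only be resolved on the proxy side.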