dabeaz / curio


Is gethostbyname thread-safe? #189

milesrout closed this issue 7 years ago

milesrout commented 7 years ago

https://github.com/dabeaz/curio/blob/master/curio/socket.py#L57-L59

Is this thread-safe?

dabeaz commented 7 years ago

It's thread-safe to the extent that the socket.gethostbyname() function is thread-safe in the standard library. Admittedly, that's not much of an answer. I know that some other libraries (e.g., gevent) also hand this function off to a thread pool. There doesn't seem to be any documentation indicating that it's not thread-safe.
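
For readers unfamiliar with the "hand it off to a thread pool" approach mentioned above, here is a minimal sketch using only the standard library. It is not Curio's or gevent's actual code; the helper name `resolve_all` and the `max_workers` default are made up for illustration.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolve_all(hostnames, max_workers=4):
    """Resolve each hostname on a worker thread; returns {name: address}.

    Hypothetical helper: the blocking socket.gethostbyname() calls run in
    pool threads, so the caller's thread is not blocked by any single lookup.
    """
    hostnames = list(hostnames)  # allow any iterable; we iterate twice
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs names with results.
        addresses = pool.map(socket.gethostbyname, hostnames)
        return dict(zip(hostnames, addresses))
```

Calling `resolve_all(["example.com", "python.org"])` would issue both lookups concurrently. This is only safe because, as discussed in this thread, `socket.gethostbyname()` can be called from several threads at once.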

milesrout commented 7 years ago

Hmm. Python's socket library says it's a fairly thin wrapper around BSD sockets, and gethostbyname at a C level returns a pointer to static data and so is neither safe to call from multiple threads nor reentrant.

milesrout commented 7 years ago

Actually it looks like Python checks whether gethostbyname is thread-safe on a given platform with some lovely preprocessor macros and if it isn't then it wraps it in a lock. So yeah, it's thread-safe.

dabeaz commented 7 years ago

Given that networking is a major use of threads in Python, the thread-safety of this function seems like it would be pretty important. In all of my time coding Python, I have never heard of it being "unsafe" although I think it would be hard to know without auditing the C code. I'm looking at the code to gevent right now and they have the following comment in their implementation:

 # from briefly reading socketmodule.c, it seems that all of the functions
 # below are thread-safe in Python, even if they are not thread-safe in C.

milesrout commented 7 years ago

Awesome, I assumed it must be made thread-safe at some level but I wasn't sure which.

Is running these functions on a separate thread intended to be a permanent solution, or is it a stopgap until some better solution can be found? An O_NONBLOCK for DNS would be nice.

njsmith commented 7 years ago

Yeah, traditionally gai (getaddrinfo) wasn't thread-safe, but I think all the reasonable implementations fixed this a decade or two ago, as part of the push to support threads in general.

You might enjoy https://emptysqua.re/blog/getaddrinfo-cpython-mac-and-bsd/

True async gai has been the dream of every async programmer for nearly as long, with zero discernible progress. (Except maybe on Windows?) I wouldn't hold your breath. The problem is that gai has substantial hooks for system-specific configuration. If you just want to do DNS lookups, then you can get an async DNS library; there are several. But gai might potentially check all kinds of sources, call into arbitrary blocking plugins (on Linux search for "name service switch"), etc., and a true async gai would need to match all that behavior bug-for-bug.

dabeaz commented 7 years ago

Interesting. I've always wondered about a true async gai. Is this really a mirage? Do well-established libraries like Twisted have a true async hostname lookup?

njsmith commented 7 years ago

Twisted uses a thread pool to call getaddrinfo.
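
The same idea can be sketched with the standard library's asyncio rather than Twisted. This is not Twisted's code: the coroutine name `resolve` is hypothetical, and the restriction to IPv4 TCP results is an assumption made to keep the example short.

```python
import asyncio
import socket


async def resolve(host, port):
    """Await a blocking getaddrinfo() call run on the loop's thread pool."""
    loop = asyncio.get_running_loop()
    # Passing None selects the event loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(
        None,
        socket.getaddrinfo,
        host,
        port,
        socket.AF_INET,      # assumption: IPv4 results only
        socket.SOCK_STREAM,  # assumption: TCP results only
    )
```

From the caller's point of view, `await resolve("example.com", 443)` looks asynchronous, but the lookup itself still blocks a worker thread for its full duration.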

imrn commented 7 years ago

How about socket.connect()? Should we also think about making the call in a helper thread?

dabeaz commented 7 years ago

The connect() method of sockets supports non-blocking operation and is handled by the Curio I/O polling code.
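
To show what a non-blocking connect() looks like without a helper thread, here is a minimal sketch built on the standard `selectors` module. It is not Curio's polling code; the function name `nonblocking_connect` and the `timeout` default are hypothetical, and the error handling assumes POSIX-like behaviour.

```python
import errno
import os
import selectors
import socket


def nonblocking_connect(address, timeout=5.0):
    """Start a TCP connect without blocking, then wait for it via a selector."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    # connect_ex returns an errno instead of raising. EINPROGRESS means the
    # attempt has started and will complete in the background.
    err = sock.connect_ex(address)
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        raise OSError(err, os.strerror(err))
    with selectors.DefaultSelector() as sel:
        # The socket is reported writable once the attempt has finished.
        sel.register(sock, selectors.EVENT_WRITE)
        if not sel.select(timeout):
            sock.close()
            raise TimeoutError("connect timed out")
    # SO_ERROR tells us whether the finished attempt succeeded (0) or failed.
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err:
        sock.close()
        raise OSError(err, os.strerror(err))
    return sock
```

An event loop does the same thing, except that instead of calling `select()` inline it suspends the calling task and resumes it when the socket becomes writable. Hostname resolution has no comparable readiness notification, which is why it ends up in a thread.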