aio-libs / aiodns

Simple DNS resolver for asyncio
https://pypi.python.org/pypi/aiodns
MIT License

Queries intermittently freezing asyncio event loop #122

Open davidmcnabnz opened 5 months ago

davidmcnabnz commented 5 months ago

Most of the time, aiodns works fine. But on rare occasions, it gets stuck in a C write() call deep within pycares.

This freezes the entire event loop indefinitely, because the write() call never returns.

My original calling code looks like this:

import aiodns
dns = aiodns.DNSResolver()
reply = await dns.query(somedomain, 'MX')  # inside a coroutine; somedomain is the name being looked up

For now, I'll look into workarounds such as moving all my aiodns queries off to separate threads, but that seems inefficient.

I'd welcome any advice on this.

Below is the py-spy stack trace of where the aiodns call is getting stuck.

Thread 380494 (idle): "MainThread"
    write (libpthread-2.31.so)
    _Py_DECREF (object.h:422)
    _my_PyErr_WriteUnraisable (_cffi_backend.c:6113)
    general_invoke_callback (_cffi_errors.h:147)
    gil_release (misc_thread_common.h:370)
    cffi_call_python (call_python.c:278)
    _sock_state_cb (_cares.c:998)
    open_udp_socket (ares_process.c:1240)
    ares__send_query (ares_process.c:854)
    ares_send (ares_send.c:131)
    ares_query (ares_query.c:138)
    _cffi_f_ares_query (_cares.c:3287)
    _do_query (pycares/__init__.py:581)
    query (pycares/__init__.py:561)
    query (aiodns/__init__.py:90)
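The thread-based workaround mentioned above can be sketched roughly as follows. This is a minimal sketch under assumptions, not an aiodns feature: the pool size and the names `_DNS_POOL`, `_run_in_private_loop` and `query_off_loop` are hypothetical. Each query runs on its own short-lived event loop inside a worker thread, so a freeze pins that thread rather than the main loop.

```python
import asyncio
import concurrent.futures

# Hypothetical dedicated pool for DNS work: a query that freezes inside C code
# pins one of these worker threads instead of the main event loop.
_DNS_POOL = concurrent.futures.ThreadPoolExecutor(
    max_workers=4, thread_name_prefix="dns"
)


def _run_in_private_loop(coro_fn):
    # Runs in a worker thread. coro_fn must be an async function, not an
    # already-created coroutine, so that everything its body creates (such as
    # a resolver) is created inside this thread's own short-lived event loop.
    return asyncio.run(coro_fn())


async def query_off_loop(coro_fn, timeout):
    loop = asyncio.get_running_loop()
    fut = loop.run_in_executor(_DNS_POOL, _run_in_private_loop, coro_fn)
    # The main loop is never blocked, so this timeout can actually fire.
    # On timeout the worker thread keeps running: Python cannot kill it.
    return await asyncio.wait_for(fut, timeout)


# Usage (hypothetical; somedomain as in the snippet above):
#
#     async def mx_lookup():
#         resolver = aiodns.DNSResolver()
#         return await resolver.query(somedomain, 'MX')
#
#     reply = await query_off_loop(mx_lookup, timeout=10)
```

The trade-off is that a timeout on the main loop now works, but the stuck worker thread is never reclaimed. Each frozen query permanently occupies a pool slot. Once all slots are stuck, new queries only queue and time out. Interpreter shutdown may also hang while `ThreadPoolExecutor` waits for that thread.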
davidmcnabnz commented 5 months ago

Because of the nature of this issue, asyncio timeout wrappers (such as asyncio.wait_for) cannot help: once the thread's event loop is stuck inside a C function call, there is no way for a TimeoutError to be raised up to the wrapper.
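Here is a minimal sketch of this effect. It uses `time.sleep` inside a coroutine as a stand-in for the stuck write(). That is an assumption: the real call blocks inside pycares' C code, but any call that never yields to the event loop behaves the same way. The function names are hypothetical.

```python
import asyncio
import time


async def stuck_step():
    # Stand-in for a coroutine that blocks in native code: time.sleep never
    # yields to the event loop, just like the stuck write() call.
    time.sleep(0.3)
    return "done"


async def demo():
    start = time.monotonic()
    try:
        result = await asyncio.wait_for(stuck_step(), timeout=0.05)
    except asyncio.TimeoutError:
        result = "timed out"
    return result, time.monotonic() - start


result, elapsed = asyncio.run(demo())
# The 0.05 s timeout never fires: the loop only regains control after the
# blocking call returns, so wait_for hands back "done" after roughly 0.3 s.
print(result, f"{elapsed:.2f}s")
```

With a call that never returns, as in the trace above, `wait_for` would likewise never return.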

davidmcnabnz commented 5 months ago

I've also filed an issue with the pycares tracker:

saghul commented 5 months ago

What a weird one!

Drilling down, what happens is that pycares saw activity on a file descriptor (in the trace above, c-ares opening a new UDP socket in open_udp_socket) and called the socket state callback, which aiodns uses:

https://github.com/saghul/aiodns/blob/1c5f28f8700a9c45c0ee0e3ee04a1e5bdde7fd8c/aiodns/__init__.py#L137

Here is where pycares calls it: https://github.com/saghul/pycares/blob/de2ed40596f543f989bbcea30632be751133c110/src/pycares/__init__.py#L97
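To make the mechanism concrete, here is a simplified sketch of how a socket-state callback can be bridged to an asyncio loop. It is loosely modeled on the idea behind the linked aiodns code but is not its actual implementation. `SockStateWatcher` and `BAD_FD` are hypothetical names, and the channel is assumed to expose `process_fd(read_fd, write_fd)`.

```python
import asyncio
import socket

# Assumption: stands in for pycares.ARES_SOCKET_BAD ("no fd on this side").
BAD_FD = -1


class SockStateWatcher:
    """Hypothetical, simplified bridge between a c-ares socket-state
    callback and an asyncio event loop."""

    def __init__(self, loop, channel):
        self.loop = loop
        self.channel = channel  # assumed to expose process_fd(read_fd, write_fd)
        self.read_fds = set()
        self.write_fds = set()

    def sock_state_cb(self, fd, readable, writable):
        # In the real library this is invoked from C whenever c-ares opens a
        # socket, changes which events it wants, or closes the socket.
        if readable or writable:
            if readable:
                # Ask the loop to hand the fd back to c-ares when data arrives.
                self.loop.add_reader(fd, self.channel.process_fd, fd, BAD_FD)
                self.read_fds.add(fd)
            if writable:
                self.loop.add_writer(fd, self.channel.process_fd, BAD_FD, fd)
                self.write_fds.add(fd)
        else:
            # readable and writable both False: c-ares is done with the socket.
            if fd in self.read_fds:
                self.read_fds.discard(fd)
                self.loop.remove_reader(fd)
            if fd in self.write_fds:
                self.write_fds.discard(fd)
                self.loop.remove_writer(fd)
```

The key point for this issue is that `sock_state_cb` is plain Python code called from C. Anything that goes wrong inside it has to be handled on the C side of the boundary.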

Something seems to trigger an unraisable error, _my_PyErr_WriteUnraisable (_cffi_backend.c:6113), and then it's the call that writes it to standard error which seemingly gets stuck.

Very weird.
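One diagnostic idea assumes the unraisable error comes from an exception raised inside the Python socket-state callback. That is an assumption: the trace shows only cffi's error path, not the exception itself. The idea is to make sure nothing escapes that callback. cffi cannot propagate a Python exception back into C, which is why it reports it as unraisable. The decorator below is a hypothetical sketch, not part of aiodns or pycares. It keeps the exception and traceback in memory instead of going through the stderr write that appears to hang.

```python
import functools
import traceback
from collections import deque

# Hypothetical store for errors that would otherwise be handed to cffi as
# "unraisable"; bounded so it cannot grow without limit.
captured_errors = deque(maxlen=100)


def never_raise(func):
    """Wrap a callback invoked from C so no ordinary exception escapes it."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Record the exception and its formatted traceback in memory
            # instead of letting it reach cffi's stderr reporting.
            captured_errors.append((exc, traceback.format_exc()))
            return None

    return wrapper


# Usage (hypothetical): wrap the callback where the channel is created, e.g.
#     channel = pycares.Channel(sock_state_cb=never_raise(my_sock_state_cb))
```

Since aiodns creates its channel internally, trying this would mean a local patch of the aiodns code linked above.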

On the pycares issue, you seem to be using 4.2, which is an older release. Can you please test with the latest version of both packages?

Also, a repro script would be useful, even if it takes hours to trigger the problem.