aio-libs / aiodns

Simple DNS resolver for asyncio
https://pypi.python.org/pypi/aiodns
MIT License

Fatal Python error: Segmentation fault #62

Closed — dosyoyas closed this issue 5 years ago

dosyoyas commented 5 years ago

Not really sure if it's an issue with aiodns or pycares. I'm using aiodns to resolve ~300k domains/hour and it works great most of the time, but I'm getting "Fatal Python error: Segmentation fault" from time to time, and it doesn't seem to be related to the load.

Fatal Python error: Segmentation fault
 Current thread 0x00007f23a2ffd700 (most recent call first):
   File "/task/python3/lib/python3.7/site-packages/pycares/__init__.py", line 519 in _do_query
   File "/task/python3/lib/python3.7/site-packages/pycares/__init__.py", line 505 in query
   File "/task/python3/lib/python3.7/site-packages/aiodns/__init__.py", line 79 in query
   File "/task/batch/ResolverBatch.py", line 24 in query
   File "/usr/lib/python3.7/asyncio/events.py", line 88 in _run
   File "/usr/lib/python3.7/asyncio/base_events.py", line 1775 in _run_once
   File "/usr/lib/python3.7/asyncio/base_events.py", line 539 in run_forever
   File "/task/batch/ResolverBatch.py", line 29 in resolver_worker
   File "/usr/lib/python3.7/threading.py", line 865 in run
   File "/usr/lib/python3.7/threading.py", line 917 in _bootstrap_inner
   File "/usr/lib/python3.7/threading.py", line 885 in _bootstrap

The asyncio loop runs in a separate thread, and queries are submitted using asyncio.run_coroutine_threadsafe(query(domain, 'A'), loop):

import asyncio
from threading import Thread

from aiodns import DNSResolver

# Queue, log, and pDNS are defined elsewhere in the application
loop = asyncio.get_event_loop()
resolver = DNSResolver(loop=loop, nameservers=pDNS)

async def query(name, query_type):
    return await resolver.query(name, query_type)

def resolver_worker(loop):
    """ Switch to new event loop and run forever """
    asyncio.set_event_loop(loop)
    loop.run_forever()

def start_worker():
    """ Start worker thread """
    log.info("Starting resolver thread")
    worker = Thread(target=resolver_worker, args=(loop, ), daemon=True)
    worker.start()

def run_resolver(testing=False):
    log.info(f"Starting Resolver with nameserver {pDNS}")
    start_worker()
    queue = Queue('queue')
    for domains in queue.recv(forever=True):
        for domain in domains:
            future = asyncio.run_coroutine_threadsafe(query(domain, 'A'), loop)
            if testing:
                return future.result()
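For reference, the background-loop pattern above can be reduced to a minimal, self-contained sketch. The stub query coroutine below stands in for resolver.query, since aiodns and the surrounding Queue/log machinery are application-specific:

```python
import asyncio
from threading import Thread

loop = asyncio.new_event_loop()

async def query(name, query_type):
    # Stand-in for resolver.query(name, query_type)
    await asyncio.sleep(0)
    return (name, query_type)

def resolver_worker(loop):
    """Switch to the new event loop and run it forever."""
    asyncio.set_event_loop(loop)
    loop.run_forever()

# Daemon thread so the process can exit while the loop is still running
worker = Thread(target=resolver_worker, args=(loop,), daemon=True)
worker.start()

# Submit a coroutine to the background loop from the main thread;
# run_coroutine_threadsafe returns a concurrent.futures.Future
future = asyncio.run_coroutine_threadsafe(query("example.com", "A"), loop)
print(future.result(timeout=5))  # ('example.com', 'A')
```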

Package versions:


        "aiodns": {
            "hashes": [
                "sha256:815fdef4607474295d68da46978a54481dd1e7be153c7d60f9e72773cd38d77d",
                "sha256:aaa5ac584f40fe778013df0aa6544bf157799bd3f608364b451840ed2c8688de"
            ],
            "index": "pypi",
            "version": "==2.0.0"
        },
        "asyncio": {
            "hashes": [
                "sha256:83360ff8bc97980e4ff25c964c7bd3923d333d177aa4f7fb736b019f26c7cb41",
                "sha256:b62c9157d36187eca799c378e572c969f0da87cd5fc42ca372d92cdb06e7e1de",
                "sha256:c46a87b48213d7464f22d9a497b9eef8c1928b68320a2fa94240f969f6fec08c",
                "sha256:c4d18b22701821de07bd6aea8b53d21449ec0ec5680645e5317062ea21817d2d"
            ],
            "index": "pypi",
            "version": "==3.4.3"
        },
        "pycares": {
            "hashes": [
                "sha256:2ca080db265ea238dc45f997f94effb62b979a617569889e265c26a839ed6305",
                "sha256:6f79c6afb6ce603009db2042fddc2e348ad093ece9784cbe2daa809499871a23",
                "sha256:70918d06eb0603016d37092a5f2c0228509eb4e6c5a3faacb4184f6ab7be7650",
                "sha256:755187d28d24a9ea63aa2b4c0638be31d65fbf7f0ce16d41261b9f8cb55a1b99",
                "sha256:7baa4b1f2146eb8423ff8303ebde3a20fb444a60db761fba0430d104fe35ddbf",
                "sha256:90b27d4df86395f465a171386bc341098d6d47b65944df46518814ae298f6cc6",
                "sha256:9e090dd6b2afa65cb51c133883b2bf2240fd0f717b130b0048714b33fb0f47ce",
                "sha256:a11b7d63c3718775f6e805d6464cb10943780395ab042c7e5a0a7a9f612735dd",
                "sha256:b253f5dcaa0ac7076b79388a3ac80dd8f3bd979108f813baade40d3a9b8bf0bd",
                "sha256:c7f4f65e44ba35e35ad3febc844270665bba21cfb0fb7d749434e705b556e087",
                "sha256:cdb342e6a254f035bd976d95807a2184038fc088d957a5104dcaab8be602c093",
                "sha256:cf08e164f8bfb83b9fe633feb56f2754fae6baefcea663593794fa0518f8f98c",
                "sha256:df9bc694cf03673878ea8ce674082c5acd134991d64d6c306d4bd61c0c1df98f"
            ],
            "version": "==3.0.0"
        },
saghul commented 5 years ago

Are you able to capture a backtrace?

dosyoyas commented 5 years ago

I'll try, but I'm not sure I'll be able to reproduce it locally. The traceback above is what faulthandler printed for the current thread.
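(For anyone reading along: that per-thread output comes from the stdlib faulthandler module, which can be turned on at startup. A minimal example of enabling it:)

```python
import faulthandler

# Dump Python tracebacks for all threads when a fatal signal
# (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL) is received
faulthandler.enable()
```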

saghul commented 5 years ago

If you enable core dumps with ulimit -c unlimited, a core file should be generated on the next crash, which we can then inspect with gdb to see whether the backtrace is meaningful.
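For a process you can't wrap in a shell (e.g. a container entrypoint you don't control), a rough equivalent is to raise the limit from inside the Python process with the stdlib resource module. This is a sketch, assuming a Unix platform and that the hard limit already allows core files:

```python
import resource

# Raise the soft core-dump size limit up to the hard limit --
# the in-process counterpart of `ulimit -c unlimited` when the
# hard limit is RLIM_INFINITY (Unix only)
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
```

If the hard limit itself is 0 (common in locked-down containers), this is a no-op and the limit has to be raised from outside, e.g. via the container runtime's ulimit settings.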

dosyoyas commented 5 years ago

I'm trying that locally, but it hasn't crashed that way. Unfortunately I can't do the same in the environment where it does crash (Docker + ECS, which I can't modify).

dosyoyas commented 5 years ago

I haven't been able to reproduce this locally, so this can be closed.