elastic / rally

Macrobenchmarking framework for Elasticsearch
Apache License 2.0

Pin transitive dep 'aiohttp==3.8.6' #1806

Closed b-deam closed 10 months ago

b-deam commented 10 months ago

Debugging https://github.com/elastic/rally/pull/1804 confirmed that we're hitting this aiohttp bug: https://github.com/aio-libs/aiohttp/issues/7864.

aiohttp is a transitive dependency of esrally. Version 3.9.0 was released on 18 Nov, and CI first caught this bug on the 19th, which lines up with the release.

Pinning this at 3.8.6 until a fix is merged upstream.
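
To check whether an existing environment already picked up the affected release, something like the following could be used. This is a minimal sketch, not part of Rally: the helper names are hypothetical, and the set of bad versions contains only 3.9.0 because that is the only release this PR identifies as affected.

```python
import re
from importlib import metadata

# Assumption: 3.9.0 is the only release flagged here, per this PR's description.
KNOWN_BAD_AIOHTTP = {(3, 9, 0)}


def parse_release(version: str) -> tuple:
    """Return the leading numeric release segment, e.g. '3.9.0' -> (3, 9, 0).

    Any pre-release or local suffix after the numeric segment is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_known_bad(version: str) -> bool:
    """True if the given aiohttp version string is in the flagged set."""
    return parse_release(version) in KNOWN_BAD_AIOHTTP


def installed_aiohttp_is_known_bad() -> bool:
    """Check the aiohttp installed in the current environment, if any."""
    try:
        return is_known_bad(metadata.version("aiohttp"))
    except metadata.PackageNotFoundError:
        # aiohttp is not installed, so there is nothing to flag.
        return False
```

Calling `installed_aiohttp_is_known_bad()` in a fresh virtualenv after `pip install esrally` would show whether the unpinned transitive dependency resolved to the flagged release.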

Reproduction

You too can reproduce this bug with an Elasticsearch cluster and this script:

from elasticsearch import AsyncElasticsearch
import elasticsearch
import asyncio
import ssl
import certifi
import warnings

warnings.filterwarnings("ignore", category=DeprecationWarning)

ssl_context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=certifi.where())
ssl_context.check_hostname = False
ssl_context.verify_mode = ssl.CERT_NONE

es = AsyncElasticsearch(
    hosts=["https://elasticsearch:9200"],
    ssl_context=ssl_context,
    verify_certs=False,
    basic_auth=("elastic", "changeme"),
)

async def tasks():
    complete = False
    try:
        await es.indices.forcemerge(request_timeout=0.001)
        complete = True
    except elasticsearch.ConnectionTimeout:
        # The tiny request_timeout above is deliberate: the timeout is what triggers the bug
        print("Timing out")
    while not complete:
        tasks = await es.tasks.list(params={"actions": "indices:admin/forcemerge"})
        if len(tasks["nodes"]) == 0:
            # empty nodes response indicates no tasks
            complete = True

async def create_datastreams():
    await es.cluster.put_settings(persistent={"cluster.max_shards_per_node": 5000})
    # Create an index lifecycle policy
    await es.ilm.put_lifecycle(name="repro", policy={"phases": {}})
    # Create component templates
    component_template = {"settings": {"index.lifecycle.name": "repro", "index.number_of_shards": 250}}
    await es.cluster.put_component_template(name="repro", template=component_template)
    # Create an index template
    await es.indices.put_index_template(name="repro", composed_of=["repro"], index_patterns="logs*", data_stream={})
    # Create the data stream
    for i in range(1, 11):
        await es.indices.create_data_stream(name=f"logs-{i}", ignore=400)

async def main():
    await create_datastreams()
    while True:
        await tasks()

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

Test with each version of aiohttp:

$ pip install "aiohttp==3.9.0"
$ python resp_bug.py
Timing out
Traceback (most recent call last):
  File "resp_bug.py", line 60, in <module>
    loop.run_until_complete(main())
  File "/Users/bradleydeam/.pyenv/versions/3.8.16/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "resp_bug.py", line 55, in main
    await tasks()
  File "resp_bug.py", line 33, in tasks
    if len(tasks["nodes"]) == 0:
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/elastic_transport/_response.py", line 188, in __getitem__
    return self.body[item]  # type: ignore[index]
KeyError: 'nodes'
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x1068be340>
Unclosed connector
connections: ['[(<aiohttp.client_proto.ResponseHandler object at 0x1068c2220>, 11.885923)]']
connector: <aiohttp.connector.TCPConnector object at 0x1068be400>

$ pip install "aiohttp==3.8.6"
# expected behaviour: loops until interrupted
$ python resp_bug.py
Timing out
Timing out
Timing out
Timing out
Timing out
^CTraceback (most recent call last):
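
The KeyError in the 3.9.0 run means the body returned for the tasks call had no "nodes" key at all, which is different from an empty "nodes" mapping. A minimal sketch of how a polling loop might tell the two apart, assuming the body behaves like a dict (for example the `.body` attribute seen in the traceback); `classify_tasks_body` is a hypothetical helper, not part of Rally or elasticsearch-py:

```python
from typing import Any, Mapping


def classify_tasks_body(body: Mapping[str, Any]) -> str:
    """Classify a tasks-list style response body.

    Returns "unexpected" when the "nodes" key is missing (the shape that
    raised KeyError in the 3.9.0 run), "complete" when "nodes" is empty,
    and "running" otherwise.
    """
    if "nodes" not in body:
        return "unexpected"
    if len(body["nodes"]) == 0:
        return "complete"
    return "running"


if __name__ == "__main__":
    # Illustrative bodies only; these are not captured from a real cluster.
    print(classify_tasks_body({"nodes": {}}))
    print(classify_tasks_body({"nodes": {"node-1": {"tasks": {}}}}))
    print(classify_tasks_body({"acknowledged": True}))
```

Surfacing the "unexpected" case explicitly would make a mismatched response visible instead of failing on a bare KeyError; it does not work around the underlying aiohttp issue.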

b-deam commented 10 months ago

I actually wonder if this warrants a new release, given that users installing Rally from PyPI will likely get aiohttp 3.9.0?