elastic / rally

Macrobenchmarking framework for Elasticsearch
Apache License 2.0

Install dependencies when listing tracks #1817

Open. dpifke-elastic opened this pull request 6 months ago.

dpifke-elastic commented 6 months ago

esrally list tracks attempts to load plugins for each track, and if a dependency is missing, Rally exits with an error.

Dependencies can be specified in track.json; however, by default we don't install them every time we load a track, because doing so involves shelling out to pip.

This commit changes list tracks to install any necessary dependencies during track loading.
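A minimal sketch of what installing dependencies during track loading could look like. The helper names (build_install_command, install_track_dependencies) are hypothetical, and using ~/.rally/libs as pip's --target directory is an assumption based only on that directory appearing on the Python load path in the logs in this thread; this is not Rally's actual implementation.

```python
import os
import subprocess
import sys
from typing import List, Sequence

# Assumption: track dependencies go into a Rally-managed directory that is also
# on sys.path (the load paths logged in this thread include ~/.rally/libs).
DEFAULT_LIBS_DIR = os.path.expanduser("~/.rally/libs")


def build_install_command(dependencies: Sequence[str], target_dir: str) -> List[str]:
    """Build a pip command installing the given requirement specifiers."""
    # Invoke pip via the running interpreter so packages match this Python version.
    return [sys.executable, "-m", "pip", "install", "--target", target_dir, *dependencies]


def install_track_dependencies(dependencies: Sequence[str], target_dir: str = DEFAULT_LIBS_DIR) -> None:
    """Hypothetical helper: shell out to pip only if the track declares dependencies."""
    if not dependencies:
        return  # nothing declared in track.json, so no subprocess is started
    # check_call raises CalledProcessError if pip exits with a non-zero status.
    subprocess.check_call(build_install_command(dependencies, target_dir))
```

For the eql track, the dependency list printed later in this thread would translate to install_track_dependencies(["geneve==0.2.0", "pyyaml"]). The early return keeps the pip subprocess, which is the cost the description mentions, off the path for tracks without dependencies.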

gbanasiak commented 5 months ago

I've noticed 3 issues here so far:

  1. For some reason, loading eql/track.py fails, which can be seen when calling esrally list tracks with this change.
% esrally list tracks

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] Installing track dependencies [geneve==0.2.0, pyyaml]
[INFO] Installing track dependencies [https://github.com/elastic/package-assets/archive/main.tar.gz]
[ERROR] Cannot list. Could not load component [eql]
[..]

Logs with debug level enabled in ~/.rally/logging.json, and with the following extra log line added:

% git diff
diff --git a/esrally/utils/modules.py b/esrally/utils/modules.py
index 728b55d3..0d5782d2 100644
--- a/esrally/utils/modules.py
+++ b/esrally/utils/modules.py
@@ -102,6 +102,7 @@ class ComponentLoader:
         self.logger.debug("Adding [%s] to Python load path.", component_root_path)
         # needs to be at the beginning of the system path, otherwise import machinery tries to load application-internal modules
         sys.path.insert(0, component_root_path)
+        self.logger.debug("Resulting Python load path is %s.", sys.path)
         try:
             root_module = self._load_component(component_name, module_dirs)
             return root_module
2024-01-05 10:09:25,310 -not-actor-/PID:14313 esrally.utils.modules INFO Loading component [eql] from [/Users/grzegorz/.rally/benchmarks/tracks/default/eql]
2024-01-05 10:09:25,310 -not-actor-/PID:14313 esrally.utils.modules DEBUG Removing [__pycache__] from load path.
2024-01-05 10:09:25,311 -not-actor-/PID:14313 esrally.utils.modules DEBUG Adding [/Users/grzegorz/.rally/benchmarks/tracks/default] to Python load path.
2024-01-05 10:09:25,311 -not-actor-/PID:14313 esrally.utils.modules DEBUG Resulting Python load path is ['/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default/elastic', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default/elastic', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default/elastic', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/src/rally/.venv/bin', '/Users/grzegorz/.pyenv/versions/3.11.4/lib/python311.zip', '/Users/grzegorz/.pyenv/versions/3.11.4/lib/python3.11', '/Users/grzegorz/.pyenv/versions/3.11.4/lib/python3.11/lib-dynload', '/Users/grzegorz/src/rally/.venv/lib/python3.11/site-packages', '/Users/grzegorz/src/rally'].
2024-01-05 10:09:25,311 -not-actor-/PID:14313 esrally.utils.modules DEBUG Loading module [eql.track]
2024-01-05 10:09:25,611 -not-actor-/PID:14313 esrally.utils.modules ERROR Could not load component [eql]
Traceback (most recent call last):
  File "/Users/grzegorz/src/rally/esrally/utils/modules.py", line 107, in load
    root_module = self._load_component(component_name, module_dirs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/grzegorz/src/rally/esrally/utils/modules.py", line 65, in _load_component
    m = importlib.import_module(p)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/grzegorz/.pyenv/versions/3.11.4/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1140, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'eql.track'
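The resulting load path logged above repeats /Users/grzegorz/.rally/benchmarks/tracks/default and /Users/grzegorz/.rally/libs many times, which suggests the root is inserted again on every component load. A minimal sketch of an idempotent insert, assuming it would stand in for the plain sys.path.insert(0, component_root_path) call shown in the diff; prepend_once is a hypothetical name, not an existing Rally function. It would only tidy the path: the correct directory is already at index 0, so it does not by itself explain the ModuleNotFoundError.

```python
import sys
from typing import List, Optional


def prepend_once(path: str, path_list: Optional[List[str]] = None) -> List[str]:
    """Move `path` to the front of the load path, keeping a single copy of it.

    Hypothetical helper; defaults to mutating sys.path in place.
    """
    if path_list is None:
        path_list = sys.path
    # Drop every existing occurrence first (slice assignment keeps the same list
    # object), then insert one copy at index 0 so it still wins over other entries.
    path_list[:] = [p for p in path_list if p != path]
    path_list.insert(0, path)
    return path_list
```

Only the given path is de-duplicated; whatever code adds ~/.rally/libs would need the same treatment.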

By comparison, a log without this change looks like this:

2024-01-05 10:11:00,852 -not-actor-/PID:14411 esrally.track.loader DEBUG Invoking plugin_reader with name [eql] resolved to path [/Users/grzegorz/.rally/benchmarks/tracks/default/eql]
2024-01-05 10:11:00,852 -not-actor-/PID:14411 esrally.utils.modules INFO Loading component [eql] from [/Users/grzegorz/.rally/benchmarks/tracks/default/eql]
2024-01-05 10:11:00,852 -not-actor-/PID:14411 esrally.utils.modules DEBUG Removing [__pycache__] from load path.
2024-01-05 10:11:00,852 -not-actor-/PID:14411 esrally.utils.modules DEBUG Adding [/Users/grzegorz/.rally/benchmarks/tracks/default] to Python load path.
2024-01-05 10:11:00,852 -not-actor-/PID:14411 esrally.utils.modules DEBUG Resulting Python load path is ['/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default/elastic', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default/elastic', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default/elastic', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/.rally/benchmarks/tracks/default', '/Users/grzegorz/.rally/libs', '/Users/grzegorz/src/rally/.venv/bin', '/Users/grzegorz/.pyenv/versions/3.11.4/lib/python311.zip', '/Users/grzegorz/.pyenv/versions/3.11.4/lib/python3.11', '/Users/grzegorz/.pyenv/versions/3.11.4/lib/python3.11/lib-dynload', '/Users/grzegorz/src/rally/.venv/lib/python3.11/site-packages', '/Users/grzegorz/src/rally'].
2024-01-05 10:11:00,853 -not-actor-/PID:14411 esrally.utils.modules DEBUG Loading module [eql.track]
2024-01-05 10:11:00,855 -not-actor-/PID:14411 esrally.track.loader INFO Reading track specification file [/Users/grzegorz/.rally/benchmarks/tracks/default/nested/track.json].

I don't understand why it's failing, as sys.path is the same in both cases AFAICT. We can see that /Users/grzegorz/.rally/benchmarks/tracks/default gets added multiple times, which doesn't look clean, but it's present in the path, so shouldn't the eql.track import work?
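One mechanism that produces exactly this ModuleNotFoundError even with the track directory first on sys.path: once a package named eql is cached in sys.modules, importing eql.track searches only that cached package's __path__ and does not consult sys.path again. Whether a different eql package (for example, one installed alongside the new track dependencies) is what gets cached here is an unverified hypothesis. The self-contained sketch only demonstrates the import-system behavior using two throwaway directories; make_package and demo_shadowing are illustrative names.

```python
import importlib
import os
import sys
import tempfile


def make_package(root: str, name: str, submodules: tuple = ()) -> None:
    """Create root/name/__init__.py plus an empty module per submodule name."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    for sub in submodules:
        open(os.path.join(pkg_dir, sub + ".py"), "w").close()


def demo_shadowing() -> str:
    """Import eql.track after a different 'eql' package is already cached."""
    saved_path = list(sys.path)
    saved_modules = {k: sys.modules.pop(k) for k in ("eql", "eql.track") if k in sys.modules}
    with tempfile.TemporaryDirectory() as libs, tempfile.TemporaryDirectory() as tracks:
        make_package(libs, "eql")                  # stand-in for an unrelated 'eql' package
        make_package(tracks, "eql", ("track",))    # stand-in for the track directory
        try:
            sys.path.insert(0, libs)
            importlib.import_module("eql")         # caches libs/eql in sys.modules
            sys.path.insert(0, tracks)             # track dir is now first on sys.path...
            importlib.invalidate_caches()
            try:
                importlib.import_module("eql.track")
                return "imported"
            except ModuleNotFoundError as exc:
                return str(exc)                    # ...but only libs/eql is searched
        finally:
            # Restore the interpreter state so the demo leaves no trace behind.
            sys.path[:] = saved_path
            for k in ("eql", "eql.track"):
                sys.modules.pop(k, None)
            sys.modules.update(saved_modules)


print(demo_shadowing())  # prints: No module named 'eql.track'
```

In the failing run, logging sys.modules.get("eql") just before the import in _load_component would show whether an eql package is already cached, and from which file.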

  2. The collection of tracks and challenges implemented in https://github.com/elastic/pytest-rally/blob/5bc8856f0532d38590e49fe2d15d8bf98a9f947f/pytest_rally/rally.py#L148 assumes the list starts at a specific line number, which is no longer true once dependencies are installed, due to these 2 extra lines in the output:
[INFO] Installing track dependencies [geneve==0.2.0, pyyaml]
[INFO] Installing track dependencies [https://github.com/elastic/package-assets/archive/main.tar.gz]

Parsing therefore starts too early, which explains the incorrect track and challenge names in the failing tests:

----------------------------- live log collection ------------------------------
INFO     pytest_rally.rally:rally.py:110 Running command: [esrally list tracks --track-repository="/home/runner/.rally/benchmarks/tracks/rally-tracks-compat" --track-revision="master" --configuration-name="pytest"]
collected 99 items

test_all_tracks_and_challenges.py::TestTrackRepository::test_autogenerated[Name-Challenges] 
-------------------------------- live log setup --------------------------------
INFO     pytest_rally.elasticsearch:elasticsearch.py:84 Installing Elasticsearch: [esrally install --quiet --http-port=19200 --node=rally-node --master-nodes=rally-node --car=4gheap,trial-license,x-pack-ml,lean-watermarks --seed-hosts="127.0.0.1:19300" --revision=current]
INFO     pytest_rally.elasticsearch:elasticsearch.py:93 Starting Elasticsearch: [esrally start --runtime-jdk=bundled --installation-id=ea0c0824-9bda-4528-901b-ffefd345ab7f --race-id=a82199f8-ce25-4c89-8700-0b5f606a87d6]
-------------------------------- live log call ---------------------------------
INFO     pytest_rally.rally:rally.py:144 Running command: [esrally race --track="Name" --challenge="Challenges" --track-repository="/home/runner/.rally/benchmarks/tracks/rally-tracks-compat" --track-revision="master" --configuration-name="pytest" --enable-assertions --kill-running-processes --on-error="abort" --pipeline="benchmark-only" --target-hosts="127.0.0.1:19200" --test-mode]
FAILED                                                                   [  1%]
test_all_tracks_and_challenges.py::TestTrackRepository::test_autogenerated[-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------] 
-------------------------------- live log call ---------------------------------
INFO     pytest_rally.rally:rally.py:144 Running command: [esrally race --track="-----------------------" --challenge="-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------" --track-repository="/home/runner/.rally/benchmarks/tracks/rally-tracks-compat" --track-revision="master" --configuration-name="pytest" --enable-assertions --kill-running-processes --on-error="abort" --pipeline="benchmark-only" --target-hosts="127.0.0.1:19200" --test-mode]
FAILED                                                                   [  2%]

Interestingly, in the nox session that Rally uses in tests, the esrally list tracks --track-repository="/home/runner/.rally/benchmarks/tracks/rally-tracks-compat" --track-revision="master" --configuration-name="pytest" command runs fine, i.e. it doesn't run into problem number 1.
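A minimal sketch of locating the track table by its content rather than by a fixed line offset, so extra [INFO] lines printed before it no longer shift the parse. It assumes the table has a header row directly above a separator row made of one dash run per column (like the separator row that ended up as a test ID in the failing run), with a blank line or end of output after the last data row. parse_track_table is a hypothetical name, not part of pytest-rally.

```python
import re
from typing import Dict, List


def parse_track_table(output: str) -> List[Dict[str, str]]:
    """Parse the first header/separator/rows table found anywhere in `output`."""
    lines = output.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Separator: only dashes and spaces, directly under a non-blank header.
        if i > 0 and stripped and set(stripped) <= {"-", " "} and lines[i - 1].strip():
            break
    else:
        return []  # no table in the output
    # One (start, end) span per run of dashes; let the last column run to end of line.
    spans = [(m.start(), m.end()) for m in re.finditer(r"-+", lines[i])]
    spans[-1] = (spans[-1][0], None)
    header = [lines[i - 1][start:end].strip() for start, end in spans]
    rows = []
    for line in lines[i + 1:]:
        if not line.strip():
            break  # a blank line ends the table
        rows.append({name: line[start:end].strip() for name, (start, end) in zip(header, spans)})
    return rows
```

Slicing by the separator's column spans, rather than splitting on whitespace, tolerates empty cells and spaces inside descriptions. Which column names esrally prints is not shown in this thread, so picking the track and challenge columns is left to the caller.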

  3. I noticed the following error in ~/.rally/logs/dependency.log. It doesn't seem to be fatal, but it shows that we have collisions between packages installed by Rally and packages imported from track dependencies:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
botocore 1.21.65 requires urllib3<1.27,>=1.25.4, but you have urllib3 2.1.0 which is incompatible.
esrally 2.10.0 requires elastic-transport==8.4.1, but you have elastic-transport 8.11.0 which is incompatible.
esrally 2.10.0 requires elasticsearch[async]==8.6.1, but you have elasticsearch 8.11.1 which is incompatible.
esrally 2.10.0 requires urllib3==1.26.18, but you have urllib3 2.1.0 which is incompatible.
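A minimal sketch of surfacing these conflicts explicitly after track dependencies are installed, rather than leaving them only in dependency.log. It uses the standard library's importlib.metadata, handles only exact == pins (like the esrally requirements quoted above), and compares version strings literally without PEP 440 normalization; exact_pins and find_pin_conflicts are hypothetical names.

```python
import re
from importlib import metadata
from typing import Callable, Dict, Iterable, List, Optional

# name, optional [extras], '==', version; an optional environment marker may follow.
_EXACT_PIN = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(?:\[[^\]]*\])?\s*==\s*([^\s;,]+)\s*(?:;.*)?$")


def exact_pins(requirements: Optional[Iterable[str]]) -> Dict[str, str]:
    """Collect 'name==version' pins, skipping optional-extra requirements."""
    pins = {}
    for req in requirements or []:
        if "extra==" in req.replace(" ", ""):
            continue  # only needed for an optional extra such as 'develop'
        match = _EXACT_PIN.match(req)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def find_pin_conflicts(
    pins: Dict[str, str],
    version_of: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Describe every pinned package whose installed version differs from the pin."""
    conflicts = []
    for name, pinned in sorted(pins.items()):
        try:
            installed = version_of(name)
        except metadata.PackageNotFoundError:
            conflicts.append(f"{name}: pinned {pinned}, not installed")
            continue
        if installed != pinned:  # literal comparison, no PEP 440 normalization
            conflicts.append(f"{name}: pinned {pinned}, found {installed}")
    return conflicts
```

With esrally installed, find_pin_conflicts(exact_pins(metadata.requires("esrally"))) would list mismatches such as urllib3 2.1.0 versus the 1.26.18 pin. Versions are looked up on the current sys.path, so packages installed with pip --target are only seen if that directory is on the path.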