elastic / rally

Macrobenchmarking framework for Elasticsearch
Apache License 2.0
1.91k stars 314 forks source link

Fixes #1595: Add method to process.py to return both return code and logs #1828

Closed favilo closed 4 months ago

favilo commented 5 months ago

As requested in https://github.com/elastic/rally/issues/1595

[x] Have you signed the contributor license agreement? [x] Have you followed the contributor guidelines? [x] Have you run make check-all successfully? [x] Did you choose a descriptive title and description for your PR?

favilo commented 5 months ago

LGTM, but #1595 lacks explicit clarification of what's exactly the win. I think we can find it here.

WDYT about using the modified run_subprocess() to avoid the double call in is_branch() as part of this PR? On quick scan that's probably the only improvement in the git module we can make as other methods use different commands in a sequence?

What do you think about the latest change?

gbanasiak commented 5 months ago

What do you think about the latest change?

Thank you for iterating. https://github.com/elastic/rally/pull/1828/commits/21caafa2f0ffd405ff93fc26d5fed706fd624b57 eliminates duplication but made me realize we're missing the logging now which I think should be kept for troubleshooting purposes. We could add snowflake logging at git module level, or add logging option to run_subprocess(). Or, rewrite run_subprocess_with_logging() to be based on subprocess.run() and return CompletedProcess, but this would extend the scope of the PR. Which option do you find the most elegant?

favilo commented 5 months ago

What do you think about the latest change?

Thank you for iterating. 21caafa eliminates duplication but made me realize we're missing the logging now which I think should be kept for troubleshooting purposes. We could add snowflake logging at git module level, or add logging option to run_subprocess(). Or, rewrite run_subprocess_with_logging() to be based on subprocess.run() and return CompletedProcess, but this would extend the scope of the PR. Which option do you find the most elegant?

Adding logging option to run_subprocess() feels like the least effort solution here. I wouldn't mind having run_subprocess_with_logging() return a Popen object which would also have .returncode and .stdout, but that has a lot more uses, and since we are already logging, do we need to keep track of the output separately?

I could certainly be convinced that the latter one is the correct approach, but maybe not in this small PR?

EDIT: But then again, the original usecase that brought about this change was needing to do exactly that, so I can run through and add a .returncode to all of those uses, and it will make things more obvious what is going on. Of course, adding type signatures here would also help in the long run.

gbanasiak commented 4 months ago

IT tests are failing as it turns out run_subprocess_with_logging() is also used in https://github.com/elastic/rally-teams, e.g. here. I wasn't aware of that. This changes the picture considerably. I think we should go for a new method in the process module for backward compatibility. The problem is https://github.com/elastic/rally-teams code can be interpreted by a range of Rally versions.

favilo commented 4 months ago

Huh, I see the IT tests failed, but looking at the logs, I don't actually see any failures noted.