cloud-bulldozer / benchmark-wrapper

Python Library to run benchmarks
https://benchmark-wrapper.readthedocs.io
Apache License 2.0

backlog of ES results results in failure to get yielded results to ES? #337

Open bengland2 opened 3 years ago

bengland2 commented 3 years ago

During a long-running test with lots of data to push to ES, I got into a situation where the smallfile wrapper yielded several test results but those results never made it to ES. This is bad because we lose valuable information about what went wrong, as well as valuable partial results. For example: for uuid ca33d6d7-7cf1-5b08-91f6-95ca34bbdac8 on the dev ES server, I saw that several results were missing for the rename operation in sample 1 when I ran:

```
python3 analyze-smf-test-results.py ca33d6d7-7cf1-5b08-91f6-95ca34bbdac8
```

However, when I looked in the pod log file for pod 10 here, I saw that the rename test completed successfully and that an ES document was indeed generated (yielded) for it. But later in sample 1 of that test, the cleanup operation raised an exception (a redis timeout), and I suspect this aborted the pod before the in-flight documents could reach ES. Specifically, I never saw this log message:

        logger.info(
            "Indexed results - %s success, %s duplicates, %s failures, with %s retries."
            % (res_suc, res_dup, res_fail, res_retry)
        )

In all other cases where the test completes, that log message reports 0 duplicates, 0 failures, 0 retries, and thousands of indexed results.

I'd like to have some sort of mechanism to checkpoint the ES documents when there is an exception, so that any in-flight documents reach ES before the test proceeds to the next operation. Do people agree with this? What's the most economical way to get this behavior? Can we catch the exception somehow before the pod exits and have it finish sending the in-flight documents to ES?
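To make the idea concrete, here is a rough sketch of what I mean by checkpointing; it is not the actual wrapper code, and the names (`index_with_checkpoint`, `document_generator`, the index-name handling, the batch size) are made up for illustration. It assumes the elasticsearch-py client. The indexing loop flushes in small batches, and the `finally` block flushes whatever is still buffered even when the generator raises, so documents that were already yielded are not lost when a later operation fails:

```python
# Sketch only: function and variable names are hypothetical, not the
# existing benchmark-wrapper API. Assumes the elasticsearch-py client.
import logging

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

logger = logging.getLogger(__name__)


def index_with_checkpoint(es, document_generator, index_name, batch_size=500):
    """Index yielded documents in small batches so that an exception from
    the generator (e.g. a redis timeout during cleanup) does not discard
    documents that were yielded earlier but not yet sent to ES."""
    buffered = []
    try:
        for doc in document_generator:
            buffered.append({"_index": index_name, "_source": doc})
            if len(buffered) >= batch_size:
                bulk(es, buffered)  # checkpoint: this batch is now safe in ES
                buffered.clear()
    finally:
        # Runs on normal completion *and* when the generator raises, so any
        # in-flight documents still get a chance to reach ES before the pod dies.
        if buffered:
            try:
                bulk(es, buffered)
                logger.info("Flushed %d in-flight documents to ES", len(buffered))
            except Exception:
                logger.exception("Could not flush %d in-flight documents", len(buffered))
```

The key point is that the flush happens in a `finally` block, so an exception in one operation doesn't throw away results from operations that already completed; the original exception still propagates, so the pod fails the same way it does today.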