cloud-bulldozer / benchmark-operator

The Chuck Norris of cloud benchmarks
Apache License 2.0

Point redis image to multi-arch image on docker.io to support arm architecture #752

Closed svetsa-rh closed 2 years ago

svetsa-rh commented 2 years ago

Description

Point to redis multi-arch image to support arm architecture.

Fixes

mffiedler commented 2 years ago

This is the first of a series of PRs to make the benchmark-operator and e2e-benchmarking repos aarch64 compatible

mffiedler commented 2 years ago

Let us know if it is preferable to copy this image to quay.io/cloud-bulldozer to avoid docker.io rate limiting.

rsevilla87 commented 2 years ago

Hi @mffiedler, the docker.io rate limiter is always a problem, but IIRC the image bitnami/redis:latest is already on docker.io, right?

rsevilla87 commented 2 years ago

FYI: I just copied the redis repository (with all the archs) to the cloud-bulldozer org in quay: https://quay.io/repository/cloud-bulldozer/redis?tab=tags

I think we can switch to it in this PR (quay.io/cloud-bulldozer/redis:latest)

mffiedler commented 2 years ago

@svetsa-rh Please update this PR to point benchmark-operator to the image uploaded by Raul.

rsevilla87 commented 2 years ago

Saw these redis related errors in the log


[pod/kube-burner-ca652ecd-lfffs/backpack] Traceback (most recent call last):
[pod/kube-burner-ca652ecd-lfffs/backpack]   File "stockpile-wrapper.py", line 257, in <module>
[pod/kube-burner-ca652ecd-lfffs/backpack]     sys.exit(main())
[pod/kube-burner-ca652ecd-lfffs/backpack]   File "stockpile-wrapper.py", line 224, in main
[pod/kube-burner-ca652ecd-lfffs/backpack]     run = _mark_node(r, my_node, my_uuid, es, check_val)
[pod/kube-burner-ca652ecd-lfffs/backpack]   File "stockpile-wrapper.py", line 161, in _mark_node
[pod/kube-burner-ca652ecd-lfffs/backpack]     r.set(check_val, "Metadata-Running")
[pod/kube-burner-ca652ecd-lfffs/backpack]   File "/usr/local/lib/python3.6/site-packages/redis/commands/core.py", line 2127, in set
[pod/kube-burner-ca652ecd-lfffs/backpack]     return self.execute_command("SET", *pieces, **options)
[pod/kube-burner-ca652ecd-lfffs/backpack]   File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 1222, in execute_command
[pod/kube-burner-ca652ecd-lfffs/backpack]     lambda error: self._disconnect_raise(conn, error),
[pod/kube-burner-ca652ecd-lfffs/backpack]   File "/usr/local/lib/python3.6/site-packages/redis/retry.py", line 45, in call_with_retry
[pod/kube-burner-ca652ecd-lfffs/backpack]     return do()
[pod/kube-burner-ca652ecd-lfffs/backpack]   File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 1220, in <lambda>
[pod/kube-burner-ca652ecd-lfffs/backpack]     conn, command_name, *args, **options
[pod/kube-burner-ca652ecd-lfffs/backpack]   File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 1195, in _send_command_parse_response
[pod/kube-burner-ca652ecd-lfffs/backpack]     return self.parse_response(conn, command_name, **options)
[pod/kube-burner-ca652ecd-lfffs/backpack]   File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 1234, in parse_response
[pod/kube-burner-ca652ecd-lfffs/backpack]     response = connection.read_response()
[pod/kube-burner-ca652ecd-lfffs/backpack]   File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 836, in read_response
[pod/kube-burner-ca652ecd-lfffs/backpack]     raise response
[pod/kube-burner-ca652ecd-lfffs/backpack] redis.exceptions.ResponseError: MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.

jtaleric commented 2 years ago

Is this PR the duplicate, or is the other one?

svetsa-rh commented 2 years ago

The other one is a duplicate. Sorry about that. Closed that one.

svetsa-rh commented 2 years ago

In our attempts to get the scale-ci tests running on ARM architecture, and after our sanity tests passed and @mffiedler and I discussed it, I opened this PR to update the config manager YAML file to switch from bitnami/redis:latest to docker.io/redis. We decided this based on the challenges we hit in our initial attempts to build a bitnami/redis ARM image, compared with the promising test results from the redis ARM image readily available on docker.io.

Based on Raul's note above, I tested further and found that containers created from bitnami/redis have no trouble writing snapshot files to disk, while containers created from docker.io/redis do.

Output comparison:

Log output when using bitnami/redis:

redis 17:02:55.01 INFO ==> Starting Redis
1:C 05 May 2022 17:02:55.020 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 05 May 2022 17:02:55.020 # Redis version=6.2.7, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 05 May 2022 17:02:55.020 # Configuration loaded
1:M 05 May 2022 17:02:55.021 monotonic clock: POSIX clock_gettime
1:M 05 May 2022 17:02:55.021 # A key 'rediscompare_helper' was added to Lua globals which is not on the globals allow list nor listed on the deny list.
1:M 05 May 2022 17:02:55.021 Running mode=standalone, port=6379.
1:M 05 May 2022 17:02:55.021 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 05 May 2022 17:02:55.021 # Server initialized
1:M 05 May 2022 17:02:55.021 Ready to accept connections
1:signal-handler (1651770700) Received SIGTERM scheduling shutdown...
1:M 05 May 2022 17:11:40.355 # User requested shutdown...
1:M 05 May 2022 17:11:40.355 Calling fsync() on the AOF file.
1:M 05 May 2022 17:11:40.355 Saving the final RDB snapshot before exiting.
1:M 05 May 2022 17:11:40.355 DB saved on disk
1:M 05 May 2022 17:11:40.355 * Removing the pid file.
1:M 05 May 2022 17:11:40.355 # Redis is now ready to exit, bye bye...
[root@db5be9b29a4e Development]#

Log output when using docker.io/redis (a.k.a. quay.io/cloud-bulldozer/redis:latest):

[root@c442abdbf09a Development]# oc logs -f benchmark-controller-manager-df9b6cb7f-r7lzh -c redis-master
1:C 05 May 2022 00:39:21.985 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 05 May 2022 00:39:21.985 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 05 May 2022 00:39:21.985 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 05 May 2022 00:39:21.986 monotonic clock: POSIX clock_gettime
1:M 05 May 2022 00:39:21.986 Running mode=standalone, port=6379.
1:M 05 May 2022 00:39:21.986 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 05 May 2022 00:39:21.986 # Server initialized
1:M 05 May 2022 00:39:21.987 Ready to accept connections
1:signal-handler (1651711584) Received SIGTERM scheduling shutdown...
1:M 05 May 2022 00:46:24.427 # User requested shutdown...
1:M 05 May 2022 00:46:24.427 Saving the final RDB snapshot before exiting.
1:M 05 May 2022 00:46:24.427 # Failed opening the RDB file dump.rdb (in server root dir /data) for saving: Permission denied
1:M 05 May 2022 00:46:24.427 # Error trying to save the DB, can't exit.
1:M 05 May 2022 00:46:24.427 # SIGTERM received but errors trying to shut down the server, check the logs for more information
[root@c442abdbf09a Development]#
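
The difference between the two logs comes down to one line: the docker.io/redis container reports "Failed opening the RDB file ... Permission denied", while the bitnami/redis container reports "DB saved on disk". A minimal Python sketch for spotting that failure in captured log text is below. The names `RdbFailure` and `find_rdb_failures` are hypothetical helpers written for illustration, not part of benchmark-operator or redis, and the pattern is based only on the log line quoted in this comment.

```python
import re
from typing import List, NamedTuple


class RdbFailure(NamedTuple):
    """One failed RDB save reported in a Redis server log (hypothetical type)."""
    timestamp: str
    path: str
    reason: str


# Pattern modelled on the single failure line quoted in this thread, e.g.:
# 1:M 05 May 2022 00:46:24.427 # Failed opening the RDB file dump.rdb
#   (in server root dir /data) for saving: Permission denied
_RDB_FAILURE = re.compile(
    r"^\d+:\w (?P<ts>\d{2} \w{3} \d{4} [\d:.]+) # "
    r"Failed opening the RDB file (?P<file>\S+) "
    r"\(in server root dir (?P<dir>[^)]+)\) for saving: (?P<reason>.+)$"
)


def find_rdb_failures(log_text: str) -> List[RdbFailure]:
    """Return every 'Failed opening the RDB file' entry found in log_text."""
    failures = []
    for line in log_text.splitlines():
        match = _RDB_FAILURE.match(line.strip())
        if match:
            # Join the server root dir and the file name into one path.
            path = match["dir"].rstrip("/") + "/" + match["file"]
            failures.append(RdbFailure(match["ts"], path, match["reason"]))
    return failures
```

Feeding it the output of the `oc logs` command shown above should yield one entry pointing at /data/dump.rdb, which is the directory the later mount path change targets.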

We still have some outstanding issues building a bitnami/redis image for aarch64 from scratch. I am working with @mffiedler on it.

svetsa-rh commented 2 years ago

There is still an outstanding issue, so this needs more work. We need to either get docker.io/redis snapshotting working or build a new bitnami/redis image with ARM support.

More details here: https://github.com/cloud-bulldozer/benchmark-operator/pull/752#issuecomment-1121407466

mffiedler commented 2 years ago

/retest

svetsa-rh commented 2 years ago

@rsevilla87 Can we rerun the checks? We changed the redis data dir for the image and want to see whether that resolves the issues above.

rsevilla87 commented 2 years ago

Hey! It seems I can't rerun the workflow without a code change. You can run git commit --amend and then git push -f to force a new commit hash.

svetsa-rh commented 2 years ago

@rsevilla87

I pulled in upstream changes from GitHub, and that seems to have triggered the tests right away. However, I suspect the tests will not pass: the complete set of changes needed to resolve the snapshot issues is in my master branch (https://github.com/cloud-bulldozer/benchmark-operator/compare/master...svetsa-rh:master), but not in this branch (redis-multiarch-image), for which the PR was originally created.

In order to bypass the snapshot errors, config/manager/manager.yaml needs to be updated:

From: - mountPath: /redis-master-data
To: - mountPath: /data

Also, charts/benchmark-operator/values.yaml refers to the old redis image and needs to be updated too:

From: repository: bitnami/redis
To: repository: quay.io/cloud-bulldozer/redis
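
The two From/To changes above can be expressed as plain text substitutions. The following is only a sketch in Python: the `EDITS` table and the `apply_edit` helper are hypothetical names used for illustration, and the old and new strings are copied from this comment rather than from the actual files, whose surrounding content is not shown here.

```python
from typing import Dict, Tuple

# (old, new) pairs quoted from the comment above, keyed by the file they apply to.
EDITS: Dict[str, Tuple[str, str]] = {
    "config/manager/manager.yaml": (
        "mountPath: /redis-master-data",
        "mountPath: /data",
    ),
    "charts/benchmark-operator/values.yaml": (
        "repository: bitnami/redis",
        "repository: quay.io/cloud-bulldozer/redis",
    ),
}


def apply_edit(path: str, text: str) -> str:
    """Return text with the substitution registered for path applied.

    Raises ValueError when the old string is absent, so a file that has
    already been changed (or never contained the setting) is not silently
    passed through.
    """
    old, new = EDITS[path]
    if old not in text:
        raise ValueError(f"{path}: expected to find {old!r}")
    return text.replace(old, new)
```

The mount path matters because the docker.io/redis log reports /data as the server root dir, so that is where the image tries to write dump.rdb.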

Updated redis mount dir. Tests re-triggered automatically.