Closed lystopad closed 1 month ago
We have run the same workflows against our self-hosted runners:
So it looks like the issue is not related to the amount of RAM.
Also, I compared the configurations:
GitHub standard runner
Kernel Version: 6.5.0-1025-azure
Operating System: Ubuntu ***.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.61GiB
Self-hosted runner
Kernel Version: 6.8.0-41-generic
Operating System: Ubuntu 24.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.755GiB
The difference is the kernel version. Maybe the problem is with the GitHub runner's Ubuntu image.
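To make the comparison less error-prone, the two listings can be diffed field by field. This is only a sketch in Python: `diff_configs` is a hypothetical helper name, and the values are copied from the listings above (the GitHub runner's OS version is masked in the output and is left masked here).

```python
# Runner details copied from the two listings above.
github_runner = {
    "Kernel Version": "6.5.0-1025-azure",
    "Operating System": "Ubuntu ***.04.4 LTS",  # version is masked in the CI output
    "OSType": "linux",
    "Architecture": "x86_64",
    "CPUs": "4",
    "Total Memory": "15.61GiB",
}
self_hosted = {
    "Kernel Version": "6.8.0-41-generic",
    "Operating System": "Ubuntu 24.04.1 LTS",
    "OSType": "linux",
    "Architecture": "x86_64",
    "CPUs": "4",
    "Total Memory": "7.755GiB",
}


def diff_configs(a: dict, b: dict) -> dict:
    """Return {field: (a_value, b_value)} for every field whose values differ."""
    fields = sorted(a.keys() | b.keys())
    return {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}


for field, (gh, sh) in diff_configs(github_runner, self_hosted).items():
    print(f"{field}: {gh} (GitHub) vs {sh} (self-hosted)")
```

Besides the kernel, this also reports the OS string (partly because of the masking) and the total memory. The self-hosted runner has less memory and still passes, which fits the observation that RAM is unlikely to be the cause.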
Update regarding the Erigon issue with workflow cancellation: changing the Ubuntu version did not help. Tested on ubuntu-latest (22.04), ubuntu-24.04, and ubuntu-20.04.
Test passed on a self-hosted runner:
Kernel Version: 6.8.0-41-generic
Operating System: Ubuntu 24.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.755GiB
It seems to me that the issue is not related to performance or resources. I have tried running one of the failing workflows, namely dashboard_erigon_withdrawals.yml,
and it runs fine for me: https://github.com/somnathb1/hive/actions/runs/10728609938/job/29753487168
I have also tried running several instances of the hive tests in parallel on my local machine, with low overall resource usage.
The issue appeared only intermittently, on some GitHub runners. The hive failures on the main
branch aren't related to CI and are tracked in a separate issue. Closing for now.
System information
It happens with the latest master as well as with v2.60.4
OS & Version: Ubuntu, 16 GB RAM (Kernel Version: 6.5.0-1025-azure)
Commit hash: 68f41969f9165ae608a77b754b69116eec247b27
Erigon Command (with flags/config):
Consensus Layer:
Consensus Layer Command (with flags/config):
Chain/Network:
Quoting a message from the partners:
Hi, Erigon team. We have faced a strange error on GitHub CI while running some Erigon hive tests. After 30-60 minutes, the job fails with error code 143 (aborted). An example of such a failure: xx-xx-xx-xx/job/29501828480
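For context on the code itself: by shell convention, an exit status above 128 means the process was terminated by a signal, and the signal number is the status minus 128. A small Python sketch for decoding it (`describe_exit_code` is an illustrative name, not part of any tool mentioned here):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map a shell-style exit status to a signal name when it encodes one."""
    if code > 128:
        signum = code - 128
        try:
            return signal.Signals(signum).name
        except ValueError:
            return f"unknown signal {signum}"
    return f"exit status {code} (not a signal)"


print(describe_exit_code(143))
```

So 143 is 128 + 15, that is SIGTERM: something asked the step to terminate. That would be consistent with the job being cancelled from outside rather than a test assertion failing, although it does not say who sent the signal.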
It is hard to debug the issue because in most cases we cannot even get the GitHub Actions logs. Currently we know that:
A Google search suggests that it may be related to CPU or RAM usage: https://github.com/actions/runner-images/issues/6680. I assume RAM is OK (the GitHub runner has 16 GB of RAM), so maybe the problem relates to CPU.
If so, is there a way to decrease CPU usage inside the Docker container? I know it is possible to do this on the Docker side, but that is not easy with hive, so maybe there is a way to do it on the Erigon side? Also, any insights on how to debug the issue would be appreciated.
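On the debugging question, one possible approach (a sketch only, not something that was tried in this thread) is to log the load average and available memory at a fixed interval while the tests run, so that the last lines before the job is killed show whether the runner was under pressure. The function names below are hypothetical, and the script assumes a Linux runner with /proc/meminfo.

```python
import os
import time


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value}, values in KiB."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def sample() -> str:
    """Return one log line with the 1-minute load average and available memory."""
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    load1, _, _ = os.getloadavg()
    available_mib = mem.get("MemAvailable", 0) // 1024
    return f"{time.strftime('%H:%M:%S')} load1={load1:.2f} mem_available_mib={available_mib}"


def monitor(samples: int, interval_s: float) -> None:
    """Print `samples` log lines, one every `interval_s` seconds."""
    for _ in range(samples):
        print(sample(), flush=True)
        time.sleep(interval_s)


# Example: roughly one hour of samples, one every 30 seconds.
# monitor(samples=120, interval_s=30)
```

Started in the background before the hive step, this would write to the step output as it goes, so some data may survive even when the job is aborted before the regular logs are uploaded.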
More details can be found in the internal messenger, in the "erigon3" channel.