Currently, the e2e tests are quite flaky. Some flakes are due to the e2e test logic itself; others may be knuu-related.
There were a number of txsim-related flakes, but I think those have been fixed, so I'm not including them here for now. I'm also not including any from MajorUpgradeToV3, as that is a known issue tracked in https://github.com/celestiaorg/celestia-app/issues/4023
Observed Flakes:
1)
ERROR E2ESimple
test-e2e2024/10/08 17:03:33 --- ERROR E2ESimple: expected at least 10 transactions, got 0
2)
2024/10/08 14:48:10 failed to wait for height: post failed: Post "http://151.115.12.124:80/val3-26657": context deadline exceeded
exit status 1
3)
RUN MinorVersionCompatibility
2024/10/02 14:51:35 failed to upgrade node: error waiting for instance 'val3' to be running: error checking if instance 'val3' is running: failed to get pod val3: replicasets.apps "val3" not found
exit status 1
4)
MinorVersionCompatibility
2024/10/08 12:45:45 Failed to start testnet: failed to start node val0: error getting status
5)
This failure is likely due to not skipping v1.8.0, which was retracted IIRC. I'm not sure whether fixing that resolves the others as well.
{"level":"debug","RPC Address":"http://151.115.12.124:80/val2-26657","time":"2024-10-08T14:46:26Z","message":"Creating HTTP client for node"}
test-e2e2024/10/08 14:46:26 Upgrading node node 3 version v1.5.0
{"level":"debug","RPC Address":"http://151.115.12.124:80/val3-26657","time":"2024-10-08T14:46:50Z","message":"Creating HTTP client for node"}
test-e2e2024/10/08 14:46:51 Upgrading node node 4 version v1.8.0
2024/10/08 14:48:10 failed to wait for height: post failed: Post "http://151.115.12.124:80/val3-26657": context deadline exceeded
exit status 1
make: *** [Makefile:147: test-e2e] Error 1
6)
MinorVersionCompatibility
2024/10/08 14:48:10 failed to wait for height: post failed: Post "http://151.115.12.124:80/val3-26657": context deadline exceeded
exit status 1
7)
No tests were able to be started
time="2024-10-02T14:38:45Z" level=info msg="Pod statuses" file="k8s/pod_status.go:100" pod_statuses="Pending: 4 , Running: 6 "
time="2024-10-02T14:39:45Z" level=warning msg="Pods pending for too long" file="k8s/pod_status.go:99" pending_pods="knuu-preloader-5b27e367-mgl9f, knuu-preloader-5b27e367-sc8ht, knuu-preloader-5b27e367-vvgwk, val3-6664ee73-cwqkt"
time="2024-10-02T14:39:45Z" level=info msg="Pod statuses" file="k8s/pod_status.go:100" pod_statuses="Running: 6 , Pending: 4 "
2024/10/02 14:39:55 failed to upgrade node: error waiting for instance to be running: error checking if instance 'val3-6664ee73' is running: failed to get pod val3-6664ee73: replicasets.apps "val3-6664ee73" not found
exit status 1
8)
New state sync test
2024/11/03 18:22:12 failed to get header: error in json rpc client, with http response metadata: (Status: 503 Service Unavailable, Protocol HTTP/1.1). error unmarshalling: invalid character 'o' in literal null (expecting 'u')
exit status 1
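For flake 5, one possible mitigation is to drop retracted versions from the upgrade path before the test iterates over it. The following is only a minimal sketch of that idea: `filterRetracted`, the `retracted` set, and the `v1.9.0` entry are hypothetical names and values chosen for illustration, not taken from the existing e2e code.

```go
package main

import "fmt"

// filterRetracted returns the versions that are not in the retracted set,
// preserving the original order. Hypothetical helper, not part of the e2e package.
func filterRetracted(versions []string, retracted map[string]bool) []string {
	kept := make([]string, 0, len(versions))
	for _, v := range versions {
		if retracted[v] {
			// Skip versions that have been retracted upstream.
			continue
		}
		kept = append(kept, v)
	}
	return kept
}

func main() {
	// Illustrative version list; v1.9.0 is a placeholder, not a claim about real tags.
	versions := []string{"v1.5.0", "v1.8.0", "v1.9.0"}
	// Assumption: v1.8.0 is the retracted version mentioned in flake 5.
	retracted := map[string]bool{"v1.8.0": true}

	fmt.Println(filterRetracted(versions, retracted))
}
```

Whether the retracted set is hard-coded or derived from the module's retraction metadata is left open here; a hard-coded set is the simplest option but has to be kept up to date manually.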