0xPolygon / kurtosis-cdk

A Kurtosis package that deploys a private, portable, and modular Polygon CDK devnet
https://docs.polygon.technology/cdk
Apache License 2.0

Kurtosis CDK Deployment failure #136

Closed masisyepremyan closed 1 month ago

masisyepremyan commented 1 month ago

System information

Ubuntu 22.04

Commit id

kurtosis run --enclave cdk-v1 --args-file params.yml --image-download always .

Tools versions

Checking that you have the necessary tools to deploy the Kurtosis CDK package...
✅ kurtosis 0.89.12 is installed, meets the requirement (=0.89).
✅ docker 26.1.3 is installed, meets the requirement (>=24.7).

You might as well need the following tools to interact with the environment...
✅ jq 1.6 is installed.
✅ yq 3.4.3 is installed, meets the requirement (>=3.2).
✅ cast 0.2.0 is installed.
✅ polycli v0.1.43 is installed.

🎉 You are ready to go!

Description & steps to reproduce

After cleaning the environment and restarting the deployment many times, I still get an incomplete deployment with errors in the output.

Adding service with name 'zkevm-prover-001' and image 'hermeznetwork/zkevm-prover:v6.0.0'
There was an error executing Starlark code
An error occurred executing instruction (number 62) at github.com/0xPolygon/kurtosis-cdk/lib/zkevm_prover.star[22:28]:
add_service(name="zkevm-prover-001", config=ServiceConfig(image="hermeznetwork/zkevm-prover:v6.0.0", ports={"executor-server": PortSpec(number=50071, application_protocol="grpc"), "hash-db-server": PortSpec(number=50061, application_protocol="grpc")}, files={"/etc/zkevm": "prover-config-artifact"}, entrypoint=["/bin/bash", "-c"], cmd=["[[ \"{{kurtosis:0cae89edc8d14d058d76602193534819:output.runtime_value}}\" == \"aarch64\" || \"{{kurtosis:0cae89edc8d14d058d76602193534819:output.runtime_value}}\" == \"arm64\" ]] && export EXPERIMENTAL_DOCKER_DESKTOP_FORCE_QEMU=1; /usr/local/bin/zkProver -c /etc/zkevm/prover-config.json"]))
Caused by: Unexpected error occurred starting service 'zkevm-prover-001'
Caused by: An error occurred waiting for all TCP and UDP ports to be open for service 'zkevm-prover-001' with private IP '172.16.0.26'; this is usually due to a misconfiguration in the service itself, so here are the logs:
== SERVICE 'zkevm-prover-001' LOGS ===================================

== FINISHED SERVICE 'zkevm-prover-001' LOGS ===================================
Caused by: An error occurred while waiting for all TCP and UDP ports to be open
Caused by: Unsuccessful ports check for IP '172.16.0.26' and port spec '{privatePortSpec:0xc00073c3c0}', even after '240' retries with '500' milliseconds in between retries. Timeout '2m0s' has been reached
Caused by: An error occurred while calling network address '172.16.0.26:50071' with port protocol 'TCP' and using time out '200ms'
Caused by: dial tcp 172.16.0.26:50071: i/o timeout

Error encountered running Starlark code.
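
For readers unfamiliar with the check that fails in the error above, it can be approximated with the sketch below. This is not Kurtosis's actual implementation: the function name `wait_for_tcp_port` and its parameter names are hypothetical. Only the default values (240 retries, 500 ms between retries, 200 ms dial timeout) are taken from the error message.

```python
import socket
import time


def wait_for_tcp_port(host, port, retries=240, interval_s=0.5, dial_timeout_s=0.2):
    """Return True once a TCP connection to host:port succeeds.

    Hypothetical helper: the defaults mirror the values in the error message
    (240 retries, 500 ms apart, 200 ms dial timeout), not Kurtosis internals.
    """
    for _ in range(retries):
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=dial_timeout_s):
                return True
        except OSError:
            # Refused or timed out: wait, then try again.
            time.sleep(interval_s)
    return False
```

With these defaults, 240 retries at 500 ms intervals add up to the 2m0s timeout shown in the log. A `False` result only shows that nothing accepted a connection on that port within the window. It does not say why, and the empty service log section above gives no further hint.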

Desired behavior

I expect all necessary services to deploy successfully.

What is the severity of this bug?

Critical; I am blocked and Kurtosis CDK is unusable for me because of this bug.

leovct commented 1 month ago

Hello @masisyepremyan, it is hard to tell what is going wrong here.

First, I would advise cleaning the whole environment.

kurtosis clean --all

Then, make sure you have pulled the latest version of the repository.

git checkout main
git pull

Finally, deploy the stack.

kurtosis run --enclave cdk-v1 --args-file params.yml --image-download always .

If you run into more issues, I recommend joining the polygon-cdk Discord channel here: https://discord.gg/rkUQZTTB

masisyepremyan commented 1 month ago

Thanks, @leovct, for answering. I have just tried every step you mentioned, but I'm still getting the same result:

`Adding service with name 'zkevm-prover-001' and image 'hermeznetwork/zkevm-prover:v6.0.0'
There was an error executing Starlark code
An error occurred executing instruction (number 53) at github.com/0xPolygon/kurtosis-cdk/lib/zkevm_prover.star[22:28]:
add_service(name="zkevm-prover-001", config=ServiceConfig(image="hermeznetwork/zkevm-prover:v6.0.0", ports={"executor-server": PortSpec(number=50071, application_protocol="grpc"), "hash-db-server": PortSpec(number=50061, application_protocol="grpc")}, files={"/etc/zkevm": "prover-config-artifact"}, entrypoint=["/bin/bash", "-c"], cmd=["[[ \"{{kurtosis:f4f1bd09eb6d424380b4938c6e95c2d1:output.runtime_value}}\" == \"aarch64\" || \"{{kurtosis:f4f1bd09eb6d424380b4938c6e95c2d1:output.runtime_value}}\" == \"arm64\" ]] && export EXPERIMENTAL_DOCKER_DESKTOP_FORCE_QEMU=1; /usr/local/bin/zkProver -c /etc/zkevm/prover-config.json"]))
Caused by: Unexpected error occurred starting service 'zkevm-prover-001'
Caused by: An error occurred waiting for all TCP and UDP ports to be open for service 'zkevm-prover-001' with private IP '172.16.0.17'; this is usually due to a misconfiguration in the service itself, so here are the logs:
== SERVICE 'zkevm-prover-001' LOGS ===================================

== FINISHED SERVICE 'zkevm-prover-001' LOGS ===================================
Caused by: An error occurred while waiting for all TCP and UDP ports to be open
Caused by: Unsuccessful ports check for IP '172.16.0.17' and port spec '{privatePortSpec:0xc001eb7980}', even after '240' retries with '500' milliseconds in between retries. Timeout '2m0s' has been reached
Caused by: An error occurred while calling network address '172.16.0.17:50061' with port protocol 'TCP' and using time out '200ms'
Caused by: dial tcp 172.16.0.17:50061: i/o timeout

Error encountered running Starlark code.`

leovct commented 1 month ago

Alright, join the Discord server and I'll be happy to help you debug.

leovct commented 1 month ago

I'm going to close this issue as we were unable to reproduce the zkevm-prover issue. Don't hesitate to continue the discussion on Discord, and we'll try to help you.