This is the Docker container for the prover node. It runs the prover node and handles tasks received from the server.
If you have installed the prover Docker container before, go directly to the Upgrading Prover Node section.
The prover node requires a CUDA-capable GPU; currently the minimum is an RTX 4090.
The Docker container is built on top of NVIDIA's Docker runtime, so the NVIDIA Docker runtime must be installed on the host machine.
Install NVIDIA Drivers for Ubuntu
https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html
You can check whether the drivers are installed by running nvidia-smi.
Install Docker (the linked guide is from NVIDIA, but feel free to install Docker your own way): https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#setting-up-docker
Install Docker Compose: https://docs.docker.com/compose/install/linux/#install-the-plugin-manually
Install the Nvidia CUDA Toolkit + Nvidia docker runtime
The nvidia-container-toolkit must be installed on the host machine so that the Docker container can access the GPU.
Since the official docs are not the clearest, here are the commands to copy and paste:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
Then update the package lists:
sudo apt-get update
and install the toolkit:
sudo apt-get install -y nvidia-container-toolkit
Configure the Docker daemon to use the nvidia runtime as the default runtime:
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
Restart the Docker daemon. On Ubuntu:
sudo systemctl restart docker
On WSL Ubuntu:
sudo service docker restart
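Alternatively, once the NVIDIA Container Toolkit is installed, you can register the runtime with the command below (see https://github.com/NVIDIA/nvidia-docker). Without --set-as-default, this registers the nvidia runtime but does not make it the default.
sudo nvidia-ctk runtime configure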
The image is currently built with
The versions should not be changed unless the prover node is updated. The compiled prover node binary is sensitive to the CUDA version and the Ubuntu version.
Optionally, clean up old Docker images and volumes first.
To build the Docker image, run the following command in the root directory of the repository:
bash build_image.sh
We do not use BuildKit as there are issues with the CUDA runtime and BuildKit.
Prover Node Configuration
The prover_config.json file is the config file for the prover node service.
- server_url: The URL of the server to connect to for tasks. Currently the public test server's RPC is "https://rpc.zkwasmhub.com:8090".
- priv_key: The private key of the prover node, used to sign the tasks completed by the prover node. If you want to run multiple prover nodes, use a different private key for each node, as the key represents your node. Do not add "0x" at the beginning of priv_key.
The Dry Run service must run in parallel with the prover node. It is responsible for synchronising tasks with the server and ensuring that the prover node is working correctly. Because it has to run alongside the prover node, running it through docker compose is recommended.
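The private-key rules (no "0x" prefix, and the same key reused as private_key in dry_run_config.json, described next) are easy to get wrong, so a small pre-flight check can help. This is a minimal sketch in POSIX sh, not part of the repository: strip_0x, json_string_field and check_keys are hypothetical helper names, and the sed-based parsing assumes each "key": "value" pair sits on a single line of the JSON file (so jq is not required).

```shell
#!/bin/sh
# Hypothetical helpers for the private-key rules in prover_config.json (priv_key)
# and dry_run_config.json (private_key). Assumes one "key": "value" pair per line.

strip_0x() {
  # Print the key without a leading 0x or 0X.
  case "$1" in
    0x*|0X*) printf '%s\n' "${1#??}" ;;
    *) printf '%s\n' "$1" ;;
  esac
}

json_string_field() {
  # Print the string value of field $2 in JSON file $1 (first match only).
  sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1
}

check_keys() {
  # $1 = prover_config.json, $2 = dry_run_config.json
  prover_key=$(json_string_field "$1" priv_key)
  dry_run_key=$(json_string_field "$2" private_key)
  case "$prover_key" in
    0x*|0X*) echo "priv_key must not start with 0x" >&2; return 1 ;;
  esac
  if [ -z "$prover_key" ] || [ "$prover_key" != "$dry_run_key" ]; then
    echo "priv_key and private_key are missing or differ" >&2
    return 1
  fi
  echo "keys ok"
}
```

For example, run check_keys with the paths to your two config files before docker compose up; strip_0x can clean up a key copied from a wallet that prints it with the prefix.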
The dry_run_config.json file is the config file for the prover dry run service. Modify the connection strings for the server and the MongoDB instance.
- server_url: The URL of the server to connect to for tasks. Ensure this is the same as for the prover node. Currently the public test server's RPC is "https://rpc.zkwasmhub.com:8090".
- mongodb_uri: The URI of the MongoDB instance to connect to. By default it is "mongodb://localhost:27017". You do not need to change it if you start the prover node with docker compose up and use the default docker-compose.yml.
- private_key: Use the same key as priv_key in the prover config. Do not add "0x" at the beginning of the key.
Huge pages must be set to the correct value on the host machine. This is done by setting the vm.nr_hugepages kernel parameter.
Use grep Huge /proc/meminfo to check the current huge page settings. HugePages_Total must be at least 15000 to support one prover node.
For a machine running a single prover node, set the value to 15000 with the following command:
sudo sysctl -w vm.nr_hugepages=15000
Run grep Huge /proc/meminfo again to confirm the change before starting the Docker containers.
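The huge page check can also be scripted. This is a minimal sketch in POSIX sh: hugepages_total and check_hugepages are hypothetical helper names, and the meminfo path is a parameter (defaulting to /proc/meminfo) so the logic can be tried against a sample file.

```shell
#!/bin/sh
# Sketch: confirm HugePages_Total reaches the 15000 pages one prover node needs.
REQUIRED_HUGEPAGES=15000

hugepages_total() {
  # Print the HugePages_Total value from a meminfo-style file.
  awk '/^HugePages_Total:/ { print $2 }' "${1:-/proc/meminfo}"
}

check_hugepages() {
  total=$(hugepages_total "$1")
  if [ -n "$total" ] && [ "$total" -ge "$REQUIRED_HUGEPAGES" ]; then
    echo "ok: HugePages_Total=$total"
    return 0
  fi
  echo "too few huge pages: HugePages_Total=${total:-unknown}, need $REQUIRED_HUGEPAGES" >&2
  return 1
}
```

For example, check_hugepages && docker compose up only starts the services when enough huge pages are reserved.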
Note that the command above only changes the setting for the currently running system; it is reset when the machine restarts. To keep it after a restart, add the following entry to the /etc/sysctl.conf file:
vm.nr_hugepages=15000
This version supports the new continuation feature. Running the prover requires at least 58 GB of available memory in addition to the memory reserved by HugePages_Total = 15000, which comes to about 88 GB of memory in total.
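The 88 GB figure follows from the huge page size: with the common 2048 kB (2 MiB) huge pages, 15000 pages pin 30000 MiB, and adding 58 GB gives roughly 88 GB. A minimal sketch of that arithmetic, assuming 2048 kB pages (check the Hugepagesize line of /proc/meminfo on your machine) and treating the 58 GB as GiB; required_memory_mib and mem_total_mib are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: estimate the memory one prover node needs and compare it with MemTotal.

required_memory_mib() {
  # $1 = HugePages_Total (e.g. 15000)
  # $2 = Hugepagesize in kB (e.g. 2048)
  # $3 = memory needed besides huge pages, in GiB (e.g. 58)
  echo $(( $1 * $2 / 1024 + $3 * 1024 ))
}

mem_total_mib() {
  # Print MemTotal (which includes the huge page pool) in MiB.
  awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}
```

With the values above, required_memory_mib 15000 2048 58 gives 89392 MiB, about 87 GiB, in line with the roughly 88 GB quoted. A quick check is [ "$(mem_total_mib)" -ge "$(required_memory_mib 15000 2048 58)" ] && echo "enough memory".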
If you need to specify which GPUs to use, do so in the docker-compose.yml file: the device_ids field lists the GPUs assigned to the container. Note that inside the container, GPU indexing starts at 0.
The container's start command uses CUDA_VISIBLE_DEVICES=0 to select the GPU. If you change device_ids, also make sure the command field in docker-compose.yml sets CUDA_VISIBLE_DEVICES to match the GPU you would like to use.
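Because the container renumbers its GPUs from 0, a single GPU listed in device_ids is CUDA_VISIBLE_DEVICES=0 inside the container, whatever its host index (host indices can be listed with nvidia-smi -L). The sketch below, with the hypothetical helper name container_visible_devices, derives the in-container value from a comma-separated device_ids list. For several GPUs it assumes they appear as 0, 1, ... inside the container; which physical card gets which index should be confirmed with nvidia-smi inside the container.

```shell
#!/bin/sh
# Sketch: map a device_ids list (host GPU indices, e.g. "2" or "1,3") to the
# CUDA_VISIBLE_DEVICES value seen inside the container (e.g. "0" or "0,1").
container_visible_devices() {
  n=$(printf '%s\n' "$1" | tr ',' '\n' | grep -c .)
  i=0
  out=""
  while [ "$i" -lt "$n" ]; do
    out="${out:+$out,}$i"
    i=$((i + 1))
  done
  echo "$out"
}
```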
MongoDB works out of the box; however, if you need something specific, refer to the following section.
For most use cases, the default options should be sufficient.
The MongoDB instance runs on port 27017 and stores its data in the ./mongo directory.
Network mode is set to host so that the prover node can connect to the MongoDB instance via localhost. If you prefer port mapping instead, you can change the port in the docker-compose.yml file.
If you are unsure about making modifications or customizations, refer to the section below.
The Params FTP server must be running before you start the prover node. The prover node copies the parameters from the FTP server to its own volume and cannot operate correctly without them.
Start the FTP server with docker compose -f ftp-docker-compose.yml up.
The default port is 21 and the default user is ftpuser with password ftppassword. The ports used for file transfer are 30000-30009.
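Since the prover node depends on the FTP server, it can help to wait until the server answers before starting the other services. This is a minimal sketch using curl against the default port and credentials listed above; wait_for_ftp is a hypothetical helper name, and the URL and attempt count are parameters you can override.

```shell
#!/bin/sh
# Sketch: poll the Params FTP server until a directory listing succeeds.
# Defaults follow the port and credentials documented above.
wait_for_ftp() {
  url=${1:-ftp://ftpuser:ftppassword@localhost:21/}
  attempts=${2:-30}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # A URL ending in "/" makes curl request a directory listing.
    if curl --silent --max-time 5 "$url" >/dev/null 2>&1; then
      echo "FTP server is up"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "FTP server not reachable at $url" >&2
  return 1
}
```

For example, after starting the FTP server in another terminal, wait_for_ftp && docker compose up starts the prover services only once the FTP server responds.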
Make sure you have built the image with bash build_image.sh.
Make sure you have reviewed the Prover Node Configuration section and updated the config files.
Once the Params FTP server is running, you can start the prover node.
Start all services at once with docker compose up. Because all services print to the same terminal window, the output can get cluttered, so you may prefer to run services in detached mode (docker compose up -d) or inside a tmux session.
docker compose up starts the base services in this order: mongodb, dry-run-service, prover-node.
Logs
To follow the logs/output of a specific container, first navigate to the directory containing the docker-compose.yml file.
Then run docker compose logs -f <service-name>, where service-name is the name of the service in the docker compose file (mongodb, prover-node, etc.).
To check the static logs of the prover-dry-run-service, navigate to the corresponding logs volume and view them there.
With the default configuration, the following command lists the stored log files so you can select one to view:
sudo ls /var/lib/docker/volumes/prover-node-docker_dry-run-logs-volume/_data -lh
Find the latest dry run log file and view its contents with:
sudo vim /var/lib/docker/volumes/prover-node-docker_dry-run-logs-volume/_data/[filename.log]
For the prover service logs (default name configuration), use:
sudo ls /var/lib/docker/volumes/prover-node-docker_prover-logs-volume/_data -lh
sudo vim /var/lib/docker/volumes/prover-node-docker_prover-logs-volume/_data/[filename.log]
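Picking the newest file by hand gets tedious, so here is a minimal sketch that prints the most recently modified log file in a directory. It assumes the log files end in .log and uses the default volume paths shown above; latest_log is a hypothetical helper name. Because /var/lib/docker is only readable by root, run it in a root shell (for example via sudo -s).

```shell
#!/bin/sh
# Sketch: locate the newest log file in the default log volumes.
DRY_RUN_LOGS=/var/lib/docker/volumes/prover-node-docker_dry-run-logs-volume/_data
PROVER_LOGS=/var/lib/docker/volumes/prover-node-docker_prover-logs-volume/_data

latest_log() {
  # Print the most recently modified *.log file in directory $1 (empty if none).
  ls -t "$1"/*.log 2>/dev/null | head -n 1
}
```

For example, as root: tail -n 100 "$(latest_log "$PROVER_LOGS")" shows the end of the newest prover log.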
Upgrading Prover Node
Upgrading the prover node requires rebuilding the Docker image with the new prover node binary and clearing previously stored data.
Stop all containers with docker compose down.
Alternatively, stop the containers manually: list them with docker container ls and then run docker stop <container-name-or-id>.
Check the container status with docker ps -a.
Prune the containers with docker container prune. Note that this removes all stopped Docker containers, including ones unrelated to the prover; if you want to keep other stopped containers, remove the prover containers individually with docker rm <container-name-or-id> instead.
With the new continuation feature, the prover Docker container needs 58 GB of memory in addition to the 15000 huge pages, so the machine needs about 88 GB of memory in total.
Pull the latest changes from the repository with git pull.
If you have modified the docker-compose.yml file, you may need to stash your changes first (git stash) and reapply them afterwards (git stash pop).
Similarly, if prover_config.json or dry_run_config.json have been modified, make sure the changes are applied again.
Find the volume you want to delete with docker volume ls.
Delete the prover-node workspace volume with docker volume rm <volume_name>. By default, volume_name is "prover-node-docker_workspace-volume", so the default command is docker volume rm prover-node-docker_workspace-volume.
Remove the old Docker image: check the image name with docker image ls, then run docker image rm zkwasm:latest.
Rebuild the Docker image with bash build_image.sh.
Then follow the Quick Start steps to start.
docker compose -f ftp-docker-compose.yml up
docker compose up
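The preparation part of the upgrade (stop, pull, remove the workspace volume and old image, rebuild) can be collected into one script. This is a minimal sketch, run from the repository root, assuming the default volume name and the zkwasm:latest image tag (override WORKSPACE_VOLUME or IMAGE if docker volume ls or docker image ls show different names); run and upgrade_prover are hypothetical helper names. It deliberately skips docker container prune, which would also remove unrelated stopped containers, and it leaves starting the services to the commands above.

```shell
#!/bin/sh
# Sketch of the upgrade preparation steps. Set DRY_RUN=1 to print the commands
# instead of running them. Stash local edits to docker-compose.yml or the
# config files before running, since git pull is part of the sequence.
WORKSPACE_VOLUME=${WORKSPACE_VOLUME:-prover-node-docker_workspace-volume}
IMAGE=${IMAGE:-zkwasm:latest}

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

upgrade_prover() {
  # Each step runs only if the previous one succeeded.
  run docker compose down &&
    run git pull &&
    run docker volume rm "$WORKSPACE_VOLUME" &&
    run docker image rm "$IMAGE" &&
    run bash build_image.sh
}
```

Running DRY_RUN=1 upgrade_prover first lets you review the exact commands before executing them.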
If docker compose up fails, run docker volume rm prover-node-docker_workspace-volume again and then retry docker compose up. If Docker reports that the volume is in use, stop the services with docker compose down first.
If it still fails, check the logs as described in the Logs section.
If the prover fails with "memory allocation of xxxx failed" even though you have confirmed that enough memory is available, stop the services with docker compose down, run docker volume rm prover-node-docker_workspace-volume, and then start the services again with docker compose up to see whether that fixes the issue.
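The reset sequence above (stop the services, remove the workspace volume, start again) can be sketched as a small function. It assumes the default volume name (override WORKSPACE_VOLUME if yours differs); run and reset_workspace are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: reset the prover workspace volume. Set DRY_RUN=1 to print the
# commands instead of running them.
WORKSPACE_VOLUME=${WORKSPACE_VOLUME:-prover-node-docker_workspace-volume}

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

reset_workspace() {
  # Stop first: Docker refuses to remove a volume that a container still uses.
  run docker compose down &&
    run docker volume rm "$WORKSPACE_VOLUME" &&
    run docker compose up
}
```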
If the prover fails with an error related to "Cuda Error", which indicates that Docker cannot find CUDA or the NVIDIA device, check whether /etc/docker/daemon.json correctly sets the nvidia runtime. You can reset it with:
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
and then restart Docker (Ubuntu):
sudo systemctl restart docker
Then check whether that fixes the issue.
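To check the daemon.json setting quickly, here is a minimal sketch that looks for the "default-runtime": "nvidia" entry written by nvidia-ctk runtime configure --set-as-default. It uses plain grep so jq is not needed, and assumes the key and value sit on one line; has_nvidia_default_runtime is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed if the Docker daemon config makes nvidia the default runtime.
has_nvidia_default_runtime() {
  grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' \
    "${1:-/etc/docker/daemon.json}"
}
```

For example, has_nvidia_default_runtime || echo "nvidia is not the default runtime" flags a missing setting. To see what the running daemon actually uses (which only changes after a restart), docker info --format '{{.DefaultRuntime}}' should print nvidia.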
If the prover fails with a request "Timeout", it may be a network issue; try stopping and restarting the containers with docker compose down and docker compose up.