abracadaniel / cardano-pool-docker

Docker container for setting up and running a Cardano Stake Pool

Cronjob running topology_submit looks in the wrong place #19

Closed: robinboening closed this issue 3 years ago

robinboening commented 3 years ago

Currently, the topology_submit script lives at /scripts/functions/topology_submit, but crontab is configured to execute it at /scripts/topology_submit.

I'd move the topology_submit script one level up, into /scripts/.
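For illustration, either of these would bring the script and the crontab entry back in sync (the schedule shown is a placeholder, not the one the image actually installs):

```sh
# Option A (what I propose): move the script up so the existing
# crontab entry resolves
mv /scripts/functions/topology_submit /scripts/topology_submit

# Option B: leave the script where it is and point cron at it instead
# (placeholder schedule; use whatever the image currently configures)
0 * * * * /scripts/functions/topology_submit
```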

I'll send a PR in a few minutes.

robinboening commented 3 years ago

Now that I am thinking about it a bit more, I think there is more that should be changed.

The cronjob only runs topology_submit, but we never run topology_update again. I think running topology_update via cron is important in order to continuously receive an up-to-date list of peers.

Is there a reason you keep submit and update in separate scripts? The https://raw.githubusercontent.com/cardano-community/guild-operators/master/scripts/cnode-helper-scripts/topologyUpdater.sh script combines the submit and update parts in a single file, and I see no issue with that. Shall we do the same?
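Roughly what I have in mind, as a sketch. The api.clio.one endpoints mirror the linked topologyUpdater.sh, but everything here, including the variable names NODE_PORT, NWMAGIC and TOPOLOGY_FILE, is illustrative rather than taken from this repo:

```bash
#!/bin/bash
# Illustrative combined submit + update script; NODE_PORT, NWMAGIC and
# TOPOLOGY_FILE are assumed to be provided by the environment.

# 1. Submit: register this relay as alive with the topology service
#    (the JSON field may be "blockNo" on older cardano-cli versions)
blockNo=$(cardano-cli query tip --mainnet | jq -r .block)
curl -s "https://api.clio.one/htopology/v1/?port=${NODE_PORT}&blockNo=${blockNo}&valency=1&magic=${NWMAGIC}"

# 2. Update: fetch a fresh peer list into the topology file
curl -s -o "${TOPOLOGY_FILE}" \
  "https://api.clio.one/htopology/v1/fetch/?max=14&magic=${NWMAGIC}"

# 3. A node restart is still needed for the new topology to take effect
```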

I could come up with a PR and we can discuss there.

abracadaniel commented 3 years ago

The reason I have not put topology_update in a cron job is that it has to restart the node for the updated topology to take effect, and those automatic restarts might collide with the production of a block, which would be unlucky, but possible. It will most likely not be a big issue for a small pool, but big pools can produce multiple blocks per day, so there is a fairly high risk of a restart colliding with a block. I'm not really sure what a good solution would be, as of course it's most optimal to consistently have an updated topology. One solution could be for the scheduled updates to use data from the leaderlogs to find a timeslot in which the pool has no blocks assigned, as sketched below.
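A rough sketch of that guard. The leaderlog path and JSON shape are assumptions (the ".at" field follows cncli's leaderlog output, but verify against whatever tooling actually produces the file):

```bash
#!/bin/bash
# Sketch: only restart for a topology update if no assigned slot is
# coming up soon. Path and schema are hypothetical, not part of this repo.
LEADERLOG=/config/leaderlog.json
WINDOW=$((30 * 60))          # refuse to restart within 30 min of a slot
now=$(date +%s)

# First assigned slot that is still in the future
next=$(jq -r '.assignedSlots[].at' "$LEADERLOG" 2>/dev/null |
  while read -r t; do
    s=$(date -d "$t" +%s) || continue
    if [ "$s" -ge "$now" ]; then echo "$s"; break; fi
  done)

if [ -n "$next" ] && [ $((next - now)) -lt "$WINDOW" ]; then
  echo "Assigned slot in less than 30 min; skipping topology restart"
  exit 0
fi
# ...safe to run topology_update and restart the node here...
```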

Currently the topology is only updated when the docker container is restarted, which happens for example on node updates, maintenance, etc. I have not yet seen the pool run out of connected peers, as most pools are pretty stable.

robinboening commented 3 years ago

I am really just at the very beginning of understanding all of this, but I would have thought a graceful restart of cardano-node on the relay (killall -9 cardano-node) would not interfere with the block-producing node.

Are you sure this is the case?

abracadaniel commented 3 years ago

The relay does not directly interfere with the producing node; it just has to be up for the block producer to synchronize with the network. So if the relay is restarting at the same time the pool is elected to mint a block, it might not receive the newly minted block from the block producer and propagate it to the chain.

I am not 100% sure this is what would happen; I'll try to look into it. :)

robinboening commented 3 years ago

Okay, got it.

Are you using the same cronjob in your pool? And is it really working for you?

One issue is the wrong path to the script (fixed in the PR), but another thing I just noticed is that the script requires some environment variables, and since cronjobs are not executed inside the same interactive shell the user would use, it doesn't have access to them.

My first thought was to simply source init_node_vars in the submit script when the cronjob runs, or source .bashrc, since it sources init_node_vars too. But then I realised init_node_vars itself depends on env vars that are defined in the Dockerfile: CARDANO_NETWORK and NODE_NAME.
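One common Docker pattern that might work (sketched below; /etc/node_env is an arbitrary name I made up) is to have the entrypoint, which does see the full container environment, write the needed variables to a file, and have the cron-run script source that file:

```bash
# In the entrypoint, which runs with the full container environment
# (the file name /etc/node_env is an arbitrary choice):
printenv | grep -E '^(CARDANO_NETWORK|NODE_NAME)=' > /etc/node_env

# At the top of the cron-run script:
set -a                      # export everything sourced below
. /etc/node_env
. /scripts/init_node_vars   # now has CARDANO_NETWORK and NODE_NAME
set +a
```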

But I'm not sure that's the best way to fix this. Do you have an idea?