gitmachtl closed this issue 2 years ago.
Good point. Note that the last known tip also contains a UTC timestamp so, in principle, this is "enough" to know if it's starting to drift, albeit not practical.
It's also unfortunate that the network synchronization is only updated on every new tip: while simple, it means that the value is only refreshed when the connection is up. Perhaps a background thread creating artificial ticks would be better here.
Would it be possible to set "networkSynchronization": null if there is no socket connection to the node? This would also handle the start-up condition where Ogmios is started before the node: reporting a networkSynchronization of 0% in that case is not 100% correct. Reporting null would cover it, because "we don't know" the value in that state.
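To illustrate how a client could consume the proposed null, here is a minimal POSIX sh sketch that reads a /health body on stdin and classifies it as unknown, syncing or synced. It assumes a single-line JSON body; the function names and the 0.99 threshold are hypothetical, and where jq is available a real JSON parser would be more robust than sed.

```sh
#!/bin/sh
# Sketch: null-aware interpretation of "networkSynchronization".
# Assumes a single-line JSON body; function names are hypothetical.

# Print the raw value of "networkSynchronization" read from stdin
# (e.g. 1, 0.42 or null); prints nothing if the key is absent.
sync_value() {
  sed -n 's/.*"networkSynchronization": *\([^,} ]*\).*/\1/p'
}

# Classify a raw value: null or absent means "we don't know" (unknown);
# otherwise compare it against a threshold (hypothetical default: 0.99).
classify_sync() {
  value=$1
  threshold=${2:-0.99}
  case "$value" in
    ''|null) echo unknown; return 0 ;;
  esac
  # awk does the floating-point comparison that test(1) cannot.
  if awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v >= t) }'; then
    echo synced
  else
    echo syncing
  fi
}
```

For example, classify_sync "$(wget -qO- http://localhost:1337/health | sync_value)" would print unknown while the node socket is unavailable, instead of silently reporting 0% or 100%.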
What about implementing it in the Ogmios Docker images as a healthcheck.sh script?
Currently neither curl nor jq is installed in the Docker image.
@redoracle -> implementing what exactly in the docker image :thinking: ?
I meant implementing a healthcheck.sh script, as Docker images usually do, to verify that the container is running properly; otherwise the health check will trigger a container restart.
Using this command:
curl -s http://127.0.0.1:1337/health | jq
I guess it is possible to check some of the metrics to tell whether the Ogmios container is running properly.
Alternatively, I can create one and map it into the container, but I'd need at least curl and jq preinstalled to make it work.
Attached here is an example of a container with a health check and one without.
Seems like this can work nicely with just wget, as follows:
HEALTHCHECK --interval=10s --timeout=5s --retries=1 CMD \
[ connected == $(wget http://localhost:1337 | sed 's/.*"connectionStatus":"\([a-z]\+\)".*/\1/') ]
Note: I've recently started reworking the Docker images to avoid having to maintain two build systems. The new images are based on the Nix build and make heavy use of caching:
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
# #
# ------------------------------- SETUP ------------------------------------- #
# #
FROM nixos/nix:2.3.11 as build
RUN echo "substituters = https://cache.nixos.org https://hydra.iohk.io" >> /etc/nix/nix.conf &&\
echo "trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= hydra.iohk.io:f/Ea+s+dFdN+3Y/G+FDgSq+a5NEWhJGzdjvKNGv0/EQ=" >> /etc/nix/nix.conf
WORKDIR /app
RUN nix-shell -p git --command "git clone --depth 1 https://github.com/input-output-hk/cardano-configurations.git"
WORKDIR /app/ogmios
RUN nix-env -iA cachix -f https://cachix.org/api/v1/install && cachix use cardano-ogmios
COPY . .
RUN nix-build -A ogmios.components.exes.ogmios -o dist
RUN cp -r dist/* . && chmod +w dist/bin && chmod +x dist/bin/ogmios
# #
# --------------------------- BUILD (ogmios) --------------------------------- #
# #
FROM busybox as ogmios
ARG NETWORK=mainnet
LABEL name=ogmios
LABEL description="A JSON WebSocket bridge for cardano-node."
COPY --from=build /app/ogmios/bin/ogmios /bin/ogmios
COPY --from=build /app/cardano-configurations/network/${NETWORK} /config
EXPOSE 1337/tcp
STOPSIGNAL SIGINT
HEALTHCHECK --interval=10s --timeout=5s --retries=1 CMD \
[ connected == $(wget http://localhost:1337 | sed 's/.*"connectionStatus":"\([a-z]\+\)".*/\1/') ]
ENTRYPOINT ["/bin/ogmios"]
# #
# --------------------- RUN (cardano-node & ogmios) -------------------------- #
# #
FROM inputoutput/cardano-node:1.31.0 as cardano-node-ogmios
ARG NETWORK=mainnet
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
LABEL name=cardano-node-ogmios
LABEL description="A JSON WebSocket bridge for cardano-node w/ a cardano-node."
COPY --from=build /app/ogmios/bin/ogmios /bin/ogmios
COPY --from=build /app/cardano-configurations/network/${NETWORK} /config
RUN mkdir -p /ipc
WORKDIR /root
COPY scripts/cardano-node-ogmios.sh cardano-node-ogmios.sh
# Ogmios, cardano-node, ekg, prometheus
EXPOSE 1337/tcp 3000/tcp 12788/tcp 12798/tcp
STOPSIGNAL SIGINT
HEALTHCHECK --interval=10s --timeout=5s --retries=1 CMD \
[ connected == $(wget http://localhost:1337 | sed 's/.*"connectionStatus":"\([a-z]\+\)".*/\1/') ]
CMD ["bash", "cardano-node-ogmios.sh" ]
Still a work in progress, however, as the cardano-node-ogmios image isn't working properly (I need to overwrite the entrypoint of the image with the script doing the basic process monitoring).
That's nice too, but wget is still missing as a preinstalled package, while sed is there.
# Ogmios, cardano-node, ekg, prometheus
EXPOSE 1337/tcp 3000/tcp 12788/tcp 12798/tcp
Do you really need to expose all those ports if they're only used internally? Normally the internal process opens those ports inside the container anyway, and if needed they can be mapped to the public host interface with "-p".
BTW, migrating to Nix is a very good move; I like it very much.
wget http://localhost:1337 | sed 's/.*"connectionStatus":"\([a-z]\+\)".*/\1/'
root@973ea926352e:/# wget http://localhost:1337 | sed 's/.*"connectionStatus":"\([a-z]\+\)".*/\1/'
--2022-01-02 12:55:28--  http://localhost:1337/
Resolving localhost (localhost)... 127.0.0.1, ::1
Connecting to localhost (localhost)|127.0.0.1|:1337... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'index.html.20'
index.html.20 [ <=> ] 7.63K --.-KB/s in 0s
2022-01-02 12:55:28 (1.01 GB/s) - 'index.html.20' saved [7811]
Not sure wget does the same as curl... or am I missing some other option?
The following returns the value that tells us that Ogmios is in sync, right?
root@973ea926352e:/# curl -s http://127.0.0.1:1337/health | jq .networkSynchronization
1
Which I presume implies that it is connected.
root@973ea926352e:/# curl -s http://127.0.0.1:1337/health | jq .connectionStatus
"connected"
That's nice too, but wget is still missing as a preinstalled package, while sed is there.
Even on the new images with Nix, that is, on top of BusyBox? I thought wget was available in BusyBox ... :thinking:
Do you really need to expose all those ports if only used internally?
Those aren't internal, though, except maybe 3000/tcp. EKG and Prometheus are used for metrics, and Ogmios is used by local clients.
Not sure wget does the same as curl... or am I missing some other option?
Ah! My mistake... We need to hit the health endpoint here! So http://localhost:1337/health !!
OK, but wget keeps saving to a file instead of printing the response, so I need an additional step to retrieve, from the saved file, the particular metric that says the node is connected and in sync, right?
What about this?
wget -qO- http://localhost:1337/health | sed 's/.*\"connectionStatus\":\"//g' | sed 's/connected\"}/1/g'
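The replacement trick above only works when connectionStatus is the last key of a compact JSON object. A slightly more robust sketch captures the value itself wherever the key appears; it still assumes a single-line body whose status value is a plain lowercase word, and the function names are made up for illustration.

```sh
#!/bin/sh
# Sketch: read an Ogmios /health body on stdin and check whether
# "connectionStatus" is exactly "connected". Names are hypothetical.

# Print the value of "connectionStatus" (e.g. connected, disconnected);
# prints nothing if the key is absent or the body is empty.
connection_status() {
  sed -n 's/.*"connectionStatus": *"\([a-z]*\)".*/\1/p'
}

# Succeed (exit 0) only for an exact "connected" value, so that
# "disconnected" or an empty/missing body count as unhealthy.
is_connected() {
  [ "$(connection_status)" = "connected" ]
}
```

In a health-check script this could be used as: wget -qO- http://localhost:1337/health | is_connected || exit 1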
For now I got it working with a healthcheck.sh script mapped inside the container, as follows:
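#!/bin/sh
# Install wget on first use (Debian-based image).
if ! command -v wget >/dev/null; then apt update && apt -y install wget; fi
result=$(wget -qO- http://localhost:1337/health | sed 's/.*"connectionStatus":"//g' | sed 's/connected"}/0/g')
# Quote $result so that an empty response (server down) is reported as unhealthy.
if [ "$result" != 0 ]; then exit 1; fi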
I guess with the Nix version it wouldn't work, though :)
I figured that a nicer way to do all this would be to have a proper health-check command in Ogmios to begin with, so I implemented:
$ ogmios health-check --help
Handy command to check whether an Ogmios server is up-and-running, and correctly connected to a Network / cardano-node.
This can, for example, be wired to Docker's HEALTHCHECK feature easily.
Usage: ogmios health-check [--port TCP/PORT]
Performs a health check against a running server.
Available options:
-h,--help Show this help text
--port TCP/PORT Port to listen on. (default: 1337)
(see 62691fbbbd65fa9b0b5949819515674c9a8c3575)
It exits with 0 or 1, depending on whether it could perform a health check on a running server. Dead-simple to configure the HEALTHCHECK in the Dockerfile with that:
HEALTHCHECK --interval=10s --timeout=5s --retries=1 CMD /bin/ogmios health-check
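Because the command reports its result purely through its exit code, it also composes with ordinary shell control flow. As an illustration, a hypothetical entrypoint for a client could poll it before starting; the wait_until_healthy helper, the attempt count and the delay are assumptions for this sketch, not part of Ogmios.

```sh
#!/bin/sh
# Sketch: poll a health-check command until it succeeds or attempts run out.
# Usage: wait_until_healthy ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until_healthy() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Only the exit code matters: 0 means healthy.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Don't sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example: wait_until_healthy 30 2 /bin/ogmios health-check --port 1337 || exit 1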
That's very thoughtful and very nice!!
Well done! Tnx
Describe your idea, in simple words.
Running, for example, node 1.33.0 in P2P mode with
"DiffusionMode": "InitiatorOnly",
in the config no longer creates a local listening port, so we can't use cardanoPing/cncli to check whether the node is alive.
If such a node stops working or is shut down, there is currently no flag for that in the Ogmios health check:
That's a sample output after the node was shut down.
So, using the health metrics, the only way currently to see whether the node is really OK is to compare the lastKnownTip with the theoretical one calculated from the genesis files, and to apply a threshold if it falls too far behind. The error log is showing a warning like:
The "networkSynchronization": 1 value also stays at 1 (=100%).
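The genesis-based comparison described above boils down to simple arithmetic once the era parameters are known. The sketch below assumes mainnet in the Shelley era or later, using the Shelley hard fork at slot 4492800 (Unix time 1596059091) and one-second slots from then on; other networks need their own constants, extracting the tip slot from the health output is left out, and the function names and threshold are hypothetical.

```sh
#!/bin/sh
# Sketch: how far (in slots) a tip lags behind wall-clock time on mainnet.
# Assumed constants: Shelley hard fork on mainnet at slot 4492800,
# Unix time 1596059091, with 1-second slots from then on.
SHELLEY_START_SLOT=4492800
SHELLEY_START_TIME=1596059091

# Theoretical current slot for a given Unix time (Shelley era onwards).
expected_slot() {
  echo $((SHELLEY_START_SLOT + $1 - SHELLEY_START_TIME))
}

# Print the lag in slots between the expected slot at time NOW and TIP_SLOT.
# Usage: slot_lag TIP_SLOT NOW
slot_lag() {
  echo $(( $(expected_slot "$2") - $1 ))
}

# Succeed if the lag stays within MAX_LAG slots (threshold is up to the user).
# Usage: tip_is_fresh TIP_SLOT NOW MAX_LAG
tip_is_fresh() {
  [ "$(slot_lag "$1" "$2")" -le "$3" ]
}
```

For example, tip_is_fresh "$tip_slot" "$(date +%s)" 600 || exit 1 would flag a node whose tip is more than ten minutes behind, even while networkSynchronization still reads 1.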
Why is it a good idea?
It would be nice to have a flag showing whether the current connection to the node via the node socket is OK. We get error outputs in the logs, but nothing in the health check.