AntelopeIO / leap

C++ implementation of the Antelope protocol

`_binary_to_variant` function crashing on some blocks #2228

Closed cshintov closed 7 months ago

cshintov commented 7 months ago

I'm running 6 mainnet archive nodes. Since this morning, a few blocks return a 500 Internal Server Error on all of them.

curl --location $url/v1/trace_api/get_block --header 'Content-Type: application/json' --data '{"block_num": "356339452" }' | jq .

{
  "code": 500,
  "message": "Internal Service Error",
  "error": {
    "code": 3015013,
    "name": "unpack_exception",
    "what": "Unpack data exception",
    "details": [
      {
        "message": "Stream unexpectedly ended; unable to unpack field 'to' of struct 'transfer'",
        "file": "abi_serializer.cpp",
        "line_number": 374,
        "method": "_binary_to_variant"
      }
    ]
  }
}
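
To illustrate what this `unpack_exception` means, here is a minimal Python sketch of sequential struct unpacking. It assumes the usual eosio.token `transfer` layout (`from` and `to` as 8-byte names, `quantity` as a 16-byte asset, `memo` as a length-prefixed string). It is not the code in `abi_serializer.cpp`; `Reader`, `UnpackError`, and `unpack_transfer` are hypothetical names used only for this example.

```python
import struct

NAME_CHARS = ".12345abcdefghijklmnopqrstuvwxyz"


class UnpackError(Exception):
    """Raised when the byte stream ends before a field is fully read."""


class Reader:
    """Cursor over a byte string that reports which field it was reading on failure."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def take(self, count: int, field: str) -> bytes:
        if self.pos + count > len(self.data):
            raise UnpackError(
                f"Stream unexpectedly ended; unable to unpack field '{field}'"
            )
        chunk = self.data[self.pos:self.pos + count]
        self.pos += count
        return chunk

    def varuint32(self, field: str) -> int:
        # Length prefix: 7 payload bits per byte, high bit set means "more bytes".
        result, shift = 0, 0
        while True:
            byte = self.take(1, field)[0]
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result
            shift += 7


def decode_name(raw: bytes) -> str:
    """Decode an 8-byte little-endian account name into its string form."""
    value = int.from_bytes(raw, "little")
    chars = []
    for i in range(13):
        # Twelve 5-bit characters from the top, then one 4-bit character.
        idx = (value >> (64 - 5 * (i + 1))) & 0x1F if i < 12 else value & 0x0F
        chars.append(NAME_CHARS[idx])
    return "".join(chars).rstrip(".")


def unpack_transfer(data: bytes) -> dict:
    """Read the assumed transfer fields in order; fail on the first short read."""
    reader = Reader(data)
    sender = decode_name(reader.take(8, "from"))
    receiver = decode_name(reader.take(8, "to"))
    (amount,) = struct.unpack("<q", reader.take(8, "quantity"))
    symbol = reader.take(8, "quantity")
    memo = reader.take(reader.varuint32("memo"), "memo").decode("utf-8")
    return {
        "from": sender,
        "to": receiver,
        "quantity": {
            "amount": amount,
            "precision": symbol[0],
            "symbol": symbol[1:].rstrip(b"\x00").decode("ascii"),
        },
        "memo": memo,
    }
```

If the action data is shorter than the layout expects (for example, because the bytes were not produced with this struct definition), the reader runs out of input partway through and names the field it was on, which matches the shape of the message above.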

The nodes are running v4.0.5.

I tried rewinding the node back using

nodeos --genesis-json /root/local/genesis.json --data-dir /data --config-dir /root/local \
--hard-replay-blockchain --terminate-at-block 356339451

But afterwards it complained about a database version mismatch:

warn  2024-02-08T11:54:53.441 nodeos    chain_plugin.cpp:1077         plugin_initialize    ] 3060005 bad_database_version_exception: Database is an unknown or unsupported version
state database version pre-dates versioning, please restore from a compatible snapshot or replay!
    {}
    nodeos  controller.cpp:687 validate_db_version

error 2024-02-08T11:54:53.472 nodeos    main.cpp:161                  main                 ] 3060005 bad_database_version_exception: Database is an unknown or unsupported version
state database version pre-dates versioning, please restore from a compatible snapshot or replay!
    {}
    nodeos  controller.cpp:687 validate_db_version
rethrow
    {}
    nodeos  chain_plugin.cpp:1077 plugin_initialize

So I tried syncing from a recent snapshot, snapshot-2024-02-08-04-eos-v6-0356339018.bin.

Even then, this block returned the same error.

A few more blocks in the next 10,000 fail the same way:

356339452
356339534
356340111
356340363
356340822

Is this a network-wide issue, or does it only affect my nodes? How can I recover?

The only thing left to try is upgrading to v5.0.0, which I'm doing now.

cshintov commented 7 months ago

The upgrade to v5.0.0 also didn't help. Something bad is lurking here: https://github.com/AntelopeIO/leap/blob/02da2839eae792cada7593a925fe3b9f112cf2fa/libraries/chain/abi_serializer.cpp#L374

heifner commented 7 months ago

curl --location http://eos.greymass.com/v1/trace_api/get_block --header 'Content-Type: application/json' --data '{"block_num": "356339452" }' | jq

Works.

heifner commented 7 months ago

What non-default options do you have on your trace_api_plugin node?

cshintov commented 7 months ago

config.ini

plugin = eosio::chain_plugin
plugin = eosio::chain_api_plugin
plugin = eosio::net_plugin
plugin = eosio::http_plugin
plugin = eosio::state_history_plugin
plugin = eosio::trace_api_plugin
abi-serializer-max-time-ms = 50000
chain-state-db-size-mb = 1000000
enable-account-queries = true
http-server-address = 0.0.0.0:{{ env "NOMAD_PORT_rpc" }}
access-control-allow-origin = *
access-control-allow-headers = Origin, X-Requested-With, Content-Type, Accept
http-max-response-time-ms = 5000
verbose-http-errors = true
http-validate-host = false
p2p-listen-endpoint = 0.0.0.0:{{ env "NOMAD_PORT_wire" }}
p2p-server-address = {{ env "NOMAD_IP_wire" }}:{{ env "NOMAD_HOST_PORT_wire" }}
p2p-peer-address = peer.main.alohaeos.com:9876
p2p-peer-address = eos.edenia.cloud:9876
p2p-peer-address = p2p.eos.cryptolions.io:9876
p2p-peer-address = p2p.donates2eden.io:9876
p2p-peer-address = mainnet.eosamsterdam.net:9876
p2p-peer-address = p2p.eosflare.io:9876
p2p-peer-address = p2p.bitmars.one:8080
p2p-peer-address = 34.96.75.100:8099
p2p-peer-address = p2p.eos.detroitledger.tech:1337
p2p-peer-address = eos.seed.eosnation.io:9876
p2p-peer-address = peer1.eosphere.io:9876
p2p-peer-address = p2p.eossweden.org:9876
p2p-peer-address = eos.hashfin.com:9876
p2p-peer-address = eos.p2p.eosusa.io:9882
p2p-peer-address = eos.newdex.one:9876
p2p-max-nodes-per-host = 150
max-clients = 150
sync-fetch-span = 1000
trace-history = true
chain-state-history = true
state-history-endpoint = 0.0.0.0:{{ env "NOMAD_PORT_history" }}
trace-history-debug-mode = true
state-history-log-retain-blocks = 10713600
trace-rpc-abi = eosio=/root/local/eosio.abi
trace-rpc-abi = eosio.token=/app/reference-contracts/build/contracts/eosio.token/eosio.token.abi
trace-rpc-abi = eosio.msig=/app/reference-contracts/build/contracts/eosio.msig/eosio.msig.abi
trace-rpc-abi = eosio.wrap=/app/reference-contracts/build/contracts/eosio.wrap/eosio.wrap.abi

And running the chain as

command = "nodeos"
args    = [
  "--data-dir","/data",
  "--config-dir", "/root/local",
  "--disable-replay-opts",
  "--genesis-json", "/root/local/genesis.json",
]

normally, and with snapshot

        command = "nodeos"
        args    = [
          "--data-dir", "/data",
          "--config-dir" ,"/root/local",
          "--snapshot", "/data/snapshots/snapshot-2024-02-08-04-eos-v6-0356339018.bin",
        ]

Building the image with

Dockerfile

ARG version="5.0.0"
ARG cdt_version="4.0.1"

FROM ubuntu:20.04

ARG version
ARG cdt_version

ENV VERSION=${version}
LABEL VERSION=${version}

ADD scripts/ /opt/tatum.io

ENV USER_ID=3002
ENV GROUP_ID=3002

ARG DEBIAN_FRONTEND=noninteractive
ARG TZ=Etc/UTC

RUN apt-get update
RUN apt-get update --fix-missing
RUN apt-get install -y apt-utils
RUN apt-get install -y curl tzdata
RUN apt-get install -y zip unzip libncurses5 wget git build-essential cmake libboost-all-dev libcurl4-openssl-dev libgmp-dev libssl-dev libusb-1.0.0-dev libzstd-dev time pkg-config llvm-11-dev nginx npm yarn jq gdb lldb
RUN apt-get install -y gcc g++ make tar jq bash nano netcat-openbsd
RUN curl -fsSL https://deb.nodesource.com/setup_lts.x | bash -
RUN apt-get update
RUN apt-get install -y nodejs
RUN apt-get autoremove -y

WORKDIR /app
RUN npm install -g npm

RUN npm install -D webpack-cli
RUN npm install -D webpack
RUN npm install -D webpack-dev-server

COPY scripts/bootstrap_env.sh .
RUN ./bootstrap_env.sh ${version} ${cdt_version}

RUN if [ ${USER_ID:-0} -ne 0 ] && [ ${GROUP_ID:-0} -ne 0 ]; then \
    userdel -f www-data && \
    if getent group www-data ; then groupdel www-data; fi && \
    groupadd -g ${GROUP_ID} www-data && \
    useradd -l -u ${USER_ID} -g www-data www-data && \
    install -d -m 0755 -o www-data -g www-data /home/www-data && \
    chown --changes --silent --no-dereference --recursive \
          --from=33:33 ${USER_ID}:${GROUP_ID} \
          /home/www-data \
          /app \
   ;fi

RUN mkdir -p /home/www-data/nodes

# port for nodeos p2p
EXPOSE 9876
# port for nodeos http
EXPOSE 8888
# port for state history
EXPOSE 8080
# port for webapp
EXPOSE 8000

STOPSIGNAL SIGINT

bootstrap_env.sh

#! /bin/sh

set -e

ARCH=`uname -m`

ORG="AntelopeIO"
NODE_VERSION=${1:-"4.0.4"}
CDT_VERSION=${2:-"4.0.0"}

# Fetches the first layer of the container image from the GitHub Container Registry, and extracts the contents of the downloaded layer.
# Gives you the binaries of the container image without having to pull the entire image.
# You get leap-dev_<version>_<arch>.deb from here.
CONTAINER_PACKAGE=AntelopeIO/experimental-binaries
GH_ANON_BEARER=$(curl -s "https://ghcr.io/token?service=registry.docker.io&scope=repository:${CONTAINER_PACKAGE}:pull" | jq -r .token)
curl -s -L -H "Authorization: Bearer ${GH_ANON_BEARER}" https://ghcr.io/v2/${CONTAINER_PACKAGE}/blobs/$(curl -s -L -H "Authorization: Bearer ${GH_ANON_BEARER}" https://ghcr.io/v2/${CONTAINER_PACKAGE}/manifests/v${NODE_VERSION} | jq -r .layers[0].digest) | tar -xz

# Choose architecture
if [ "${ARCH}" = "x86_64" ]; then
   wget https://github.com/${ORG}/leap/releases/download/v${NODE_VERSION}/leap_${NODE_VERSION}_amd64.deb
   apt install -y ./leap_${NODE_VERSION}_amd64.deb
   apt install -y ./leap-dev_${NODE_VERSION}-ubuntu20.04_amd64.deb
   wget https://github.com/${ORG}/cdt/releases/download/v${CDT_VERSION}/cdt_${CDT_VERSION}_amd64.deb
   apt install -y ./cdt_${CDT_VERSION}_amd64.deb
else
   apt install -y ./leap-${NODE_VERSION}_arm64.deb
   apt install -y ./leap-dev_${NODE_VERSION}-ubuntu20.04_arm64.deb
   wget https://github.com/${ORG}/cdt/releases/download/v${CDT_VERSION}/cdt_${CDT_VERSION}_arm64.deb
   apt install -y ./cdt_${CDT_VERSION}_arm64.deb
fi

# Removing *.deb files that were pulled earlier to save space
rm *.deb

# Clone and build reference contracts
git clone https://github.com/${ORG}/reference-contracts
cd reference-contracts
mkdir build
cd build
cmake ..
make -j4

# replace org and other variables

cshintov commented 7 months ago

Hi @heifner, did you get a chance to look into this?

heifner commented 7 months ago

> Hi @heifner, did you get a chance to look into this?

Sorry, no. Hopefully someone can take a look soon. I will say that we had no other reports of issues with trace_api_plugin. Potentially some kind of environment issue.

aaroncox commented 7 months ago

If it helps debug, the trace API running on eos.greymass.com only uses the following configuration for the trace_api:

plugin = eosio::trace_api_plugin
trace-dir = /mnt/history/traces
trace-no-abis = true

We don't use any other trace related configuration options.

We haven't seen any issues like this before, but we also don't decode on the server side. Decoding is handled by the client that makes the request.
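
As a rough illustration of client-side decoding, the Python sketch below decodes each action's raw hex data with a caller-supplied decoder and keeps the raw data when decoding fails, so one undecodable action does not fail the whole block. The field names `account`, `action`, and `data` are assumptions about the trace response shape, and `decode_action` is a hypothetical helper, not part of any Antelope client library.

```python
def decode_action(action, decoders):
    """Return a copy of `action` with decoded `params` added when possible.

    `decoders` maps (account, action name) to a callable that takes the raw
    bytes and returns a dict. Actions without a decoder are returned as-is.
    """
    result = dict(action)
    decoder = decoders.get((action["account"], action["action"]))
    if decoder is None:
        return result
    try:
        result["params"] = decoder(bytes.fromhex(action["data"]))
    except Exception as err:  # broad on purpose: any decode failure keeps raw data
        result["decode_error"] = str(err)
    return result
```

The design choice here is to treat decoding as best effort: the raw `data` field is always preserved, and a failure is recorded next to it instead of being raised.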

tedcahalleos commented 7 months ago

Hi @cshintov - if Aaron's response does not correct the issue, you can contact me via email at ted.cahall@eosnetwork.com and we can set up a call or chat to discuss other options.

tmeinlschmidt commented 7 months ago

hi all,

on behalf of @cshintov

@aaroncox Yes, you're not deserializing with the ABI on the server side as we do. When we tried that option we saw no issues, but we only get raw output.

@tedcahalleos I tried commenting out the eosio.token ABI in the config file. With that change I can get the trace from the node (with trace-no-abis = false), so I assume the issue is with certain blocks/transactions.

We're using the latest CDT, 4.0.1.

All of these blocks fail with the same error when the eosio.token ABI is enabled:

357118176 357118965 357118255 357291244 357291896 357295793 357295889

thanks

tedcahalleos commented 7 months ago

@tmeinlschmidt Please email me at ted.cahall@eosnetwork.com so we can discuss offline. We are done with this ticket in terms of helping diagnose without speaking to you offline. Happy to set up a call or zoom once you contact me.