EOSIO / eos

An open source smart contract platform
https://developers.eos.io/manuals/eos
MIT License
11.27k stars 3.6k forks source link

Exceeding CPU limit when pushing eosio.system contract #8289

Closed krotkiewicz closed 4 years ago

krotkiewicz commented 4 years ago

Hey, when I run custom testnet nodeos with custom genesis file, I randomly get below error when pushing eosio.system contract.

$ cleos set contract eosio /tmp/eosio.contracts/build/contracts/eosio.system/
Reading WASM from /tmp/eosio.contracts/build/contracts/eosio.system/eosio.system.wasm...
Publishing contract...
error 2019-12-07T13:03:09.462 nodeos    http_plugin.cpp:652           handle_exception     ] FC Exception encountered while processing chain.push_transaction
Error 3080004: Transaction exceeded the current CPU usage limit imposed on the transaction

I use eos 1.8.5 from eostudio/eos:v1.8.5 docker image and eosio.contracts from release/1.8.x branch plus following genesis and config files:

{
  "initial_timestamp": "2018-03-02T12:00:00.000",
  "initial_key": "EOS55oVd4ZyC2QMiAMzs1WFYzEYU1GKAAALPCa45Dzs9wrUWo3hjD",
  "initial_configuration": {
    "max_block_net_usage": 1048576,
    "target_block_net_usage_pct": 1000,
    "max_transaction_net_usage": 524288,
    "base_per_transaction_net_usage": 12,
    "net_usage_leeway": 500,
    "context_free_discount_net_usage_num": 20,
    "context_free_discount_net_usage_den": 100,
    "max_block_cpu_usage": 100000,
    "target_block_cpu_usage_pct": 500,
    "max_transaction_cpu_usage": 50000,
    "min_transaction_cpu_usage": 100,
    "max_transaction_lifetime": 3600,
    "deferred_trx_expiration_window": 600,
    "max_transaction_delay": 3888000,
    "max_inline_action_size": 4096,
    "max_inline_action_depth": 4,
    "max_authority_depth": 6
  }
}
chain-state-db-size-mb = 65536
reversible-blocks-db-size-mb = 65536

producer-name = eosio
http-server-address = 0.0.0.0:8888

plugin = eosio::chain_api_plugin
plugin = eosio::history_plugin
plugin = eosio::history_api_plugin
plugin = eosio::http_plugin
plugin = eosio::producer_plugin
plugin = eosio::producer_api_plugin

access-control-allow-origin = *
http-validate-host = false
enable-stale-production = true
max-transaction-time = 3000000

filter-on = *

To easily reproduce it I prepared a repo with Dockerfile: https://github.com/ulamlabs/eos-regtest. Just build the image and run it couple of times.

What I tried?

Increasing max CPU time in genesis file - then the pushing the system contract takes seconds and errors with Please increase the expiration time of your transaction!.

Thanks for helping.

jgiszczak commented 4 years ago

The max-transaction-time of 3000000 ms you have specified exceeds the maximum block time of 500 ms. No transaction taking that long can be incorporated into a block. You could try increasing max_transaction_cpu_usage in genesis.json from 50000 microseconds to something less than 500000 microseconds.

krotkiewicz commented 4 years ago

@jgiszczak I already tried that. When I increase all max cpus in the genesis file to some absurd values (like minutes), the transaction takes forever to execute and the node gives up with timeout error.

So there is something bad going on when executing that transaction (put system contract), looks like a deadlock to me.

jgiszczak commented 4 years ago

As I said, increasing those limits to beyond the block time is useless. Any transaction taking longer than block time will fail with the timeout you are seeing, always. The transaction is expiring in the queue after repeated failures to be incorporated into a block.

Try building the 1.8.3 release of the system contract rather than the 1.8.x branch. What version of CDT did you use to build the system contract?

jgiszczak commented 4 years ago

Testing demonstrates that changing max_transaction_cpu_usage in genesis.json still works as advertised. Note that EOS Mainnet uses a value of 150000 microseconds. It's safe to say 50000 is too low for current common contracts.

ksiazkowicz commented 4 years ago

I can confirm that changing those values works, but the issue that we're facing is that no matter if we set these values to something unreasonable like 20 seconds per block or we use mainnet values (0.2s for max block CPU usage and 0.15s for max transaction CPU usage), it doesn't guarantee us that this transaction won't exceed the deadline. We counteract this by repeatedly submitting it until it succeeds and ultimately it does, usually requiring only about ~0.01-0.03s of CPU time. There must be some reason why this transaction either exceeds 0.15s deadline or gets executed under just a fraction of that time.

We use eosio.cdt 1.6.3 as advised in release notes. Using v1.8.3 of system contract doesn't seem to resolve the issue.

jgiszczak commented 4 years ago

Reopening. I'm getting three independent reports of consistent transaction timing failures in the 1.8 branch, ranging all the way from 1.8.0 all the way up to 1.8.6.

netuoso commented 4 years ago

I am experiencing this same issue when deploying eosio.system contract.

EOS Version: 1.8.9 CDT Version: 1.6.3 Contracts Version: 1.83

Not sure exactly when it appeared but it is preventing local testnet deployment.

netuoso commented 4 years ago

I was able to solve this problem by adding http-max-response-time-ms = 500 to the nodeos config.ini. This option was added with EOS version 1.8.9+

The default value is 30ms which may not be enough on a local, single node testnet

heifner commented 4 years ago

Note tx_cpu_usage_exceeded "Transaction exceeded the current CPU usage limit imposed on the transaction" is different than timeout_exception "deadline ${d} exceeded by ${t}us". timeout_exception is tied to http-max-response-time-ms.

netuoso commented 4 years ago

Correct, the original poster said.

Increasing max CPU time in genesis file - then the pushing the system contract takes seconds and errors with Please increase the expiration time of your transaction!.

So when he fixes the tx_cpu_usage_exceeded bug, he gets a new bug. The new issue will be an expired transaction. I was experiencing the same thing and it was specifically related to deployed the eosio.system contract after a version update.

The newer versions of EOSIO defaulted the http-max-response-time to 30 to prevent RPC node abuse. As you have to use the RPC node to set the contract, the imposed limit caused issues on some hardware in a local testnet. Please at least attempt the above setting in a testnet before declaring it does not fix the described issue.

Hopefully this information helps someone as it seemed to help me running into the same issues. I have seen some form of these errors constantly when making changes that did not include increasing allowed http response time. I suppose it is possible the original commenter was using outdated eosio.contracts that didnt work with his EOS version, but I had this issue with multiple versions of eosio.contracts.

heifner commented 4 years ago

Thanks for the additional info.

ksiazkowicz commented 4 years ago

Thanks @netuoso. I tried your solution but it doesn't seem to make much impact for me. I'm using eosio 1.8.9, eosio.cdt 1.6.3 and eosio.contracts 1.8.3. I've set http-max-response-time-ms to 1000, max_transaction_cpu_usage to 150000 and max_block_cpu_usage to 200000. While some runs are successful, others get stuck trying to deploy eosio.system contract.

Here is the output from docker logs eosio 2>&1 | grep "eosio.system" -A 5:

+ retry cleos set contract eosio /tmp/eosio.contracts/build/contracts/eosio.system/
+ cleos set contract eosio /tmp/eosio.contracts/build/contracts/eosio.system/
Reading WASM from /tmp/eosio.contracts/build/contracts/eosio.system/eosio.system.wasm...
Publishing contract...
error 2020-01-24T09:08:54.344 nodeos    http_plugin.cpp:694           handle_exception     ] FC Exception encountered while processing chain.push_transaction
Error 3080006: Transaction took too long
Error Details:
deadline exceeded
--
+ cleos set contract eosio /tmp/eosio.contracts/build/contracts/eosio.system/
Reading WASM from /tmp/eosio.contracts/build/contracts/eosio.system/eosio.system.wasm...
Publishing contract...
info  2020-01-24T09:09:04.501 nodeos    producer_plugin.cpp:1920      produce_block        ] Produced block 0000002d151a542a... #45 @ 2020-01-24T09:09:04.500 signed by eosio [trxs: 0, lib: 44, confirmed: 0]
error 2020-01-24T09:09:04.609 nodeos    http_plugin.cpp:694           handle_exception     ] FC Exception encountered while processing chain.push_transaction
Error 3080006: Transaction took too long
Error Details:
--
+ cleos set contract eosio /tmp/eosio.contracts/build/contracts/eosio.system/
Reading WASM from /tmp/eosio.contracts/build/contracts/eosio.system/eosio.system.wasm...
Publishing contract...
error 2020-01-24T09:09:14.834 nodeos    http_plugin.cpp:694           handle_exception     ] FC Exception encountered while processing chain.push_transaction
Error 3080006: Transaction took too long
Error Details:
deadline exceeded
--
+ cleos set contract eosio /tmp/eosio.contracts/build/contracts/eosio.system/
Reading WASM from /tmp/eosio.contracts/build/contracts/eosio.system/eosio.system.wasm...
Publishing contract...
info  2020-01-24T09:09:25.001 nodeos    producer_plugin.cpp:1920      produce_block        ] Produced block 000000562aade7a7... #86 @ 2020-01-24T09:09:25.000 signed by eosio [trxs: 0, lib: 85, confirmed: 0]
error 2020-01-24T09:09:25.091 nodeos    http_plugin.cpp:694           handle_exception     ] FC Exception encountered while processing chain.push_transaction
Error 3080006: Transaction took too long
Error Details:
--
+ cleos set contract eosio /tmp/eosio.contracts/build/contracts/eosio.system/
Reading WASM from /tmp/eosio.contracts/build/contracts/eosio.system/eosio.system.wasm...
Publishing contract...
error 2020-01-24T09:09:35.350 nodeos    http_plugin.cpp:694           handle_exception     ] FC Exception encountered while processing chain.push_transaction
Error 3080006: Transaction took too long
Error Details:
deadline exceeded
netuoso commented 4 years ago

Apologies, it helped me for deploying a few times but now I am re-encountering the issue.

Update: I am using an underpowered laptop and running the testnet in a docker instance via docker-compose. After some digging, it seems the issue I was experiencing with deadline exceeded while deploying the eosio.system contract was related to hardware constraints, not a software config error.

I was able to make my machine immediately start working by increasing the available RAM to the docker process as well as setting a CPU niceness of -18. https://en.wikipedia.org/wiki/Nice_(Unix)

TL:DR; if you are experiencing this issue, try to upgrade your hardware or ensure your virtual process has sufficient resources to deploy the contract within the timespan of a single block.

ndcgundlach commented 4 years ago

Appears to have been a system resource issue.

ksiazkowicz commented 4 years ago

We're having these issues when running this on CI and to be honest I wouldn't call it underpowered. We currently run builds on general1.large codebuild instance, which has 15GB of RAM and 8 vCPUs. The only higher tier has 144GB of RAM and 72 CPUs and I'm not sure if it's worth switching to that one just to check if this resolves our issues as it's 12x the price of our current one. My laptop, on which this also fails half the time with deadline exceeded exception has an i7-4770HQ CPU and 16 GB of RAM.

I tried changing the niceness, but it still consistently takes at least 2 attempts to submit the contract. My machine also doesn't seem to be under much load while running it.

ksiazkowicz commented 4 years ago

I prepared a Docker image and a simple script to reproduce this issue. https://github.com/ksiazkowicz/eos-testnet

The idea behind the script is that we're trying to bootstrap EOS testnet until it succeeds. Sometimes 1st attempt to deploy eosio.system contract will fail, so I wait 10 seconds and try again. After exceeding 10 retries, I'm considering test run as failed, kill the container and try again. By default, the script will run 100 tests.

I ran these tests against an r5.large EC2 instance running Ubuntu 18.04 and on my Mid-2015 MacBook Pro (Core i7, 16 GB RAM).

My results are here if anyone wants to dig in: https://1drv.ms/x/s!ApF6BBqdJeSZkaQ4u3tTOs4cEswMEQ?e=u4bTdc

My Docker Desktop config:

Screenshot 2020-03-10 at 13 44 27
taokayan commented 4 years ago

This maybe the time-out issue on keosd. Can you try to following (sign the transaction manually without invoking keosd)?

./cleos set code eosio eosio.system.wasm -j -d -s -x 3600 > tmp
./cleos sign tmp -k THE_EOSIO_PRIVATE_KEY -p
ksiazkowicz commented 4 years ago

Interesting. So if I understand correctly, you suggest that the transaction itself might be already expired by the time it's signed and ready to be submitted? That would make sense, looking at it again the logs mention that the deadline, not CPU usage is exceeded. I adjusted the bootstrapping script of my Docker image accordingly and ran it again. I added test results here: https://1drv.ms/x/s!ApF6BBqdJeSZkaQ4u3tTOs4cEswMEQ?e=PntAbv

What's interesting about the results, that 98 out of 100 builds were successful at first try. The 2 ones that failed had the error mentioned in the 1st message in this ticket:

info  2020-04-16T07:50:23.712 cleos     main.cpp:3306                 operator()           ] grabbing chain_id from nodeos
info  2020-04-16T07:50:23.901 nodeos    producer_plugin.cpp:2046      produce_block        ] Produced block 000000c2db5907e2... #194 @ 2020-04-16T07:50:24.000 signed by eosio [trxs: 0, lib: 193, confirmed: 0]
error 2020-04-16T07:50:24.154 nodeos    http_plugin.cpp:696           handle_exception     ] FC Exception encountered while processing chain.push_transaction
Error 3080004: Transaction exceeded the current CPU usage limit imposed on the transaction
Error Details:
transaction was executing for too long
pending console output:
+ echo retry
+ sleep 10
retry

I attached the log files here: https://1drv.ms/u/s!ApF6BBqdJeSZka8r05JuZtH-4R2YiQ?e=ETt3ac

It's as if there were 2 cases:

  1. Transaction expires until it's signed and pushed to the RPC. These are the builds that succeeded in 1 or more attempts. Default deadline is 30s, that should be more than enough. I noticed that in your example, you increased that to 1 hour and we're no longer hitting that.
  2. Transaction exceeds CPU limits. These builds were discarded after 10 failed retries. It's rare but it still happens.

Our code is available here: https://github.com/ulamlabs/eos-testnet/pull/2

heifner commented 4 years ago

Can you add verbose-http-errors=true which should provide you more info on the timeout.

ksiazkowicz commented 4 years ago

It appears to be enabled:

root@2ff33b4cd2bb:/# ps aux | grep nodeos
root        11  1.5  0.8 134729372 67704 ?     Sl   10:34   3:16 nodeos -d /data --config-dir /app --genesis-json=/app/genesis.json
root      2650  0.0  0.0  11464  1000 pts/0    S+   14:06   0:00 grep --color=auto nodeos
root@2ff33b4cd2bb:/# cat /app/config.ini | grep verbose-http-errors
verbose-http-errors = true

When I submit it manually, the response does include the traceback but it's not really helpful.

{
    "code": 500,
    "message": "Internal Service Error",
    "error": {
        "code": 3080004,
        "name": "tx_cpu_usage_exceeded",
        "what": "Transaction exceeded the current CPU usage limit imposed on the transaction",
        "details": [
            {
                "message": "transaction was executing for too long",
                "file": "transaction_context.cpp",
                "line_number": 493,
                "method": "checktime"
            },
            {
                "message": "pending console output: ",
                "file": "apply_context.cpp",
                "line_number": 113,
                "method": "exec_one"
            }
        ]
    }
}

The weird part is that all runs have the same config and it doesn't matter how many times you try to submit it, it fails with this exception. If I kill the container, remove all its data and spin it up again, it will probably be fine. In fact, I spinned up another container based on the same image and it bootstrapped just fine.

Not sure how useful this is, but here is also cleos get info output for both nodes:

broken node

oot@2ff33b4cd2bb:/# cleos get info
{
  "server_version": "eb4456fd",
  "chain_id": "7479dd536fa543a6e5faafe8f90132f8d1aab58c746d7d7a4e01c10ea091e25a",
  "head_block_num": 26827,
  "last_irreversible_block_num": 26826,
  "last_irreversible_block_id": "000068ca42e0a8dbdd5d5f108a0cda5526a252aea1c1aa4e39515611ad9631f0",
  "head_block_id": "000068cbf0fa8c02aadaa1320a53508d9c4a9f620c8900aa40e9a2693c454120",
  "head_block_time": "2020-04-16T14:19:05.000",
  "head_block_producer": "eosio",
  "virtual_block_cpu_limit": 200000000,
  "virtual_block_net_limit": 1048576000,
  "block_cpu_limit": 200000,
  "block_net_limit": 1048576,
  "server_version_string": "v1.8.13",
  "fork_db_head_block_num": 26827,
  "fork_db_head_block_id": "000068cbf0fa8c02aadaa1320a53508d9c4a9f620c8900aa40e9a2693c454120"
}

working node

root@30e879854ba4:/# cleos get info
{
  "server_version": "eb4456fd",
  "chain_id": "7479dd536fa543a6e5faafe8f90132f8d1aab58c746d7d7a4e01c10ea091e25a",
  "head_block_num": 55,
  "last_irreversible_block_num": 54,
  "last_irreversible_block_id": "000000368762efdad2bac8d4869a5117504bfd923af19057df2159b3b1c5942d",
  "head_block_id": "000000373495308ac2ed3f740c7d27a0b2351f700d7d415c2d1a068d5e86fe16",
  "head_block_time": "2020-04-16T14:18:52.500",
  "head_block_producer": "eosio",
  "virtual_block_cpu_limit": 211077,
  "virtual_block_net_limit": 1106757,
  "block_cpu_limit": 199900,
  "block_net_limit": 1048576,
  "server_version_string": "v1.8.13",
  "fork_db_head_block_num": 55,
  "fork_db_head_block_id": "000000373495308ac2ed3f740c7d27a0b2351f700d7d415c2d1a068d5e86fe16"
}
aroyer commented 4 years ago

to resolve increase the default 30 ms for configuration variable http-max-response-time-ms used by the http plugin in keosd. default is not enough to set the contract for eosio.system using cleos set contract on resource constrained environment.