banteg closed this issue 2 years ago.
This is also affecting the latest stable/beta/deprecated release.
{"jsonrpc":"2.0","id":1,"method":"debug_traceTransaction","params":["0xb9e6b6f275212824215e8f50818f12b37b7ca4c2e0b943785357c35b23743b94"]}
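For reference, the request above can be built programmatically. A minimal Go sketch (names like `traceRequest` are illustrative, not Erigon API; `debug_traceTransaction` also accepts an optional tracer-config object as a second param, omitted here as in the request above):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest mirrors the JSON-RPC 2.0 envelope shown above.
type rpcRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	ID      int           `json:"id"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
}

// traceRequest builds the debug_traceTransaction call body for one tx hash.
func traceRequest(txHash string) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      1,
		Method:  "debug_traceTransaction",
		Params:  []interface{}{txHash},
	})
}

func main() {
	b, _ := traceRequest("0xb9e6b6f275212824215e8f50818f12b37b7ca4c2e0b943785357c35b23743b94")
	fmt.Println(string(b))
}
```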
This is because of this PR: https://github.com/ledgerwatch/erigon/pull/2779
ah, okay, then we probably need to think about adding some kind of pagination/limitation for these traces, or some binary response
I wonder also if another JSON serialization lib could work there; the stdlib marshal/unmarshal code isn't the most frugal.
@banteg is there anything more fresh that has this or similar behaviour? I have a pruned node, so I can't check that far back in history. Or similar transactions that aren't 1/2 of a year old are tracing just fine?
no, my dataset consisted of 11,000 transactions and only these three had this behavior
I'm also having this issue with stable release, tx 0x42b8205ed4c9d9de39340999c05327543f422b4ca881ae5910d56b3ad62d19c6
okay, what we can try is changing the JSON serialization library in an experimental branch. @banteg @darkhorse-spb, if you can then test it on your machines, we'll see if that helps at all.
@mandrigin debug_traceTransaction is already using the jsoniter.Stream serialization lib, and it must do streaming (in the no-batch and no-websocket cases). It probably doesn't, because I disabled it in ./rpc/handler.go handleMsg to fix the broken JSON format in case of errors.
It's impossible to stream JSON and still return an error if the error happened in the middle of streaming, because JSON is not a streaming-friendly format.
I also have a weird idea of using ETL to first dump everything to binary files, check for errors, and then stream the results.
but the question is also, what eats all this RAM?
@banteg can I ask you to run Erigon with the built-in RPC daemon and with --pprof? Then, when it begins eating RAM, maybe at 60 or 80 GB, do curl http://127.0.0.1:6060/debug/pprof/heap > heap.out and attach this file here. Then I can look at the profiler too.
We decided to enable the streaming feature back by default:
https://github.com/ledgerwatch/erigon/pull/4647 enables JSON streaming for some heavy endpoints (like trace_*). It's a tradeoff: it greatly reduces the amount of RAM (in some cases from 30 GB to 30 MB), but it produces an invalid JSON response if an error happens in the middle of streaming (because JSON is not a streaming-friendly format).
We decided that the value of this streaming is higher than handling the rare "error happened in the middle" corner case, but added a flag, --rpc.streaming.disable, for users who wish to pay for correctness or compatibility.
@banteg @darkhorse-spb can you check in the current devel version and see if it helped?
> but it produce invalid json format if error happened in the middle of streaming (because json is not streaming-friendly format)
Is it Go code? We ran into the same issue with TrueBlocks. We stream our data too.
We were able to get around it using a defer call that closes any open JSON objects or arrays. It's not perfect: it doesn't work that well with nested objects, but it works for simple arrays and simple objects, for example. If any sub-routine returns an error, the defer simply closes the array.
If the program crashes and a subroutine never returns, it doesn't work, but then the program crashed, so something isn't working anyway.
Then the user will not see the error message at all.
We attach the error as another field in the object in the defer method. Not perfectly compliant JSON, but it works. (Perfectly compliant JSON, if it returns an error, should return empty data, but that's not possible since you've already streamed the data.)
@tjayrush it may even work in many client libs. Do you have an open-source example?
I'm almost embarrassed to show it. It's super hacky, but here's an example: https://github.com/TrueBlocks/trueblocks-core/blob/feature/new-unchained-index-2.0/src/apps/chifra/internal/chunks/handle_addresses.go#L66. The RenderFooter routine (which closes an array and an object; everything our API delivers has the same shape) gets called even if an error happens. We deliver the error on standard error many levels above this code, so it just closes the JSON object and returns the error (or nil if there is no error).
tnx, will try tomorrow
@AskAlexSharov do you want to keep this one around?
It’s fixed: streaming is enabled. But we also need to add this approach: https://github.com/ledgerwatch/erigon/issues/4637#issuecomment-1176407488
okay @nanevardanyan will take a look at the error handling then
seems fixed on erigon's side, but clients would need to consider streaming too. one of the traces i reported yields a 66.5GB response. here is a small script which will show both compressed and uncompressed size of the response.
https://gist.github.com/banteg/98dbccbf6e2a3f997199a1b16eb93c5a
reran with my dataset. you can clearly see the outliers i found earlier:
here are response sizes:
0x9ef7a35012286fef17da12624aa124ebc785d9e7621e1fd538550d1209eb9f7d = 41.4 GB (2.2 GB compressed)
0xd770356649f1e60e7342713d483bd8946f967e544db639bd056dfccc8d534d8e = 43.9 GB (2.4 GB compressed)
0x2428a69601105c365b9fe9d2f30688b91710b6a43bc6d2026344674ae7ffcac3 = 50.4 GB (2.9 GB compressed)
0xb9e6b6f275212824215e8f50818f12b37b7ca4c2e0b943785357c35b23743b94 = 71.5 GB (3.5 GB compressed)
all other traces are under 4 GB.
System information
Erigon version: erigon version 2022.07.1-alpha-09776394
OS & Version: Linux
Commit hash: 0977639431fe520fc77399d03cdeba36526d2d52
Expected behaviour
an rpc call returns a trace
Actual behaviour
erigon gobbles up 100 GB+ of memory and gets killed by the system
Steps to reproduce the behaviour
run debug_traceTransaction against any of these txs:
Backtrace
not available, erigon gets killed by the system