erigontech / erigon

Ethereum implementation on the efficiency frontier https://erigon.gitbook.io
GNU Lesser General Public License v3.0

Erigon OOM due to rpc filters #11890

Closed by taratorio 1 week ago

taratorio commented 1 week ago

System information

Erigon version: 2.60.6

Expected behaviour

Stable Erigon process without OOMs

Actual behaviour

Consistent OOMs caused by RPC filter subscriptions, e.g. eth_newFilter, eth_newBlockFilter, eth_newPendingTransactionFilter, eth_getFilterChanges, eth_getFilterLogs.

Reported on Discord: https://discord.com/channels/687972960811745322/983710221308416010/1281252059634597908

This was also reported by another user, who provided a series of fixes for it. However, those fixes were applied only to Erigon 3, not to Erigon 2. The fixes are:

  1. https://github.com/erigontech/erigon/pull/10672
  2. https://github.com/erigontech/erigon/pull/10718 (contains good description of problem and solution)
  3. https://github.com/erigontech/erigon/pull/10826

If you are hitting the same issue, the recommendation is to run Erigon with the following additional flags (taken from fix 2 above):

--rpc.subscription.filters.maxlogs=60
--rpc.subscription.filters.maxheaders=60
--rpc.subscription.filters.maxtxs=10_000
--rpc.subscription.filters.maxaddresses=1_000
--rpc.subscription.filters.maxtopics=1_000
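To make the effect of a per-subscription cap such as maxlogs concrete, here is a minimal sketch of a bounded buffer. It is not Erigon's code: the `boundedBuffer` type, its methods, and the drop-oldest policy are assumptions made for illustration only. It shows why memory stays bounded when a client stops polling, and why a slow poller can lose events.

```go
package main

import "fmt"

// boundedBuffer is a hypothetical per-subscription event store (not an Erigon type).
// It keeps at most limit items; a limit of 0 means "unlimited" in this sketch.
type boundedBuffer[T any] struct {
	limit   int
	items   []T
	dropped int // number of events evicted before the client polled them
}

func newBoundedBuffer[T any](limit int) *boundedBuffer[T] {
	return &boundedBuffer[T]{limit: limit}
}

// Add stores an event, evicting the oldest entries if the cap would be exceeded.
// The drop-oldest policy is an assumption of this sketch.
func (b *boundedBuffer[T]) Add(item T) {
	if b.limit > 0 && len(b.items) >= b.limit {
		excess := len(b.items) + 1 - b.limit
		b.items = b.items[excess:]
		b.dropped += excess
	}
	b.items = append(b.items, item)
}

// Drain returns everything buffered since the last poll and empties the buffer,
// roughly what a polling call would do for a filter.
func (b *boundedBuffer[T]) Drain() []T {
	out := b.items
	b.items = nil
	return out
}

func main() {
	buf := newBoundedBuffer[int](3)
	for i := 1; i <= 5; i++ {
		buf.Add(i) // five events arrive before the client polls
	}
	fmt.Println(buf.Drain(), buf.dropped) // prints "[3 4 5] 2"
}
```

With a limit of 3 and five events between polls, the two oldest events are gone by the time the client polls. Without any limit, a filter that is never polled would keep growing, which matches the OOM pattern described in this issue.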

Note, however, that these limits need to be coordinated with how the users of your filter RPC APIs poll their filters and with their requirements: if a limit is too low for a user's polling frequency, some events may get dropped. The best approach is to measure the subscription metrics (also added in fix 2 above) and set the limits based on those measurements, as well as on your users' polling frequency and requirements. The metrics are:

var (
    activeSubscriptionsGauge                 = metrics.GetOrCreateGaugeVec("subscriptions", []string{filterLabelName}, "Current number of subscriptions")
    activeSubscriptionsLogsAllAddressesGauge = metrics.GetOrCreateGauge("subscriptions_logs_all_addresses")
    activeSubscriptionsLogsAllTopicsGauge    = metrics.GetOrCreateGauge("subscriptions_logs_all_topics")
    activeSubscriptionsLogsAddressesGauge    = metrics.GetOrCreateGauge("subscriptions_logs_addresses")
    activeSubscriptionsLogsTopicsGauge       = metrics.GetOrCreateGauge("subscriptions_logs_topics")
    activeSubscriptionsLogsClientGauge       = metrics.GetOrCreateGaugeVec("subscriptions_logs_client", []string{clientLabelName}, "Current number of subscriptions by client")
)

These 3 fixes need to be ported to Erigon 2. The user who reported this on Discord did a test run with the 3 changes above on a test branch and reported stable behaviour: https://discord.com/channels/687972960811745322/983710221308416010/1281287550492737538
