filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/
Other
2.84k stars 1.26k forks source link

MaxFilterResults is applied before filtering for eth logs #12636

Open rvagg opened 3 days ago

rvagg commented 3 days ago

Noticed by @ArseniiPetrovich:

$ curl -s -X POST -H "Content-Type: application/json" --data '{"method":"eth_getLogs","params":[{"fromBlock":"0x42B4AC","toBlock":"0x42B4D4"}],"id":2,"jsonrpc":"2.0"}' http://127.0.0.1:1234/rpc/v1 | jq '.result[].blockNumber' | uniq | wc -l
32
$ curl -s -X POST -H "Content-Type: application/json" --data '{"method":"eth_getLogs","params":[{"fromBlock":"0x42B4AC","toBlock":"0x42B4D5"}],"id":2,"jsonrpc":"2.0"}' http://127.0.0.1:1234/rpc/v1 | jq '.result[].blockNumber' | uniq | wc -l
32

and

$ curl -s -X POST -H "Content-Type: application/json" --data '{"method":"eth_getLogs","params":[{"fromBlock":"0x42B4AC","toBlock":"0x42B4D4"}],"id":2,"jsonrpc":"2.0"}' http://127.0.0.1:1234/rpc/v1 | jq '.result[].blockNumber' | head -1
"0x42b4ac"
ubuntu@ip-192-168-16-104:~$ curl -s -X POST -H "Content-Type: application/json" --data '{"method":"eth_getLogs","params":[{"fromBlock":"0x42B4AC","toBlock":"0x42B4D5"}],"id":2,"jsonrpc":"2.0"}' http://127.0.0.1:1234/rpc/v1 | jq '.result[].blockNumber' | head -1
"0x42b4ae"

i.e. extending the range of epochs while leaving the start of the range static drops events from the front of the list.

This is because we apply MaxFilterResults max (the invisible cap) before filtering for eth events, not after.

For that same block range:

$ curl -s -X POST -H "Content-Type: application/json" --data '{"method":"Filecoin.GetActorEventsRaw","params":[{"fromHeight":4371628,"toHeight":4371668}],"id":2,"jsonrpc":"2.0"}' http://127.0.0.1:1234/rpc/v1 | jq '.result|length'
10000
$ curl -s -X POST -H "Content-Type: application/json" --data '{"method":"Filecoin.GetActorEventsRaw","params":[{"fromHeight":4371628,"toHeight":4371669}],"id":2,"jsonrpc":"2.0"}' http://127.0.0.1:1234/rpc/v1 | jq '.result|length'
10000

10000 is the default value for MaxFilterResults.

  1. We should apply that max after filtering for eth events BUT this limit is there to deal with DoS potential and one of the DoS potentials we're dealing with is memory consumption in collecting and processing these events. We can't just uncap it when collecting and then cap it after filtering because it would be trivial to make a query that collects events off a huge range of epochs. So we need to move the eth filter closer in to the query. I think we might be able to either build in some SQL that can do it, or add something to ChainIndexer in the filtering that can do it.
  2. We should consider returning an error when we hit MaxFilterResults. The fact that you can't know is a big problem. The best signal you have is if you hit 10,000 events returned. I'd like to build pagination into GetActorEvents (maybe not Raw) but we can't do that for eth APIs, the user has to have a clear way of knowing that they need to craft finer filters. go-ethereum might have a precedent here that we should investigate.
aarshkshah1992 commented 3 days ago

@rvagg Proposed a solution for this at https://filecoinproject.slack.com/archives/CP50PPW2X/p1729778184348849.

rvagg commented 3 days ago

Some thoughts I jotted down in that thread, here for the record:

I also think we should error in the eth APIs if we go over max, not having a signal for users makes for a big footgun. But let's deal with that separately.

aarshkshah1992 commented 3 days ago

@rvagg @akaladarshi

A great first step that will get us most of the way there is to add a "filter for Raw codec" condition to every query made by all the ETH Event APIs.