Closed AskAlexSharov closed 1 year ago
Hi! I'd like to work on this one and, to start, need to clarify one thing regarding OpenEthereum's implementation. Currently parity_pendingTransactions doesn't support proper pagination (it only provides a limit parameter), but it does support filtering via a filter parameter. Do we need both of these things (pagination and filtering) on Erigon's side?
Also, I think it might be a bit confusing from the user's perspective to have this method named pendingTransactions, since it collides with the pending sub-pool name but actually returns various types of transactions from the pool. It also feels strange to me to name it pending, because the txpool already holds only pending (in the sense of awaiting inclusion in a block) transactions. What do you think about something generic like txpool_transactions?
txpool_transactions or txpool_search - everything is fine with me.
txpool_contentFrom - single filter by sender - https://github.com/ethereum/go-ethereum/blob/master/internal/ethapi/api.go#L179 - maybe as a first step we can just support this method. It will already cover 80% of use-cases. And this method can't return a large response, because there is a limit on how many txs from the same sender can be in the pool: AccountSlots=16. No reason for pagination there. Yes, as the first step let's add txpool_contentFrom, and while we work on this method, think about what user-friendly and grpc-friendly API we can provide to cover the last 20% of use-cases.
Sounds good. Pagination isn't trivial though: we have transactions stored in a BTree sorted by (sender, nonce), so it would be convenient to use it for pagination and set next_page_token (as recommended in the Google API design doc) to a transaction id. However, that breaks if a transaction gets evicted from the pool, or new transactions are included in it, before the user tries to retrieve the next_page_token transaction id.
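To make the eviction concern concrete: if the token encodes the last (sender, nonce) key itself rather than a transaction id or index, resuming is just a seek to the first key greater than the cursor, which stays valid even when entries before it were evicted or new ones inserted. A minimal Go sketch, using a sorted slice to stand in for the pool's BTree (all names here are hypothetical, not existing Erigon code):

```go
package main

import (
	"fmt"
	"sort"
)

// txKey stands in for the txpool's BTree key: sorted by (sender, nonce).
type txKey struct {
	sender string
	nonce  uint64
}

func less(a, b txKey) bool {
	if a.sender != b.sender {
		return a.sender < b.sender
	}
	return a.nonce < b.nonce
}

// page returns up to limit keys strictly after the cursor, plus the next
// cursor. Seeking by key (not by index or tx id) keeps the cursor valid
// even if entries before it were evicted or inserted in the meantime.
func page(keys []txKey, after *txKey, limit int) ([]txKey, *txKey) {
	start := 0
	if after != nil {
		// first key strictly greater than the cursor
		start = sort.Search(len(keys), func(i int) bool {
			return less(*after, keys[i])
		})
	}
	end := start + limit
	if end > len(keys) {
		end = len(keys)
	}
	out := keys[start:end]
	if end == len(keys) {
		return out, nil // last page: no next cursor
	}
	next := out[len(out)-1]
	return out, &next
}

func main() {
	keys := []txKey{
		{"0xaa", 0}, {"0xaa", 1}, {"0xbb", 5}, {"0xcc", 0}, {"0xcc", 1},
	}
	first, cursor := page(keys, nil, 2)
	fmt.Println(first, *cursor)
	second, _ := page(keys, cursor, 2)
	fmt.Println(second)
}
```

The same seek-by-key approach maps directly onto a real BTree's AscendGreaterOrEqual-style iteration.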
Regarding txpool_contentFrom: it seems we won't need pagination for it, so something like
message SearchRequest {
  message Filter {
    optional Tx.Type type = 1;
    optional bytes from = 2;
  }
  Filter filter = 1;
}
message SearchReply { repeated Tx txs = 1; }
to start with, and extending it later with pagination and more filters for txpool_transactions, might work fine.
Was this feature ever finished?
@mtgnoah no, last PR was abandoned
Hi! I want to take a shot at this issue. I've already implemented the search function with a "from" filter. Now I want to pitch my idea for the pagination architecture and get some feedback: there are 2 big classes, offset-based vs cursor-based. Offset-based won't work well with dynamic data, so I lean toward cursor-based.
My proposal is to return a page token representing the (address, nonce) of the last transaction we returned. This token should either be an encrypted/signed base64-encoded (address, nonce), or a key (uuid) into a map holding (address, nonce) (and maybe the filter).
It's better to keep only a one-way cursor to simplify the implementation; since we only want to return data in chunks, a two-way cursor would be excessive.
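A sketch of the signed-token variant, assuming an HMAC over the raw (address, nonce) payload so clients can't forge or tamper with cursors. The function names and token layout are hypothetical, not an existing Erigon API:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/binary"
	"errors"
	"fmt"
)

// Assumption: in practice this would be a per-process random key,
// not a hardcoded string.
var tokenKey = []byte("server-side secret")

// encodeToken builds base64url(address || nonce || HMAC-SHA256(payload)).
func encodeToken(addr [20]byte, nonce uint64) string {
	payload := make([]byte, 28)
	copy(payload, addr[:])
	binary.BigEndian.PutUint64(payload[20:], nonce)
	mac := hmac.New(sha256.New, tokenKey)
	mac.Write(payload)
	return base64.URLEncoding.EncodeToString(mac.Sum(payload))
}

// decodeToken verifies the signature and recovers (address, nonce).
func decodeToken(token string) (addr [20]byte, nonce uint64, err error) {
	raw, err := base64.URLEncoding.DecodeString(token)
	if err != nil || len(raw) != 28+sha256.Size {
		return addr, 0, errors.New("malformed token")
	}
	payload, sig := raw[:28], raw[28:]
	mac := hmac.New(sha256.New, tokenKey)
	mac.Write(payload)
	if !hmac.Equal(sig, mac.Sum(nil)) {
		return addr, 0, errors.New("invalid signature")
	}
	copy(addr[:], payload[:20])
	return addr, binary.BigEndian.Uint64(payload[20:]), nil
}

func main() {
	var a [20]byte
	a[19] = 0x42
	tok := encodeToken(a, 7)
	addr, nonce, err := decodeToken(tok)
	fmt.Println(addr == a, nonce, err) // true 7 <nil>
}
```

Compared with the uuid-to-map variant, this keeps the server stateless, at the cost of leaking the fact that the token contains an (address, nonce) pair.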
There are 2 ways to provide a filter:
Also crucial to note: after the request has started, any data changes behind the cursor would be invisible to this request, while data changes ahead of the cursor would be included in it.
@SozinM thank you. I don't understand “final note”. Do you mean: data inside txpool may change between requests and it's fine, we ignore it and user may see same tx on different pages - because it’s position changed inside txpool between requests. ?
my thinking:
I think you got me a bit wrong here. I plan to implement the rest of the Parity filters; all I wanted was to validate my design of the pagination.
> I don't understand “final note”. Do you mean: data inside txpool may change between requests and it's fine, we ignore it and user may see same tx on different pages - because it’s position changed inside txpool between requests. ?
There would be no pages, only a cursor. If we return the first 100 transactions that fit the filter, and the user added more transactions that were placed in the txpool behind the current cursor, the user will not see those transactions in this request context. The user needs to start a new request in order to see these newly added transactions.
personally, i'm having trouble seeing why someone would want a paginated filtered subset of the txpool. To me, if someone is doing bulk retrieval from the txpool, they probably want it all (no filter), and if they aren't, they probably don't need more than some sane number of entries (otterscan)
In my opinion, the most useful pagination method would be to be able to paginate by offset in order of entry. This would make the task of keeping track of txns from an external application very easy - which i think should be the goal of erigon.
why: because sophisticated MEV players will have their own filters and search methods - so the priority should be an easy way to extract the data, rather than providing an interface they won't use. let them build their own complicated index - i don't think that is erigon's job
I see the following as possible use cases for such a method, in order of importance. If I am missing anything, let me know.
Now, walking through each of these use cases individually:
A sophisticated player worth their salt will keep their own index of all pending txns in a local database.
One is going to "backfill" their database and then subscribe to notifications for pending transactions, and also probably poll just in case.
in the event that they lose their socket connection, they would want to be able to "catch up" by asking "what transactions have been broadcast in the last
I have not built a block since homestead, so forgive me if my information is outdated.
afaik, builders care about the highest gas price they can fit into their block, and then include extra transactions based on orders from flashbots or whatever bribe network they use.
tracking the mempool state, like etherscan does. In this case, missing a few pending transactions likely isn't a big deal. In reality this is very similar to the mev use case - except missing txns is fine, so one can probably just backfill+subscribe and not worry about reconnect. On the other hand, something like otterscan could possibly want to be able to display "pending txns" for a specific address - perhaps search is useful there
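The backfill+subscribe pattern mentioned above can be sketched as a merge that dedupes by hash, so a tx that shows up in both the initial pool dump and the live feed is delivered to the consumer only once. A hypothetical Go sketch, with channels standing in for the real subscription:

```go
package main

import "fmt"

// mergeBackfillAndFeed emits the snapshot first, then the live feed,
// deduplicating by tx hash so overlapping entries are seen only once.
// The string hashes and channel-based feed are illustrative stand-ins.
func mergeBackfillAndFeed(snapshot []string, feed <-chan string, out chan<- string) {
	seen := make(map[string]bool)
	for _, h := range snapshot {
		if !seen[h] {
			seen[h] = true
			out <- h
		}
	}
	for h := range feed {
		if !seen[h] {
			seen[h] = true
			out <- h
		}
	}
	close(out)
}

func main() {
	feed := make(chan string, 3)
	feed <- "0xb" // already in the snapshot: deduped
	feed <- "0xc"
	feed <- "0xd"
	close(feed)

	out := make(chan string, 8)
	mergeBackfillAndFeed([]string{"0xa", "0xb"}, feed, out)
	for h := range out {
		fmt.Println(h)
	}
}
```

The "catch up after a lost socket" case is then just a re-run of the snapshot step against the existing `seen` set.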
The RPCDaemon already supports the ability to subscribe to pending transactions https://github.com/ledgerwatch/erigon/blob/devel/turbo/rpchelper/filters.go#L37
These are stateful and record all the transactions since you last polled the filter. I'm not a big fan of stateful endpoints, but thought I would mention that this exists as an option to consider.
FYI: txpool has GRPC interface: https://github.com/ledgerwatch/interfaces/blob/master/txpool/txpool.proto
yeah, I think the grpc interface should be pushed to those looking to do large scale txpool work
possibly, people are scared because grpc is r/w.
is it easy to subset grpc endpoints into ro/rw methods? expose ro endpoint, might be very useful
Another use case I see from an infra-provider point of view: The current implementation of txpool_content is very unfriendly with reverse proxies because we need to crank up limits of max_body_size, and timeout limits. It would be great to have an alternative to this call that would fit in default limits (for example default Nginx limits). Also, it's crucial to have filters so our clients do not abuse our egress and get only the data they need.
> Another use case I see from an infra-provider point of view: The current implementation of txpool_content is very unfriendly with reverse proxies because we need to crank up limits of max_body_size, and timeout limits.
not sure i agree here.
why does the customer want this service in the first place? if customers are simply using txpool_content to create their own local databases, perhaps the service should be providing compressed txpool dump for this use case.
if a service provider wants to give its users a new method to view txns - i think that should be implemented at the service provider level. Needing to support some specific endpoint for service providers seems like a nightmare that is out of scope for erigon (what happens when need to remove the method to accommodate a major architecture change?)
also, i think it is up to the service provider to sanitize input to their nodes such that customers do not abuse their egress. I think "allowing service providers to blindly forward requests through nginx" is something erigon should not encourage.
even small infra services should be at the very least validating the jsonrpc requests themselves before forwarding to erigon, and ultimately we should be using a combination of the grpc and json-rpc to fully leverage available performance.
> FYI: txpool has GRPC interface: https://github.com/ledgerwatch/interfaces/blob/master/txpool/txpool.proto
> yeah, I think the grpc interface should be pushed to those looking to do large scale txpool work
> possibly, people are scared because grpc is r/w.
> is it easy to subset grpc endpoints into ro/rw methods? expose ro endpoint, might be very useful
All doable. But we never had this as a goal. Maybe we need to consider it.
@elee1766
> if a service provider wants to give its users a new method to view txns - i think that should be implemented at the service provider level.
It was not my point. My understanding is that providing more flexible ways to access data will improve erigon adoption. About the method, I think it makes sense to implement txpool_contentFrom with the param "from" and without pagination. 2 use cases were highlighted in the comments:
Also, I see here a beneficial point that it will bring in the method implemented in ethereum and could help in onboarding for those who are switching from geth client and are using this txpool method.
> It was not my point. My understanding is that providing more flexible ways to access data will improve erigon adoption. About the method, I think it makes sense to implement txpool_contentFrom with the param "from" and without pagination.
i see what you mean now - misunderstood. that makes sense to me.
@AskAlexSharov WDYT about this?
My opinion: let's do something (or at least start work on something) that Otterscan can use to beat Etherscan in UI. A simple txpool_contentFrom is clearly not enough. parity_pendingTransactions is not enough either (no ordering, etc...), but it's clearly way more extensible (we can add new filters and orderings in the future without breaking compatibility).
Technical details:
Range(Table, fromPrefix, toPrefix, orderAscend, limit):
kv/kv_interface.go:Range - https://github.com/ledgerwatch/erigon-lib/blob/main/kv/kv_interface.go#L319
kv/kv_interface.go:Stream - https://github.com/ledgerwatch/erigon-lib/blob/fc3dd4fd27895b1448086b5f7f7755e694e55291/kv/kv_interface.go#L460
kv.Stream: erigon-lib/kv/remotedb/kv_remote.go:rangeOrderLimit - https://github.com/ledgerwatch/erigon-lib/blob/main/kv/remotedb/kv_remote.go#LL635C21-L635C36
@SozinM FYI: here is an example implementation of similar filters in .proto: FilterTree - https://github.com/dgraph-io/dgraph/blob/main/protos/pb.proto#L67
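To show what a FilterTree-style design buys over flat filter fields, here is a hypothetical Go analogue: a recursive tree of AND/OR/NOT nodes over leaf predicates, which a future txpool_transactions .proto message could mirror (none of these names exist in Erigon today):

```go
package main

import "fmt"

// tx is a minimal stand-in for a pooled transaction.
type tx struct {
	sender string
	txType int
}

// filter is a recursive filter tree: interior nodes combine children with
// "and"/"or"/"not"; leaves (empty op) hold a predicate.
type filter struct {
	op       string // "and", "or", "not", or "" for a leaf
	children []filter
	pred     func(tx) bool // set on leaves only
}

func (f filter) match(t tx) bool {
	switch f.op {
	case "and":
		for _, c := range f.children {
			if !c.match(t) {
				return false
			}
		}
		return true
	case "or":
		for _, c := range f.children {
			if c.match(t) {
				return true
			}
		}
		return false
	case "not":
		return !f.children[0].match(t)
	default:
		return f.pred(t)
	}
}

func main() {
	// (from == 0xaa) AND NOT (type == 2)
	f := filter{op: "and", children: []filter{
		{pred: func(t tx) bool { return t.sender == "0xaa" }},
		{op: "not", children: []filter{
			{pred: func(t tx) bool { return t.txType == 2 }},
		}},
	}}
	fmt.Println(f.match(tx{"0xaa", 0}), f.match(tx{"0xaa", 2}), f.match(tx{"0xbb", 0}))
}
```

New filter kinds then only add leaf types, without breaking existing clients, which is the extensibility property discussed above.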
This issue is stale because it has been open for 40 days with no activity. Remove stale label or comment, or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.
https://openethereum.github.io/JSONRPC-parity-module#parity_pendingtransactions
We'll probably need to do this in several PRs, and add a similar method to txpool's grpc interface. Also, our TxPool has 3 sub-pools: pending, baseFee, queued (parity and geth have only pending and queued). This fact must be reflected in the API as well.
We have txpool_content - but it doesn't have pagination (and we can't break compatibility here) - and if we transfer 300K transactions in 1 message it's almost 1Gb. We need to keep many limits (rpc and grpc) very high because of this method.
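Back-of-the-envelope, the numbers above (~300K txs in ~1Gb) imply roughly 3.3KB per transaction, so paging under a default-sized proxy body limit of ~1MB means on the order of a thousand pages. A trivial sketch of that arithmetic (the average tx size and body limit are assumptions, not measured values):

```go
package main

import "fmt"

// pagesNeeded estimates how many response pages it takes to ship txCount
// transactions of avgTxBytes each without any page exceeding maxBodyBytes.
func pagesNeeded(txCount, avgTxBytes, maxBodyBytes int) int {
	perPage := maxBodyBytes / avgTxBytes
	return (txCount + perPage - 1) / perPage // ceiling division
}

func main() {
	// ~300K txs, ~3.3KB each, 1MB body limit.
	fmt.Println(pagesNeeded(300_000, 3_300, 1_000_000))
}
```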