attestantio / go-eth2-client

Apache License 2.0

fix: passing of timeouts in `chunkedValidators` #142

Closed y0sher closed 1 month ago

y0sher commented 1 month ago

Timeouts weren't being passed to the chunked validators requests, causing deadline-exceeded errors on big queries. All credit goes to @moshe-blox, who is on vacation; I'm just opening the PR to allow a faster resolution on the main branch.

mcdee commented 1 month ago

I think this is a little more subtle than it appears at first glance. For example, say you send a request to obtain validators with a timeout of 2s. The number of validators requested means that there are 10 chunked requests to make, and each takes 1.5s to return. In this situation the call will take 15s, even though a timeout of 2s was passed to it.

One option would be to calculate the number of chunks and split the timeout by that number (e.g. if there were 10 chunks and a top-level timeout of 20s then each chunk's timeout would be 2s), but it may be that the first request takes longer than the others, if it's fetching data from an older state which is then cached for subsequent chunks. So there is no ideal solution with the current setup.

That said, not passing any timeout to the chunked requests is probably the worst situation so I'm inclined to take this and accept the rough edge.

There is a newer POST endpoint for validators; last time I looked it wasn't fully supported by all clients, but I'll take another look to see if it is now available. That would address this issue and allow us to remove the whole chunking code, which falls into the "unwanted but necessary" category.

y0sher commented 1 month ago

@mcdee I agree the passed timeout should probably cover all the requests combined; we don't want the consumer to wait longer than the timeout they passed. But as you mentioned, the current behaviour of passing no timeout at all caused issues for us where big requests couldn't complete at all. I have used that POST endpoint with Prysm and LH for sure.

mcdee commented 1 month ago

Yeah, I think I'm going to take this as-is and accept that there is a weakness here that will be addressed by using the POST variant once it is fully supported.