Closed: ckeshava closed this issue 5 months ago
@mvadari what do you think?
I believe the way that xrpl.js handles this is just by using an incrementing ID value. Maybe that would be a better way of handling this - as long as we're able to do so in an atomic manner even in async code.
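For illustration, a minimal sketch of that incrementing-ID idea (this is not the actual xrpl.js or xrpl-py code). Under a single asyncio event loop, coroutines switch only at `await` points, so a plain counter increment cannot be interleaved and is effectively atomic:

```python
import itertools

# Hypothetical sketch: a process-wide incrementing request ID.
# itertools.count is a C-level iterator, so next() is also thread-safe
# enough for CPython's GIL; it is NOT shared across processes.
_id_counter = itertools.count(1)


def next_request_id() -> int:
    """Return the next unique request ID; no collision is possible."""
    return next(_id_counter)
```

Unlike random IDs, this never needs a retry loop, at the cost of IDs being predictable.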
Since this is essentially an application of the birthday problem, I ran some calculations on the probability of a collision:
This should provide enough data to make a decision on whether this is an important change.
no, I'm not alluding to the birthday problem.
Since we are appending pseudo-random numbers, it is very unlikely we see a collision at all. Owing to the correctness of the math library, I'm quite sure we'll never see a collision in practice. Identical requests also end up with unique RequestIds; is that not a problem?
In my branch, I'm bombarding the client with identical requests -- and I observe the queue size growing. A cache can be helpful in fulfilling all the extraneous requests (until the data becomes stale)
> no, I'm not alluding to the birthday problem.
> Since we are appending pseudo-random numbers, it is very unlikely we see a collision at all. Owing to the correctness of the math library, I'm quite sure we'll never see a collision in practice.
The correctness of the math library isn't a concern. https://github.com/XRPLF/xrpl-py/blob/43636217953ecff424f9e5ba77faa97a2f9231a7/xrpl/asyncio/clients/websocket_base.py#L18 shows that the pseudo-random numbers are being generated in a range of 1,000,000 (1 million). I applied the birthday problem math to any two numbers in that range.
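For concreteness, the standard birthday-problem approximation for n outstanding requests drawing IDs uniformly from a space of size N is P(collision) ≈ 1 − exp(−n(n−1)/(2N)). A small sketch with N = 1,000,000, the range used in the linked code:

```python
import math


def collision_probability(n: int, space: int = 1_000_000) -> float:
    """Approximate probability that n uniformly drawn IDs contain a duplicate,
    using the birthday-problem bound 1 - exp(-n*(n-1) / (2*N))."""
    return 1.0 - math.exp(-n * (n - 1) / (2 * space))


# With 1,000 requests in flight, collision_probability(1000) ≈ 0.39,
# i.e. roughly a 39% chance of at least one duplicate ID.
```

So even a modest load test can plausibly hit a collision, which is consistent with the report later in this thread.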
> Identical requests also end up with unique RequestIds; is that not a problem?
> In my branch, I'm bombarding the client with identical requests -- and I observe the queue size growing. A cache can be helpful in fulfilling all the extraneous requests (until the data becomes stale)
They're different requests. It's rippled's job to cache, not the library's. If I'm sending the same request, I want the data from rippled, and caching could result in incorrect info. It changes as often as every 4 seconds (potentially more frequently if the UNL is smaller, or if I'm running a standalone node that closes super fast).
Hmm, okay. Yeah, my concern was related to caching.
yes, caching inside rippled makes more sense than in the library. Although, I suspect ledger close times are closer to 6-7 seconds, rather than 4.
I'll close this issue now.
> yes, caching inside rippled makes more sense than in the library. Although, I suspect ledger close times are closer to 6-7 seconds, rather than 4.
Ledger close times are proven to be around 4. See the metrics at the top of https://livenet.xrpl.org/ (which use real time, not rippled fake time).
I did in fact see collisions of the randomly generated ID when using the lib for load-testing a private rippled cluster.
I needed a quick fix at the time, and this is it:
import asyncio
from random import getrandbits
from typing import Final

_REQ_ID_BITS: Final[int] = 16


async def _inject_request_id(request: Request, open_requests: _REQUESTS_TYPE) -> Request:
    """
    Given a Request with an ID, return the same Request.
    Given a Request without an ID, make a copy with a randomly generated ID.

    Arguments:
        request: The request to inject an ID into.

    Returns:
        The request with an ID injected into it.
    """
    if request.id is not None:
        return request
    request_dict = request.to_dict()
    while True:
        request_id = f"{request.method}_{getrandbits(_REQ_ID_BITS)}"
        if request_id not in open_requests:
            request_dict["id"] = request_id
            break
        # Yield to the event loop so in-flight requests can complete and
        # free up IDs; prevents deadlock if all possible IDs are in use.
        await asyncio.sleep(0)
    return Request.from_dict(request_dict)
Note that the function is now async to prevent deadlock, but that may just be paranoia.
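The retry loop above can be exercised in isolation. This is a hypothetical standalone sketch (the `fresh_id` and `pending` names are illustrative, not part of xrpl-py) showing that concurrent tasks drawing 16-bit IDs never hand out a duplicate while it is still pending:

```python
import asyncio
from random import getrandbits


async def fresh_id(method: str, pending: set) -> str:
    """Draw '<method>_<16-bit int>' IDs until one is not already pending."""
    while True:
        candidate = f"{method}_{getrandbits(16)}"
        if candidate not in pending:
            pending.add(candidate)
            return candidate
        # Yield so other tasks get a chance to finish and release IDs.
        await asyncio.sleep(0)


async def main() -> list:
    pending: set = set()
    # 100 concurrent tasks all drawing from the same 65,536-value space.
    return await asyncio.gather(*(fresh_id("ledger", pending) for _ in range(100)))


ids = asyncio.run(main())
```

Every returned ID is unique because the check-and-add happens without an intervening `await`.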
I see, thanks for the code snippet. Can you tell us a little bit about your load-testing environment? How many requests per second are you sending? And what is the hardware capacity of the rippled server (compute/storage/network configuration)?
request_id is generated in Websocket clients for the purpose of de-duplication -- source: https://github.com/XRPLF/xrpl-py/blob/43636217953ecff424f9e5ba77faa97a2f9231a7/xrpl/asyncio/clients/websocket_base.py#L32

However, these IDs are generated in a pseudo-random fashion, so every incoming request is logged as a unique request. This does not help with the de-duplication effort.
There is a demonstration of this issue at the tip of this branch: https://github.com/ckeshava/xrpl-py/tree/requestID. Please execute

poetry run python snippets/duplicate_requests.py

to observe the websocket client's queue. I have included an artificial delay on the server side in order to simulate slow responses. (Without these delays, all the requests are fulfilled near-instantaneously, rendering the ID moot.)

To ensure that no requests are treated as distinct when they are duplicates, we'd need to maintain a time-based cache where every request is identified by the SHA-256 hash of its contents.
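As a sketch of that cache-key idea (a hypothetical helper, not part of xrpl-py): canonicalize the request dict to sorted-key JSON before hashing, so identical content always maps to the same key regardless of field order:

```python
import hashlib
import json


def request_fingerprint(request_dict: dict) -> str:
    """SHA-256 of the canonical (sorted-key, compact) JSON of a request.

    request_dict stands in for the output of Request.to_dict(); identical
    requests produce identical fingerprints, suitable as a cache key.
    """
    canonical = json.dumps(request_dict, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A time-based cache keyed on this fingerprint could then serve duplicate in-flight requests from one rippled round trip, with an expiry shorter than the ledger close interval so stale data is never returned.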