datastax / dsbulk

DataStax Bulk Loader (DSBulk) is an open-source, Apache-licensed, unified tool for loading into and unloading from Apache Cassandra(R), DataStax Astra and DataStax Enterprise (DSE)
Apache License 2.0
85 stars 30 forks source link

DSBulk's rate limiter is not compatible with speculative executions #447

Open adutra opened 2 years ago

adutra commented 2 years ago

By default, both rate limiting and speculative executions are disabled in DSBulk.

If both are enabled, we observed that, when writing and from the server's perspective, the rate limit is not honored.

This is because rate limit permits are acquired per row written. More specifically, each call to session.executeAsync() will need to acquire permits.

If the request is retried internally by the driver, that's fine, because a retry request is only sent when the initial request has finished, so the invariant acquired <= available is respected and the server never sees more than available concurrent requests.

However, if we enable speculative executions, the driver may trigger a speculative request while the initial request is still in-flight. This will be done without acquiring more permits, since only one call to executeAsync was done. From the client's perspective, the invariant acquired <= available looks respected (we are writing one row only, but with 2 requests), but from the server's perspective, 2 requests were received and the server may find itself processing more than available concurrent requests.

The immediate consequence of such a setup is that Astra starts returning OverloadedExceptions even after #435 was implemented.

I don't have any immediate solution for that. I think we would need to change how permits are acquired for writes: each internal request that the driver sends needs to acquire permits, not only the initial one.

We can achieve this in a few ways – but all of them involve extending driver classes:

I will open driver Jiras for improving the throttling mechanism. But even so I'm reluctant to try the above changes:

Note: I think that reads are not a problem. For reads, permits are acquired per row emitted, after the results page has been received. So speculative executions won't pose any problem here.

Note2: it might be simpler to just give up on rate limit + speculative execs and document this limitation.

Note3: to mitigate this, we could look into something simpler: implementing application-level retries when a write request ends with a DriverTimeoutException. I will create a separate issue for that.

┆Issue is synchronized with this Jira Task by Unito

adutra commented 2 years ago

Issue for application-level retries: #448.

adutra commented 2 years ago

Java driver issues for improving built-in throttling:

JAVA-3036 Expose request in Throttled API JAVA-3037 Apply throttling to retries and speculative executions

adutra commented 2 years ago

This is considered bad practice and could have undesired consequences.

I'd note that Guava's RateLimiter is currently invoked inside Reactor operators and/or (for reads only) also in driver I/O threads: both are bad practices. So all in all, we are already drowning in muddy waters wrt to using blocking code in non-blocking contexts.