Closed Jackie-Jiang closed 2 years ago
There is an indication that the response is partial. Why add an exception? If there is a need for the reason, then we can add a "reason" field ?
We don't really have that for case 1 and 2. For case 3, we should not expect normal users to check numServersQueried
and numServersResponded
and compare them to tell whether the result is partial. (The partialResponse
flag is in LinkedIn internal code IIRC)
We don't really have that for case 1 and 2. For case 3, we should not expect normal users to check
numServersQueried
andnumServersResponded
and compare them to tell whether the result is partial. (ThepartialResponse
flag is in LinkedIn internal code IIRC)
If partial response flag is not set in all cases, then that is a bug. Let us definitely fix it.
I think applications check for exceptions and assume that there is no result. So, exceptions when there are partial results may not be the right answer.
We don't have that flag in OSS. IIRC the logic for that flag is check if numServersQueried
matches numServersResponded
and whether there are exceptions.
We don't model exceptions this way. Having exceptions does not indicate no result. It actually indicates the results are not complete (e.g. one segment throws exception, others work fine). When broker failed to route the query to a server, it also adds an exception.
+1
I also suggest we change the default behavior to disallow partial response by default, and provide the flag to turn it on. In many use cases, data quality and correctness is a serious issue. So it's ok to fail the query but not so to return a partial result silently.
I think the fundamental concern here is that partial results might be returned without the users taking cognisance of them. While having a flag like partialResponse
solves to an extent, it is still prone to being missed by the user (many users don't even look at the response stats if the query was fast enough). I agree with @yupeng9 that the default behaviour in such cases should be failing the query. If the user has explicitly specified that he is okay with partial responses, then it can be indicated in a flag to the user.
So this entails three changes:
@mcvsubbu @Jackie-Jiang do we have an agreement on this requirement? If yes, I can take it up.
Partial responses and approximate results would be problematic in OLTP databases, but they make full sense in an analytical database such as Pinot and should be fully supported as first class citizens without throwing error or exception conditions. As an analytical database, there is value in providing partial responses or approximate results when full responses or accurate results are either missing or will take too long to generate provided that we can quantify how partial or approximate the results are. We already do a lot of this with approximation functions such as DISTINCTCOUNTHLL, high-cardinality group by estimates, so there shouldn't be anything wrong with providing partial responses as long as it can be quantified as to how partial the responses are. These should be treated different from exceptions, errors, and warnings.
@amrishlal Partial and approximation are different, where approximation usually gives bounded error rate. The cases we want to notify the user here is when some segments are not available (e.g. all replica down), or some servers fail to respond (e.g. timeout). We will still return the response, but put some message in the query response so that user knows the result is not complete. I don't see how this kind of problem is different from exceptions though. If some segments throw exception on a given query, we can also treat it as partial because it is essentially the same as all replicas down for these segments. User can choose to ignore the exceptions and use whatever returned in the response.
The cases we want to notify the user here is when some segments are not available (e.g. all replica down), or some servers fail to respond (e.g. timeout).
I can easily see a case where a user may specify a timeout for a long running query as a way to get partial results early without waiting for those one or two servers that may take extra long time to process the results. In cases like these, one could set a warning in query results and try to quantify how accurate the results might be (i.e result is based on responses of 9 out of 11 segments, etc).
I don't see how this kind of problem is different from exceptions though. If some segments throw exception on a given query, we can also treat it as partial because it is essentially the same as all replicas down for these segments.
I think it's worth distinguishing between success
, warning
, and errors
states (some databases do this quite well with large list of success, error, and warning codes that are returned along with results). The problem with throwing exception is that it is commonly used for error conditions, tends to be thrown arbitrarily at times so one can't usually distinguish between warning and errors, and requires special handling similar to errors conditions, so users would end up treating warnings as exceptions. Even in case of approximate functions such as DISTINCTCOUNTHLL it may be a good idea to set a warning saying that results are approximate and not definite.
Purely from the user's perspective, this certainly looks like a warning
than an error
("Here are your results but be warned that these are partial"). I think regardless of how we do it in code, a flag indicating this for all the three cases above would be useful. Even now as @Jackie-Jiang pointed out, the user has to check numServersQueried
and numServersResponded
and compare them to tell if the result is partial. This is unnecessarily complex.
I think the key here is to inform users and give them the options to choose the desirable behavior. For your example, users can choose DISTINCTCOUNT vs DISTCINTCOUNTHLL for the exact result or approximation. Similarly, users can also choose not to take partial results. For analytical databases, there are use cases that demand exact results, cannot accept partial results. Just to give one example, there are use cases at Uber that pinot results are used in the billing processing and accounting.
@Jackie-Jiang shall we start with adding a partial response flag here?
I'd suggest adding different errors for each scenario described above to the QueryException
, and users can decide on how to proceed based on the error type (error code). We already have similar errors such as BROKER_REQUEST_SEND_ERROR_CODE
which is added when a server cannot be reached
Not all the partial response is useless or harmful. There is another scenario that falls into partial response, which is that isNumGroupsLimitReached()
from brokerResponse returns true. In this case, user can still leverage the response of the query as it still provides insights from the response.
@Jackie-Jiang Yes in addition to the specific error codes that you're suggesting to add, do you think an explicit partialResponse = true
flag would be helpful for the users? Especially for users who do not want partial responses, it'll be very easy for them to discard the results of a query just by looking at this flag.
We need to think about how to set this partialResponse
flag. What scenarios should be count as partial
Can we add it as a query option (allowPartial)?
@yupeng9 Current pinot behavior will always return the available results, and set exceptions for unexpected scenarios such as unable to reach to a server. For now I would suggest following it, and the client can decide whether to retry based on the exceptions received. This issue is mainly to discuss the cases not captured
I have taken up the exceptions part @Jackie-Jiang. Let us think about the partialResponse
flag post this.
I have been a bit busy throughout last week, will raise a PR for this soon.
There are 3 scenarios when the broker response might be partial:
numUnavailableSegments
inBaseBrokerRequestHandler
line 718numSegmentsQueried > numSegmentsAcquired
inServerQueryExecutorV1Impl
line 170numServersQueried > numServersResponded
inSingleConnectionBrokerRequestHandler
line 128Currently these partial responses are only tracked by metrics/query stats, but not modeled as an exception. We should add an exception to the broker response to inform the users that the response might be partial