diffix / explorer

Tool to automatically explore and generate stats on data anonymized using Diffix
MIT License
2 stars 1 forks source link

Don't return duplicate error messages. #289

Closed dandanlen closed 3 years ago

dandanlen commented 4 years ago

Often (almost always?) the same error message is reported multiple times and also added to the errors list multiple times.

Example:

Error in column exploration for `gda_scihub_link` / `downloads` / `datetime`.

Error in column exploration for `gda_scihub_link` / `downloads` / `datetime`.

Error in column exploration for `gda_scihub_link` / `downloads` / `datetime`.

Error in column exploration for `gda_scihub_link` / `downloads` / `datetime`.

Error in column exploration for `gda_scihub_link` / `downloads` / `datetime`.

We should:

Errors are added in this catch block in ExploreController:

catch (Exception) when (exploration.Completion.Exception != null)
{
    // Log any other exceptions from the explorer and add them to the response object.
    logger.LogWarning($"Exceptions occurred in the exploration tasks for exploration {explorationId}.");

    foreach (var innerEx in exploration.Completion.Exception.Flatten().InnerExceptions)
    {
        logger.LogError(innerEx, "Exception occurred in exploration task.", innerEx.Data);
        exploreResult.AddErrorMessage(innerEx.Message);
    }
}

It looks like the flattened list of AggregateExceptions contains multiple copies of the same exception. In theory is only thrown once - we need to make sure this is actually the case and we're not somehow triggering the same exploration multiple times. In this case, the error duplication might be a quirk of the AggregateException class and how Tasks manage exceptions.

Another possibility is that the top-level 'Error in column exploration' exception is triggered multiple times by different inner exceptions, so in this case we can either add the inner exception messages too - but IMO this is too much detail for the api user, and the the detail will be available through sentry anyway.

dandanlen commented 3 years ago

Closing this. The symptom (duplicate messages) has been fixed.

The origin of the issue is when any kind of query error (connection problem, malformed queries, auth issues) affects multiple queries running against the same datasource concurrently. This manifests in multiple duplicate error messages which are caught and logged to the api caller with the above simplified message. So multiple different underlying queries encountering an error at the same time => duplicate error messages to the client. It's worth noting that the underlying exceptions are logged to stdout as well as Sentry (when enabled).