diffix / explorer

Tool to automatically explore and generate stats on data anonymized using Diffix
MIT License

HTTP timeout #243

Closed: sebastian closed this issue 4 years ago

sebastian commented 4 years ago

I see timeouts relatively frequently when querying Diffix Explorer. You can see an instance here: https://demo.aircloak.com/admin/diffix-explorer/LendingClub

I would expect the HTTP handler to give fast responses even if it's busy analyzing a data source?

dandanlen commented 4 years ago

I would expect fast responses too, although 'fast' is relative. How forgiving is the HTTPoison timeout?

It would be good to have extra context here. Which endpoint is being queried? Is the issue reproducible? Does it recur for the same tables / columns, or is it random?

sebastian commented 4 years ago

The /explore endpoint is partly IO-bound: it can block on an external request to Air to get the data sources for validation - we don't have a timeout on this request so this could potentially cause a long wait.
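To illustrate the missing timeout, here is a minimal Python sketch of bounding how long a caller waits on a slow external request. It is not Diffix Explorer's code: `fetch_data_sources` is a hypothetical stand-in for the request to Air, and `fetch_with_timeout` and the returned dictionary shape are assumptions made for the example.

```python
import concurrent.futures
import time


def fetch_data_sources(delay_seconds):
    """Hypothetical stand-in for the external request to Air."""
    time.sleep(delay_seconds)
    return ["LendingClub"]


def fetch_with_timeout(fetch, timeout_seconds):
    """Run `fetch` in a worker thread and wait at most `timeout_seconds`.

    Note: this bounds how long the caller waits. It does not cancel the
    underlying request, which keeps running in the worker thread.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch)
    try:
        data_sources = future.result(timeout=timeout_seconds)
        return {"ok": True, "data_sources": data_sources, "error": None}
    except concurrent.futures.TimeoutError:
        return {"ok": False, "data_sources": None,
                "error": "timed out loading data sources"}
    finally:
        # Do not block the caller on the still-running worker.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    print(fetch_with_timeout(lambda: fetch_data_sources(0.0), 1.0))
    print(fetch_with_timeout(lambda: fetch_data_sources(0.5), 0.05))
```

A timeout like this only caps the wait; the handler would still be blocked for up to the timeout duration on every call.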

Does getting the data source for validation have to happen synchronously as part of the call? Couldn't that be a result that is provided asynchronously like any other? I.e. you accept the job, return an exploration ID, and then if the request is broken then you provide information about that during the next request to /result?

sebastian commented 4 years ago

We are likely to hit this issue often when someone presses the "reanalyze all data sources" button. The system will be clogged up with pending queries, so loading the schema might be delayed.

dandanlen commented 4 years ago

> Does getting the data source for validation have to happen synchronously as part of the call? Couldn't that be a result that is provided asynchronously like any other? I.e. you accept the job, return an exploration ID, and then if the request is broken then you provide information about that during the next request to /result?

No, the only reason to do this synchronously was to provide some useful immediate feedback to the caller (e.g. invalid API key, bad data source / table / column).

I think offloading this to the main exploration job is the best solution. I'll implement it.
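The flow discussed above (accept the job, return an exploration ID immediately, and report validation failures on the next request to /result) could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the actual implementation: the function names, the in-memory store, and the status values "processing", "complete" and "error" are all hypothetical.

```python
import threading
import uuid

# Hypothetical in-memory store: exploration ID -> status dictionary.
_explorations = {}
_lock = threading.Lock()


def _validate_and_explore(exploration_id, params, load_data_sources):
    """Background job: validation happens here, not in the request handler."""
    try:
        data_sources = load_data_sources()  # may be slow or fail
        if params["data_source"] not in data_sources:
            raise ValueError(f"unknown data source: {params['data_source']}")
        # The actual exploration work would run here.
        status = {"status": "complete", "error": None}
    except Exception as exc:
        status = {"status": "error", "error": str(exc)}
    with _lock:
        _explorations[exploration_id] = status


def explore(params, load_data_sources):
    """Accept the job and return an ID without waiting for validation.

    The worker thread is returned as well, only so that a demo or test
    can wait for it to finish.
    """
    exploration_id = str(uuid.uuid4())
    with _lock:
        _explorations[exploration_id] = {"status": "processing", "error": None}
    worker = threading.Thread(
        target=_validate_and_explore,
        args=(exploration_id, params, load_data_sources),
        daemon=True,
    )
    worker.start()
    return exploration_id, worker


def result(exploration_id):
    """Return a copy of the current status, including any validation error."""
    with _lock:
        return dict(_explorations[exploration_id])


if __name__ == "__main__":
    job_id, worker = explore({"data_source": "nope"}, lambda: ["LendingClub"])
    worker.join()
    print(result(job_id))
```

The trade-off is the one already noted in the thread: the caller loses immediate feedback on a bad API key or data source and only learns about it when polling for the result.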

sebastian commented 4 years ago

Thanks