diffix / explorer

Tool to automatically explore and generate stats on data anonymized using Diffix
MIT License
2 stars 1 forks source link

Throttle queries by query state #281

Closed sebastian closed 4 years ago

sebastian commented 4 years ago

The query states are:

  running_states = [
    :started,
    :parsing,
    :compiling,
    :awaiting_data,
    :ingesting_data,
    :processing,
    :post_processing
  ]

It would be a neat feature to be able to put a different limit on the phases which are dependent on the database and those that are purely cloak based ones. In our case the limitation really is that we do not want to put too much load on the database, whereas we have ample compute to process many queries in parallel inside the cloak. For example we could therefore do something like classifying the following states as DB bound:

    :started,
    :parsing,
    :compiling,
    :awaiting_data,
    :ingesting_data,

and the following as cloak bound:

    :processing,
    :post_processing

and then say that we allow up to N queries in the db bound states, and M in the cloak bound ones. Restricting the latter is going to be harder though to be honest, since the other queries eventually flow from the db bound ones into the cloak bound ones.

My first recommendation/request would be to make the semaphore only account for the phases that are DB bound, and then release the semaphore once the processing has moved into the cloak.

dandanlen commented 4 years ago

This should be possible, yes.

We could have a semaphore with a limit of N nested withing a semaphore with a limit of M+N - would that work?

sebastian commented 4 years ago

I suggest we don't do this for the time being. It's a "nice to have" more than anything. Let's focus on getting the basic functionality right, and then start working on the multi-column data quality!

dandanlen commented 4 years ago

An early version of this is here.

Would need to be checked again and merged with recent changes.

sebastian commented 4 years ago

How ready do you feel this is?

I skimmed through the code, and it seems fine at a firt glance. My concern with semaphores is always that there might be some edge case leading to starvation and permanent blocking if one isn't very careful.

Have you had a chance to test this properly? By whatever definition of properly that you choose.

dandanlen commented 4 years ago

Hasn't been tested at all, but my experience so far with SemaphoreSlim is that it works without too many surprises. Will need some work to integrate with latest master but I can do this and run some basic tests to get a better idea.