Aircloak / aircloak

This repository contains the Aircloak Air frontend as well as the code for our Cloak query and anonymization platform

No fixed alignment for analyst #770

Open yoid2000 opened 8 years ago

yoid2000 commented 8 years ago

This issue is adopted from #766

This issue specifies how to let the analyst specify any range.

The steps are as follows:

  1. Do "make change", converting the analyst's range into legitimate FA'd ranges.
  2. Do SaD on each resulting range.

For making change, any range with col > X is converted into col <> X AND col >= X. Any range with col <= Y is converted into col < Y OR col = Y. FA is done on the <> and = terms as usual.
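As a quick sanity check of the two rewrites above, here is a minimal Python sketch. The function names are hypothetical and only illustrate that each converted predicate selects exactly the same values as the original inequality; it says nothing about how the cloak actually rewrites queries.

```python
def gt_rewritten(col, x):
    # col > X expressed as: col <> X AND col >= X
    return col != x and col >= x


def le_rewritten(col, y):
    # col <= Y expressed as: col < Y OR col = Y
    return col < y or col == y


# Compare the rewritten predicates with the originals on a small grid of values.
for col in range(-5, 6):
    for bound in range(-5, 6):
        assert gt_rewritten(col, bound) == (col > bound)
        assert le_rewritten(col, bound) == (col <= bound)
```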

Obviously any given range can result in an almost arbitrary number of sub-ranges. Probably the way to deal with this is to simply limit the number of sub-ranges. With, say, 10 or 15 sub-ranges, the "change" is pretty close to the original range anyway.

Note that every new sub-range, and its associated SaD, creates an opportunity for perturbation in the answer, because rows may be dropped. The analyst should somehow be made aware of this. "Best practice" should be to make proper FA ranges in the first place, if practical. Maybe the answer is to have an info message telling the analyst that the range has been partitioned, and a pointer to the documentation that discusses this.
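To make the idea of "making change" with a cap concrete, here is a rough Python sketch. Everything specific in it is an assumption for illustration: it assumes integer half-open ranges, an FA grid with sizes 1, 2, 5, 10, 20, 50, ... where each range starts at a multiple of its size, a simple greedy split (not guaranteed to be minimal), and a policy of keeping the largest pieces when the cap is exceeded. The names `grid_sizes` and `make_change` are hypothetical.

```python
from itertools import count


def grid_sizes(limit):
    """Yield the assumed grid sizes 1, 2, 5, 10, 20, 50, ... up to `limit`."""
    for exp in count():
        for mult in (1, 2, 5):
            size = mult * 10 ** exp
            if size > limit:
                return
            yield size


def make_change(lo, hi, max_subranges=15):
    """Greedily split the half-open integer range [lo, hi) into aligned pieces.

    Returns (pieces, exact). `exact` is False when the cap forced some
    pieces to be discarded, so the pieces no longer cover [lo, hi) fully.
    """
    pieces = []
    pos = lo
    while pos < hi:
        # Largest assumed grid size that is aligned at `pos` and still fits.
        size = max(s for s in grid_sizes(hi - pos) if pos % s == 0)
        pieces.append((pos, pos + size))
        pos += size
    exact = len(pieces) <= max_subranges
    if not exact:
        # Assumed policy: keep the largest pieces, then restore start order.
        by_size = sorted(pieces, key=lambda p: p[1] - p[0], reverse=True)
        pieces = sorted(by_size[:max_subranges])
    return pieces, exact
```

For example, `make_change(3, 47)` yields nine aligned pieces that cover the range exactly, while `make_change(3, 47, max_subranges=3)` keeps only `(10, 20)`, `(20, 40)` and `(40, 45)` and reports that the change is not exact.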

yoid2000 commented 8 years ago

@sebastian you can pull this out of the icebox when you think it is ready

yoid2000 commented 8 years ago

Aha, for reporting we can do the following:

  1. If the cloak cannot make perfect change, then it informs the analyst that the range has been modified, essentially as we do today.
  2. If SaD was triggered in any LC sub-ranges, then the analyst is given a message to the effect of "Rows may have been removed from the following sub-ranges: range1, range2, ..."

But if neither of the above took place, then in fact there was no extra distortion, and no need to give a message.
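The reporting rule above could be sketched roughly as follows. The function name, its parameters, and the wording of the first message are hypothetical; only the second message follows the text proposed in this comment.

```python
def range_messages(exact_change, sad_subranges):
    """Build the analyst-facing messages for a partitioned range.

    exact_change:  True if the cloak could make perfect change.
    sad_subranges: list of (lo, hi) sub-ranges in which SaD removed rows.
    """
    messages = []
    if not exact_change:
        # Hypothetical wording for the existing "range was modified" notice.
        messages.append("The range has been modified.")
    if sad_subranges:
        listed = ", ".join(f"[{lo}, {hi})" for lo, hi in sad_subranges)
        messages.append(
            "Rows may have been removed from the following sub-ranges: " + listed
        )
    # If neither condition holds, no message is produced.
    return messages
```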

sebastian commented 8 years ago

> But if neither of the above took place, then in fact there was no extra distortion, and no need to give a message.

I am not sure this statement is true? As soon as we create individual sub-ranges and join them, we are going to introduce more noise, whether or not shrink and drop was performed.

> @sebastian you can pull this out of the icebox when you think it is ready

Sure. I'd rather keep it in the icebox a while longer. We have to focus on the other aspects like generic TeamBank work and generic anon-work at the moment.

yoid2000 commented 8 years ago

> > But if neither of the above took place, then in fact there was no extra distortion, and no need to give a message.
>
> I am not sure this statement is true? As soon as we create individual sub-ranges and join them, we are going to introduce more noise, whether or not shrink and drop was performed.

No, there really is no new noise if no SaD happened. We don't need to add individual noise per sub-range. We just do SaD per sub-range, and then add a single noise value to the surviving rows as usual.
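A toy Python sketch of that point: SaD runs once per sub-range, but noise is drawn only once for the combined result. The `shrink_and_drop` function here is a deliberately simplified, hypothetical stand-in (it drops a whole sub-range that holds too few rows) and is not the cloak's actual SaD logic; the Gaussian noise and all parameter names are likewise assumptions.

```python
import random


def shrink_and_drop(rows, min_rows=2):
    """Hypothetical stand-in for SaD: drop sub-ranges with too few rows."""
    return rows if len(rows) >= min_rows else []


def noisy_count(values, subranges, noise_sd=1.0, rng=None):
    """Count values across sub-ranges, applying SaD per sub-range and a
    single noise value to the total of the surviving rows."""
    rng = rng or random.Random(0)
    surviving = []
    affected = []  # sub-ranges in which SaD removed rows
    for lo, hi in subranges:
        rows = [v for v in values if lo <= v < hi]
        kept = shrink_and_drop(rows)
        if len(kept) != len(rows):
            affected.append((lo, hi))
        surviving.extend(kept)
    noise = rng.gauss(0, noise_sd)  # one draw for the whole range
    return len(surviving) + noise, affected
```

If no sub-range appears in `affected`, the result differs from the un-partitioned count only by the one noise value, which matches the argument that partitioning by itself adds no extra noise.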

Certainly fine to keep on ice for the time being.

sebastian commented 7 years ago

I am tentatively adding this for our release this summer (17.3).

sebastian commented 7 years ago

Given the new work on reducing SaD, we need to think through how that influences making change.

Making change requires one of:

otherwise making change becomes the equivalent of not doing fixed alignment at all.

sebastian commented 7 years ago

Moving this to 18.1. It would be good to have! At that point we'll hopefully have OR in place, and can start on something like this more easily.

sebastian commented 7 years ago

This is moved to 18.2 for now as the upcoming work and finessing of the anonymization will influence whether or not this is feasible.