Open yoid2000 opened 8 years ago
@sebastian you can pull this out of the icebox when you think it is ready
Aha, for reporting we can do the following:
But if neither of the above took place, then in fact there was no extra distortion, and no need to give a message.
> But if neither of the above took place, then in fact there was no extra distortion, and no need to give a message.
I am not sure this statement is true. As soon as we create individual sub-ranges and join them, we are going to introduce more noise, whether or not shrink and drop was performed.
> @sebastian you can pull this out of the icebox when you think it is ready
Sure. I'd rather keep it in the icebox a while longer. We have to focus on the other aspects like generic TeamBank work and generic anon-work at the moment.
> But if neither of the above took place, then in fact there was no extra distortion, and no need to give a message.
>
> I am not sure this statement is true. As soon as we create individual sub-ranges and join them, we are going to introduce more noise, whether or not shrink and drop was performed.
No, there really is no new noise if no SaD happened. We don't need to add individual noise per sub-range. We just do SaD per sub-range, and then add a single noise value to the surviving rows as usual.
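To make the point above concrete, here is a rough Python sketch, not the actual implementation. It assumes SaD can be modelled as a function that takes the rows of one sub-range and returns the rows that survive; `noisy_count`, `sad`, and `noise_sd` are hypothetical names introduced only for illustration. SaD runs once per sub-range, the survivors are pooled, and a single noise value is drawn for the whole answer.

```python
import random


def noisy_count(subrange_rows, sad, noise_sd=1.0, rng=None):
    """Count rows across sub-ranges with one noise draw for the whole answer.

    subrange_rows: list of row lists, one per sub-range.
    sad: hypothetical stand-in for shrink-and-drop; takes the rows of one
         sub-range and returns the rows that survive.
    Returns (noisy count, whether SaD removed any row).
    """
    rng = rng or random.Random()
    survivors = []
    distorted = False
    for rows in subrange_rows:
        kept = sad(rows)               # SaD is applied per sub-range
        if len(kept) != len(rows):
            distorted = True           # only then is a message warranted
        survivors.extend(kept)
    noise = rng.gauss(0.0, noise_sd)   # a single draw, not one per sub-range
    return len(survivors) + noise, distorted
```

With a `sad` that drops nothing, the result is the plain count plus one noise sample and `distorted` stays `False`, which is the case where no message would be needed.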
Certainly fine to keep on ice for the time being.
I am tentatively adding this for our release this summer (17.3).
Given the new work on reducing SaD, we need to think through how that influences making change.
Making change requires one of:
otherwise making change becomes the equivalent of not doing fixed alignment at all.
Moving to 18.1, it would be good to have! At that point we'll hopefully have `OR` in place, and can start on something like this more easily.
This is moved to 18.2 for now, as the upcoming work and finessing of the anonymization will influence whether or not this is feasible.
This issue is adopted from #766
This issue specifies how to let the analyst specify any range.
The steps are as follows:
For making change, any range with `col > X` is converted into `col <> X AND col >= X`. Any range with `col <= Y` is converted into `col < Y OR col = Y`. FA is done on the `<>` and `=` terms as usual.

Obviously any given range can result in an almost arbitrary number of sub-ranges. Probably the way to deal with this is to simply limit the number of sub-ranges. With, say, 10 or 15 sub-ranges, the "change" is pretty close to the original range anyway.
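As a rough illustration of splitting a range into a capped number of sub-ranges, here is a minimal Python sketch. It is not the algorithm from this issue: it assumes integer half-open ranges `[lo, hi)`, a 1-2-5 grid of sub-range sizes (1, 2, 5, 10, 20, 50, ...), and that each sub-range must start at a multiple of its own size. The real FA rules may differ, and `make_change`, `aligned_sizes`, and `max_subranges` are hypothetical names.

```python
def aligned_sizes(limit):
    """Sizes 1, 2, 5, 10, 20, 50, ... not exceeding limit (assumed grid)."""
    sizes = []
    power = 1
    while power <= limit:
        for mult in (1, 2, 5):
            if mult * power <= limit:
                sizes.append(mult * power)
        power *= 10
    return sizes


def make_change(lo, hi, max_subranges=15):
    """Greedily split the integer range [lo, hi) into aligned sub-ranges.

    Works left to right, always taking the largest aligned sub-range that
    starts at the current position and still fits. Stops once the cap is
    reached, so the result may cover only a prefix of the range.
    Returns (list of (start, end) pairs, whether the range is fully covered).
    """
    if lo >= hi:
        raise ValueError("expected lo < hi")
    sizes = sorted(aligned_sizes(hi - lo), reverse=True)
    subranges = []
    pos = lo
    while pos < hi and len(subranges) < max_subranges:
        # Size 1 always qualifies, so next() cannot fail here.
        size = next(s for s in sizes if pos % s == 0 and pos + s <= hi)
        subranges.append((pos, pos + size))
        pos += size
    return subranges, pos == hi
```

For example, `make_change(3, 27)` needs eight sub-ranges under these assumptions, while `make_change(0, 100)` needs only one. A left-to-right greedy pass like this is simple but truncates the right end when the cap is hit; a production version would more likely choose which sub-ranges to keep so that the covered part stays close to the original range.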
Note that every new sub-range, and its associated SaD, creates an opportunity for perturbation in the answer, because rows may be dropped. The analyst should somehow be made aware of this. "Best practice" should be to make proper FA ranges in the first place, if practical. Maybe the answer is to have an `info` message telling the analyst that the range has been partitioned, and a pointer to the documentation that discusses this.