gda-score / code

Tools for generating General Data Anonymity Scores (www.gda-score.org)
MIT License

some problem with inference claims #54

Open yoid2000 opened 4 years ago

yoid2000 commented 4 years ago

I received the following from an attacker:


If we understand it correctly, GDA evaluates inference claims by verifying that all users in the subset given by the attacker have the same value for a certain column. However, the actual value of the column appears to be irrelevant. For example, an attacker who guesses that a certain set of users all have gender 'Male' will have their inference claim evaluated as correct even if all users in the set have gender 'Female'. That's what happens in this example:

[image: the attacker's example code]

which produces:

[image: the resulting output]

Note that one consequence is that any inference claim about a single user is evaluated as correct.


And my answer:

As for inference, that looks like a bug in the code. An inference attack with one user should be identical to a singling out attack. I'll make an issue.

Could you look into this problem?

frzmohammadali commented 4 years ago

@yoid2000 Hi Prof. Paul,

I think I have an observation regarding this issue. Let me share it and check it with you.

To check the correctness of an inference attack, the internal method gdaAttack._checkInference(...) is called. For each column, that method takes the value in the first row and checks it against the other rows. The value the attacker guessed is never consulted at this point, which I think is why claiming 'Male' for all-female records returns 'correct'. This is what we have:

```python
...
class gdaAttack:
    ...
    def getClaim(self):
        ...
        job = self._claimQ.get()
        ...
        elif self._cr == 'inference':
            claimIsCorrect = self._checkInference(reply['answer'])
        ...

    def _checkInference(self, ans):
        ...
        for c in range(1, numColumns):
            val = ans[0][c]  # reference value is taken from the first row
            for r in range(1, numRows):  # the first row is skipped from here on
                if val != ans[r][c]:
                    return 0
        return 1

    ...
```
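To make the effect concrete, here is a minimal standalone re-creation of the loop above as a free function. It is a sketch, not the gda-score code: it assumes `ans` is a non-empty list of rows whose column 0 is the user ID (mirroring the `range(1, numColumns)` loop), and the function name and sample rows are hypothetical.

```python
def check_inference_value_blind(ans):
    """Mimic the excerpted loop: 1 if all rows agree with row 0 in columns 1.., else 0.

    Assumes `ans` is non-empty and column 0 of each row holds the user ID.
    """
    num_rows = len(ans)
    num_columns = len(ans[0])
    for c in range(1, num_columns):
        val = ans[0][c]  # reference value comes from the data, not from the guess
        for r in range(1, num_rows):
            if val != ans[r][c]:
                return 0
    return 1


# Hypothetical answer rows (uid, gender); the attacker guessed 'Male'.
all_female = [(101, 'Female'), (102, 'Female'), (103, 'Female')]
single_user = [(101, 'Female')]
mixed = [(101, 'Female'), (102, 'Male')]

print(check_inference_value_blind(all_female))   # 1: the 'Male' guess is never consulted
print(check_inference_value_blind(single_user))  # 1: one row can never disagree with itself
print(check_inference_value_blind(mixed))        # 0
```

Both behaviors the attacker reported fall out directly: a uniform set of rows passes regardless of the guess, and a single-row answer passes unconditionally because the inner loop never runs.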

My suggestion: also pass spec to this method as a parameter and check each value against the one the attacker provided, like so:

```python
...
class gdaAttack:
    ...
    def getClaim(self):
        ...
        job = self._claimQ.get()
        ...
        elif self._cr == 'inference':
            spec = job['spec']
            claimIsCorrect = self._checkInference(reply['answer'], spec)
        ...

    def _checkInference(self, ans, spec):
        ...
        for c in range(1, numColumns):
            val = spec['guess'][c - 1]['val']  # value the attacker guessed for this column
            for r in range(0, numRows):  # check against all rows, including the first
                if val != ans[r][c]:
                    return 0
        return 1

    ...
```
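The proposed check can be sketched the same way as a free function. This is an illustration under stated assumptions, not the library's API: it assumes `spec['guess']` is a list of dicts with a `'val'` key, ordered so that `spec['guess'][i]` matches answer column `i + 1` (as the `c - 1` indexing in the proposal implies), and that column 0 is the user ID. The function name and sample data are hypothetical.

```python
def check_inference_with_guess(ans, spec):
    """Return 1 only if every row matches the guessed value in every guessed column.

    Assumes column 0 of each row is the user ID and spec['guess'][i]
    corresponds to answer column i + 1.
    """
    for c, guess in enumerate(spec['guess'], start=1):
        val = guess['val']  # value the attacker claimed for column c
        for row in ans:     # check every row, including the first
            if row[c] != val:
                return 0
    return 1


# Hypothetical answer rows (uid, gender).
all_female = [(101, 'Female'), (102, 'Female'), (103, 'Female')]

print(check_inference_with_guess(all_female, {'guess': [{'val': 'Male'}]}))          # 0
print(check_inference_with_guess(all_female, {'guess': [{'val': 'Female'}]}))        # 1
print(check_inference_with_guess([(101, 'Female')], {'guess': [{'val': 'Male'}]}))  # 0

# A two-column claim is all-or-nothing: one wrong guessed value fails the claim.
rows = [(101, 'Female', 'Berlin'), (102, 'Female', 'Munich')]
spec = {'guess': [{'val': 'Female'}, {'val': 'Berlin'}]}
print(check_inference_with_guess(rows, spec))  # 0
```

With the guess taken into account, a single-user inference claim reduces to comparing that one user's value with the guess, which is what makes it behave like singling out. Two behaviors are worth checking against the real method: in this sketch an empty `ans` returns 1 because no row contradicts the guess, and the comparison is exact, so a guessed `'1'` will not match an integer `1` returned by the database.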

Let me know whether this sounds correct to you as well.

Anyway, I still have one question: in testInference.py and in the example this attacker provided, the attacker makes a claim about only one column, not several. In contrast, the code uses a list and loops over it. Is claiming only one column per attack some kind of best practice, is it always the case, or is it just the simplest example and never happens in real attacks?

cheers, Mohammadali

yoid2000 commented 4 years ago

Aha, I see. Suddenly I'm not so sure there is a problem at all. I need to think about this a bit more. Please leave it aside for now.

frzmohammadali commented 4 years ago

Oh OK.