AsuharietYgvar / AppleNeuralHash2ONNX

Convert Apple NeuralHash model for CSAM Detection to ONNX.
Apache License 2.0

Working Collision? #1

Open dxoigmn opened 3 years ago

dxoigmn commented 3 years ago

Can you verify that these two images collide? [attached images: beagle360, collision]

Here's what I see from following your directions:

$ python3 nnhash.py NeuralHash/model.onnx neuralhash_128x96_seed1.dat beagle360.png
59a34eabe31910abfb06f308
$ python3 nnhash.py NeuralHash/model.onnx neuralhash_128x96_seed1.dat collision.png
59a34eabe31910abfb06f308
dxoigmn commented 3 years ago

Given Apple's consistent dishonest conduct on the subject, I'm concerned that they'll simply add the examples here to their training set to make sure they fix those particular cases without resolving the fundamental weaknesses of the approach, or that they'll use improvements in the hashing function to obscure the gross recklessness of their whole proposal. I don't want to be complicit in improving a system with such potential for human rights abuses.

FWIW, the best defense we have against these kinds of attacks is adversarial training. Adversarial training is exactly what you describe: construct evasions/collisions, train on them, and repeat. My initial interest in all this was to see whether Apple employed such a training method. It's clear they did not, but I see no reason they couldn't do this for their next version of this model.
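
For readers unfamiliar with the term, here is a minimal sketch of what an adversarial training loop can look like (PyTorch, with a hypothetical classifier `model` and data `loader` as stand-ins; NeuralHash itself is a hash rather than a classifier, and nothing below reflects Apple's actual pipeline):

```python
# Hedged sketch: construct adversarial examples, train on them, repeat.
# `model` and `loader` are placeholders, not Apple's training setup.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Search for a perturbed x' inside an L-inf ball that raises the loss."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)   # project back into the ball
        x_adv = torch.clamp(x_adv, 0.0, 1.0)            # keep a valid image
    return x_adv.detach()

def adversarial_training_epoch(model, loader, optimizer):
    """One epoch of the construct/train/repeat loop described above."""
    model.train()
    for x, y in loader:
        x_adv = pgd_attack(model, x, y)                 # construct the attack
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)         # train on it
        loss.backward()
        optimizer.step()
```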

SkyVelleity commented 3 years ago

Assuming Apple doesn't renege before the feature is released, I suspect the best way to compel them to do so is to blow it so widely open as soon after launch as possible that they have no choice but to retract the system. Until then, perhaps waiting for their final implementation whilst rallying against it in other ways is the best option.

Still, I haven't seen an answer to what I think may be a very significant question: if the whole point of on-device scanning is to enable E2E iCloud encryption, and Apple has announced they're going to run a second hash on their own servers anyway, doesn't that entirely invalidate any claimed legitimacy on-device scanning ever had? If Apple has already backpedaled to server-side scanning (after originally stating "The system is significantly more privacy-preserving than cloud-based scanning"), why not junk on-device scanning entirely? If my understanding of their change of heart is correct, this could be a very strong argument against the system.

gmaxwell commented 3 years ago

@dxoigmn

FWIW, the best defense we have against these kind of attacks is adversarial training

I'm not sure I agree. One can build a system with fairly strong guarantees against false positives at the expense of false negatives: normalize then downsample and quantize the pixels to the lowest resolution that still leaves images reliably recognizable. Use sha256 on the result. This will give a guaranteed protection against adversarial false positives (assuming the security of sha256). It also won't catch as many fuzzed/recompressed images, though it will catch some. No perceptual hash will strongly protect against evasion by parties willing to modify their images, in any case.
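
For illustration only, a minimal sketch of that kind of exact-match scheme; the 32x32 resolution and 8 grey levels are arbitrary assumptions for the example, not parameters anyone has actually proposed:

```python
# Hedged sketch of "normalize, downsample, quantize, then sha256".
# Resolution and quantization step are illustrative choices only.
import hashlib
import numpy as np
from PIL import Image

def exact_match_hash(path, size=(32, 32), levels=8):
    img = Image.open(path).convert("L")        # normalize to greyscale
    img = img.resize(size, Image.BILINEAR)     # downsample to a tiny fixed size
    arr = np.asarray(img, dtype=np.uint8)
    arr = arr // (256 // levels)               # quantize to a few grey levels
    return hashlib.sha256(arr.tobytes()).hexdigest()
```

Two images match only if every quantized pixel agrees, so an adversarial false positive would require a sha256 second preimage; the price is many more false negatives on recompressed or edited copies, which is exactly the tradeoff under discussion.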

The right false positive vs false negative tradeoff depends on how much you value users' privacy vs how many images you're willing to miss. I don't think Apple should be deploying this system at all, so obviously I also don't think exposing users to false positives to avoid a few false negatives is a good tradeoff. :) I view the use of a complex perceptual hash as the product of decisions made by people engaging in a one-sided risk assessment that underweights privacy harms and overweights missed images.

The simple cryptographic hash based approach I mention above also has the advantage of being extremely simple to build (or the disadvantage, if you consider justifying headcount an advantage). If someone really must justify headcount, the simple approach could be improved further, e.g. with a neural-network post-processing "error correction" filter that helps quantize pixels consistently, without removing the strong cryptographic protection against adversarial false positives (e.g. a guarantee that any positive match has all pixels within +/-1 of the true image in a normalized downsample, unless sha256 is broken).

but I see no reason they couldn't do this for their next version of this model.

I expect that extensive adversarial training would make it harder, but would still leave construction of preimages possible. In the domain of cryptography it has long been recognized that systems which appear to be secure because their creators attacked them and failed are often not secure at all. Competently constructed cryptosystems are built in ways where security is provable outright or traceable to a small number of well understood assumptions.

Above I describe a scheme for downsampling with "error-corrected" rounding that could give security against adversarial false positives traceable to the security of sha256 against second preimages. It would be less fuzzy, but if you are not valuing avoiding false-negatives over all other considerations, less fuzzy should be an option. :)

Even better could be done if they didn't insist on hiding the database.

@SkyVelleity

if Apple's announced they're gonna run a second hash on their own servers, doesn't that entirely invalidate any claimed legitimacy on-device scanning ever had

The idea is that this second 'secret' hash will only be applied to images that match the first. So arguably there is still some value in having the first, to keep users' data hidden from Apple absent false positives on the first function. I find their counter unconvincing. Making the second function secret protects it much more against review than it does against attackers: attackers may steal the function (via insider threats) or simply be handed it, as is the case for the state actors who produce these databases (and many of the concerns here are that state actors may be the attackers; even if you trust your state, no one is safe from a potentially unfriendly regime change in the long run). If the second function has a similar structure, it will likely have similar vulnerabilities in any case.

memiux commented 3 years ago

Lotte no Omocha! @AsuharietYgvar Your profile pic led me to the most disturbed plot ever 👀🙈

tomhicks commented 3 years ago

Is there some particular reason that this property is important? I could imagine that doing the search in a multiscale way might result in better looking images... but is there something about the apple setup that works this way?

I was wondering whether true positives would still match, yet false positives would not match.

As in, whether there is some micro-property of the false image that causes the hash to match but that can be destroyed by down/upsampling. The theory being that true positives have an image structure that matches much more closely at varying resolutions.

I don't know much about how the adversarial images are arrived at, but I just wondered if this was a property that wouldn't hold for false matches but would for true ones.

gmaxwell commented 3 years ago

I was wondering whether true positives would still match, yet false positives would not match.

Because of the way mine are constructed, I wouldn't be shocked if they survived up- and downsampling better than naturally constructed images: I penalize heavily towards making the changes in lower-frequency components rather than higher ones (probably much more than I should for the best image quality, but it was pretty much the first thing I thought of). :)

I'm pretty unfamiliar with the machine learning tools; I'm sure someone who knows what they're doing can do a lot better than me if they give it a good try. :)

anishathalye commented 3 years ago

@tomhicks There is a general technique you can use if you intend for an adversarial perturbation to be robust to a particular set of transformations (e.g. down/upsampling). For many types of transformations, the adversarial examples by default won't be robust to that type of transformation (e.g. rotation), but if you know a priori what transformation you want the example to be robust to, you can make it so. At a high level, this is done by incorporating the transformation into the optimization process. Suppose you have some set of transformations T you want the example to be robust to; then the optimization problem becomes finding an example x such that hash(t(x)) = target for all t ∈ T. I haven't tried this for NeuralHash yet, but here is a demonstration of adversarial examples (against an image classifier) that are robust to rotation.
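
A rough sketch of what incorporating the transformation into the optimization can look like (PyTorch/torchvision; `hash_logits` is a hypothetical differentiable surrogate for the model's pre-binarization outputs, and the rotation range, step count, and loss are illustrative assumptions, not anyone's actual attack code):

```python
# Hedged sketch: optimize a perturbation so the surrogate hash outputs keep the
# target signs under random rotations (Expectation Over Transformation style).
# `hash_logits` is a hypothetical stand-in; all parameters are illustrative.
import torch
import torchvision.transforms.functional as TF

def robust_collision(hash_logits, x_src, target_bits, steps=500, lr=1e-2, max_deg=15.0):
    delta = torch.zeros_like(x_src, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    sign_target = target_bits * 2.0 - 1.0              # map {0,1} bits to {-1,+1}
    for _ in range(steps):
        angle = float(torch.empty(1).uniform_(-max_deg, max_deg))
        x_t = TF.rotate(torch.clamp(x_src + delta, 0, 1), angle)   # sample t from T
        logits = hash_logits(x_t)
        # Hinge loss pushing each pre-binarization output to the target side of 0.
        loss = torch.clamp(1.0 - sign_target * logits, min=0).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.clamp(x_src + delta.detach(), 0, 1)
```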

tomhicks commented 3 years ago

@anishathalye interesting.

That raises two more questions, if you don't mind!

  1. Is there any difference between generating adversarial examples for misclassification as opposed to hash collisions?

  2. Is it generally the case that true matches will be more robust to a wider range of transformations than adversarial matches?

anishathalye commented 3 years ago

  1. Is there any difference between generating adversarial examples for misclassification as opposed to hash collisions?

It is the same concept at a high level: find an input to a neural network that produces a desired output (and perhaps satisfies some extra properties, such as "looks similar to a particular image" or "doesn't look like glitch art"). There has been a lot of research in the area of adversarial examples (thousands of research papers), and researchers have demonstrated adversarial examples on all sorts of neural networks, like image classification, object detection, semantic segmentation, speech-to-text, and many others (including weird things like style transfer that may not make sense to "attack" from a security perspective).

Informally, I did find it harder to attack the neural hash algorithm compared to standard image classifiers like those trained on ImageNet. It took some amount of parameter tuning and such to find collisions for NeuralHash fairly reliably, and even then, I don't think the results look fantastic. Whereas for standard image classifiers, you can basically do whatever attack (like a single step of gradient descent) and they fall over right away; and it doesn't even take much work to get adversarial examples with imperceptibly small perturbations.
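
For concreteness, the "single step of gradient descent" attack on a standard classifier is essentially FGSM; a minimal hedged sketch with placeholder `model`, `x`, `y` (not NeuralHash-specific):

```python
# Hedged sketch: one-step gradient-sign (FGSM-style) attack on a generic
# image classifier. `model`, `x`, `y` are placeholders, not NeuralHash.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8/255):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return torch.clamp(x + eps * x.grad.sign(), 0.0, 1.0).detach()
```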

Here's one informal / intuitive explanation for why this may be the case: for NeuralHash collisions, we're trying to "encode" 96 bits of information into an image, whereas for attacking an ImageNet classifier, we're only trying to encode log2(1000) ≈ 10 bits of information. (Though of course, this is not a formal proof, nor is it even a fully satisfying intuitive explanation. The semantic segmentation or style transfer adversarial examples, for example, encode more than 96 bits of information into the image.)

Apple might have also done adversarial training (a standard technique to make networks more robust to adversarial examples); I don't remember off the top of my head if there is research on identifying whether a given trained model has been trained using such techniques (and afaik Apple hasn't published details on how NeuralHash was trained).

  2. Is it generally the case that true matches will be more robust to a wider range of transformations than adversarial matches?

In my experience, yes, this is generally true, unless the adversarial examples have been made robust to the particular kind of transformation being considered. With the example above of the adversarial example robust to rotation, I wouldn't be surprised if the adversarial example is more robust to rotation than the original image.

tomhicks commented 3 years ago

Thanks for that.

As far as I know, Apple's only details on how NeuralHash was trained are broadly: shown similar images and told to match, shown dissimilar images and told to not match. You would hope there's some adversarial training in there.

Although wouldn't that just push the problem onto a different type of adversarial image?

With the example above, of the adversarial example robust to rotation, I wouldn't be surprised if the adversarial example is more robust to rotation than the original image.

Interesting. That would be interesting to see.

LiEnby commented 3 years ago

Don't forget the UK website blocklist, which was created to "stop child porn" and is now exclusively used to block sites like ThePirateBay, 1337x, and such: https://en.wikipedia.org/wiki/List_of_websites_blocked_in_the_United_Kingdom https://en.wikipedia.org/wiki/Web_blocking_in_the_United_Kingdom#History

Why isn't DNSSEC standard already?

dxoigmn commented 3 years ago

@gmaxwell

I expect that extensive adversarial training would make it harder, but still leave construction of preimages possible. In the domain of cryptography it's long been recognized that systems which appear to be secure because their creators attacked them and failed are often not secure at all. Competently constructed cryptosystems are built in ways where security is provable outright or tracable to a small number of well understood assumptions.

The point of collision resistance is to make collisions computationally hard, not impossible. I wish we had a proof of one-way functions, but we don't. The collision resistance of SHA256 (and friends) relies upon a similar methodology as this model: just a small set of researchers trying to understand the underlying algorithm and find more efficient methods of finding collisions.

gmaxwell commented 3 years ago

By possible I meant practically possible, though thanks for clarifying for those for whom it wasn't clear.

For cryptographic hashes we have decades of experience, powerful tools, and mountains of research. Moreover, for a neural network to be trainable it must be significantly piecewise differentiable and locally linearly approximable, and avoiding that kind of behavior is an explicit requirement in the design of transformations in symmetric crypto. Much of the analysis and review in symmetric cryptography goes into proving how free the function is from those behaviors. :) So I don't think it's at all reasonable to draw a parallel here: Using a neural network as a hash is more like the state of symmetric crypto 40 years ago, but with an added problem that it must have properties that we know make security hard.

jankais3r commented 3 years ago

Apple claims that they have a second hashing algorithm running server-side to prevent false positives. That second algorithm could very well be PhotoDNA. Who's up for the challenge to find an image pair where both NeuralHash and PhotoDNA collide?

You can grab PhotoDNA here: https://github.com/jankais3r/jPhotoDNA

tomhicks commented 3 years ago

Using a neural network as a hash is more like the state of symmetric crypto 40 years ago, but with an added problem that it must have properties that we know make security hard.

No one is using these hashes to secure communications between two parties, though, which I think is an important distinction. This doesn't make the system "insecure" in any way.

By generating a collision, all you enable is for Apple to look at the photo that you supplied.

This makes the system marginally less useful to Apple for avoiding server work.

jankais3r commented 3 years ago

By generating a collision, all you enable is for Apple to look at the photo that you supplied.

This makes the system marginally less useful to Apple for avoiding server work.

Except that instead of Apple employees looking at the flagged photos, totalitarian governments could easily force themselves into this manual review process. Then all they need to do is to disseminate pro-democracy images that collide with actual CSAM images in Apple's database and the system will flag dissidents for them.

tomhicks commented 3 years ago

Why not just target the server? You're gonna need to do that anyway so you might as well just do that.

gmaxwell commented 3 years ago

@tomhicks The system that Apple proposes leaks the cryptographic keys to decrypt photos conditional on the user being in possession of a threshold number of NeuralHash-matching images.

In the attack @jankais3r describes, an authoritarian regime compromises Apple through hacks, leaks, moles, bribes, or a National Security Letter. They then either distribute to the public pro-freedom images modified to match the NeuralHashes of known child porn, OR they take child porn, modify it to match circulated pro-freedom messages, and submit the child porn to the agencies that build the lookup databases. Your computing device processes your private files and leaks the decryption keys to Apple, demonstrating your possession of the targeted political speech and allowing the attacker to begin more detailed tracking, or just to send out a group to execute you.

An analogous attack could also be performed without attacking the NeuralHash: just put the ideologically identifying images in the database directly. Because the comparison is performed using a private set intersection, no one can directly tell what the database contains. However, Apple has made unfalsifiable claims about various self-enforced ad hoc protection mechanisms which should make this attack harder: e.g. they claim their database is an intersection of databases from multiple governments, so absent a NeuralHash attack the governments would need to collude to include non-child-abuse images, similar to how 'five eyes' countries spy on each other's citizens to evade the hard legal prohibitions against spying on their own. Targeted false NeuralHash matches undermine those protections.

Moreover, the potential for adversarial collisions means that if the plaintext of the database is leaked, otherwise made public, or just shared with third-party reviewers, any inappropriately included material becomes deniable, because false matches can be manufactured.

This is particularly unfortunate, because there are simple and obvious schemes which, assuming the database has no false entries in it, make false positives computationally intractable with extremely high confidence and would eliminate the potential for the above attacks in a way which is independently verifiable. They have the disadvantage of having more false negatives.

Since detection here requires the target to run software that betrays their self-interest, and especially since it's extremely easy to change images so their NeuralHashes don't match, any such system is only going to be effective against people who are not trying to evade it. Yet at no point has Apple attempted to justify the use of a function with a meaningful rate of false positives over ones that are false-positive-free for practical purposes. But why would they, without people demonstrating that false positives are a real issue through attacks on the hash function like the ones presented in this issue?

fuomag9 commented 2 years ago

that's nice info overall. but its look like fake. check it out here at paklix.com

This is spam; @amir32 should be banned. Edit: the profile pic is also AI-generated.