AsuharietYgvar / AppleNeuralHash2ONNX

Convert Apple NeuralHash model for CSAM Detection to ONNX.
Apache License 2.0

Working Collision? #1

Open dxoigmn opened 3 years ago

dxoigmn commented 3 years ago

Can you verify that these two images collide? [images: beagle360, collision]

Here's what I see from following your directions:

$ python3 nnhash.py NeuralHash/model.onnx neuralhash_128x96_seed1.dat beagle360.png
59a34eabe31910abfb06f308
$ python3 nnhash.py NeuralHash/model.onnx neuralhash_128x96_seed1.dat collision.png
59a34eabe31910abfb06f308
gmaxwell commented 3 years ago

When reporting to NCMEC's CyberTipline, they have a checkbox: have you reviewed it? If you say "no", then NCMEC's staff will review it.

There is currently no such checkbox: https://report.cybertip.org/submit Perhaps they had one in the past prior to adverse rulings in Federal court.

Again, see US v. Miller, if you perform a match based on an NCMEC database and send the image to NCMEC without reviewing, then the user's fourth amendment protection against warrantless searches is preserved, and NCMEC (as an agent of the government with special statutory authority) cannot lawfully inspect the image. The suppression of the user's fourth amendment rights is dependent on a prior review by the provider which is lawful under the color of their private commercial agreement. Apple's human review is a critical step in the suppression of the constitutional rights of their users, without it any subsequent review by NCMEC or law enforcement would be unlawful and the whole reporting process would be useless (except to build databases of kompromat and other such due-process-less results).

LiEnby commented 3 years ago

this seems to be enough to drastically change any image's hash: add borders, about 1000px larger than the main image, fill them with as much entropy as you can.

Untitled hash: ff0dcf8b9371ebd28a4f5d2d

windows_xp_bliss-wide hash: 9f3bce9b9d716bf399cf4f21

this process is completely lossless since i just added more pixels around the edges of the original image, and in the photo viewer, you can simply zoom in and see the original image anyway, >-<
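(For anyone who wants to try this border trick themselves, here is a minimal sketch of the padding step, assuming Pillow and NumPy; the filenames and the 1000px figure are placeholders taken from the comment above. You would then run nnhash.py on both files to compare the hashes.)

import numpy as np
from PIL import Image

BORDER = 1000  # border width in pixels, as suggested above

img = np.array(Image.open("original.png").convert("RGB"))  # placeholder filename
h, w, c = img.shape

# Surround the untouched original with random noise on every side.
canvas = np.random.randint(0, 256, (h + 2 * BORDER, w + 2 * BORDER, c), dtype=np.uint8)
canvas[BORDER:BORDER + h, BORDER:BORDER + w] = img

Image.fromarray(canvas).save("bordered.png")
# Then compare:
#   python3 nnhash.py model.onnx neuralhash_128x96_seed1.dat original.png
#   python3 nnhash.py model.onnx neuralhash_128x96_seed1.dat bordered.png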

deftdawg commented 3 years ago

@gmaxwell Can't Apple just take consent from a user's previous agreement to the icloud TOS/EULA? Be interested to hear if that works in a court

Again, see US v. Miller, if you perform a match based on an NCMEC database and send the image to NCMEC without reviewing, then the user's fourth amendment protection against warrantless searches is preserved, and NCMEC (as an agent of the government with special statutory authority) cannot lawfully inspect the image. The suppression of the user's fourth amendment rights is dependent on a prior review by the provider which is lawful under the color of their private commercial agreement. Apple's human review is a critical step in the suppression of the constitutional rights of their users, without it any subsequent review by NCMEC or law enforcement would be unlawful and the whole reporting process would be useless (except to build databases of kompromat and other such due-process-less results).

LiEnby commented 3 years ago

@gmaxwell Can't Apple just take consent from a user's previous agreement to the icloud TOS/EULA? Be interested to hear if that works in a court

obviously it won't because apple plans to use this for more than just iCloud

gmaxwell commented 3 years ago

@deftdawg Yes, Apple's commercial relationship with the target of the search is what makes it lawful for Apple to search the user's otherwise private data. And if Apple staff searches the data and finds something they believe to be unlawful, when they pass it on to an agent of the government Apple's search can lawfully be "repeated" without a warrant. But if Apple didn't review the material, and just continued matching against some government provided database, then when they hand the content over to a government agent a warrant would be required to inspect the image.

In any case, this is getting pretty far afield. The point I was making, which I don't think anyone has seriously disputed, is that ideally an attack would manage to create second-preimage images which are visually similar to an arbitrarily selected image. A noise image like the ones constructed so far is an interesting first step and could potentially cause problems, but it's not as powerful an attack as an arbitrary benign-looking image which is a hash match against a specific, fixed, entirely different image.

Edit: E.g. see this HN comment where someone is being distracted by the fact that the current POC example image is random noise: https://news.ycombinator.com/item?id=28222041 I think it's an example of why the public discourse would be improved with second-preimage images that looked like natural-ish images.

danny-wu commented 3 years ago

@gmaxwell, getting back to the security issues of how this system can be used to attack victims. Given that the hash here is neural-network based, it seems implicitly possible to also guide collisions towards an arbitrarily selected image.

If there are millions of images in the CSAM database, it is almost certain that some are close-ups of genitalia. Since CSAM covers everything up to 17 years and 364 days old, and the law treats it all the same, there would be images that are CSAM and illegal but visually very similar (at least to a human inspecting a visual derivative) to images that are adult and legal.

With this in mind, I propose a different and perhaps more sinister attack:

  1. Attacker A goes on the dark-web and collects known-CSAM, particularly those that are close-ups of genitalia, and do not visibly appear to be 'child porn'; just normal porn. Attacker A gets this list of hashes, and sends it to attacker B.

  2. Attacker B receives a list of 96-bit hashes, and then finds (or produces) legal amateur pornography of 18+ consenting adults that are also similar close-ups of genitalia. Attacker B never works or interacts with CSAM.

  3. Attacker B perturbs the legal close-up images until they match the neural hashes provided by Attacker A.

  4. The perturbed legal images get sent to the victim's device via some means.

  5. Apple's CSAM detection will then flag a neural hash match, and once the threshold is reached, Apple will be alerted. An Apple employee will then review the 'visual derivative', and they will see genitalia. At this point, they will have a legal obligation to report and forward this to law enforcement.

  6. Law enforcement, after seeing 30 matches of close-up pornography that perceptually hashes to CSAM, may raid the victim's place, despite the fact that no CSAM material was ever sent or interacted with by Attacker B and the victim.

In this scenario:

hackerfactor commented 3 years ago

There is currently no such checkbox: https://report.cybertip.org/submit Perhaps they had one in the past prior to adverse rulings in Federal court.

Here's a screenshot from the cybertipline's current reporting form. (The checkbox has been there for years...) It's the checkbox that says "File Viewed By Company". [screenshot: NCMEC reporting form]

hackerfactor commented 3 years ago

this seems to be enough to drastically change any image's hash: add borders, about 1000px larger than the main image, fill them with as much entropy as you can. ... this process is completely lossless since i just added more pixels around the edges of the original image, and in the photo viewer, you can simply zoom in and see the original image anyway, >-<

You're losing sight of what NeuralHash claims to do. It is NOT a content identification or sub-picture search. It is a "perceptual hash". Think of it this way: If you print out both photos, scaled to the same size, and mount them on a wall 20 feet away, would a human say they look the same?

Your "add a thick border of noise" would be a resounding "no, they look different because one has a big noisy border." And lo and behold, the hashes are very different.

A significant crop? "No, one has been cropped."

Flip? Rotate? "Nope, they look different because one has been flipped and/or rotated."

Try this: Take the dog photo, draw a tiny mustache on him, and check the hash. A minor alteration should result in little or no difference in the hash.

The things to test: How much of a difference must be made to make the hash significantly different? (Lots of bits different.) Does the amount of change needed vary based on where in the picture it occurs? How much cropping off of each edge? Is it symmetrical or asymmetrical? Can we associate specific bits in the hash with a specific type of alteration?

Here's a fun one that I'd like to see someone test: Start with the dog. Erase (not crop) the left side. Do half of the bits remain the same?

Picture "AA" has hash "aa". Picture "BB" has hash "bb". If I create a side-by-side splice "AB", do I get the hash "ab"? If so, then we can easily force a hash collision.

MasterNeuron commented 3 years ago

Can you verify that these two images collide? [images: beagle360, collision]

Here's what I see from following your directions:

$ python3 nnhash.py NeuralHash/model.onnx neuralhash_128x96_seed1.dat beagle360.png
59a34eabe31910abfb06f308
$ python3 nnhash.py NeuralHash/model.onnx neuralhash_128x96_seed1.dat collision.png
59a34eabe31910abfb06f308

Both white noise images are the same and I can't see any differences?! Somebody is seeing a dog in the 1st photo...

kjsman commented 3 years ago

I generated another working collision:

Target image (lena): 32da5083d4f9db9d45b4c397

Generated image (download-60): 32da5083c4f9db9d45b4c397

Lena: 32da5083d4f9db9d45b4c397
 Dog: 32da5083c4f9db9d45b4c397
yeldarby commented 3 years ago

@kjsman I ran those images through CLIP as described here and the dog image is still identified by CLIP as generated (though it does see the dog as well, as the 3rd most similar word).

Top matching English words according to CLIP for each of those two colliding images (and the cosine similarity of each word's feature vector with the image's):

Lenna:

britney 0.2710131274330054
hat 0.2695037749031152
caroline 0.267320863072384
hats 0.261636163397781
catherine 0.2614189858805191
judy 0.25487058977059807
claire 0.2535490374489525
heather 0.25337776325078754
sapphire 0.25308272433397716
blake 0.2522795140967653

Dog:

generated 0.2695470207125587
regard 0.26522406265702764
dog 0.2647775811672407
shepherd 0.26429458470715683
vincent 0.26314263493135504
lycos 0.26132000000095534
marilyn 0.2608674788806449
lion 0.2595975315789243
face 0.25944818166185624
gnu 0.2579544381392359
gmaxwell commented 3 years ago

@kjsman Awesome! That's closer! Attacks only get better.

@hackerfactor Thanks for the screenshot! It appears that particular form isn't something the public has access to (or at least I can't find it). It's not on the form I linked to. Regardless, the case law is now completely unambiguous. If the NCMEC is getting scanning results from companies who have not actually inspected the image and believe it to be child porn on the basis of their inspection, then the NCMEC's subsequent review is a search by an agent of the government and requires a warrant. It seems really foolish to have that checkbox there, a footgun to let child predators go free. (And as much as I disapprove of the warrantless searching of users' private files, regardless of who's doing it-- I find it really disappointing to find that once the invasion has happened they're not doing everything to make sure charges can actually be filed.)

unrealwill commented 3 years ago

[image: lena-output.png]

python3 nnhash.py model.onnx neuralhash_128x96_seed1.dat lena-output.png 
59a34eabe31910abfb06f308

59a34eabe31910abfb06f308 (Dog Hash)
Jigsy1 commented 3 years ago

You are aware that these hashes aren't conventional hashes, right? It's just a series of comma-separated numbers.

yeldarby commented 3 years ago

@unrealwill that one at least doesn't match generated higher than any other word in the dictionary, but CLIP still picks it out quite highly (10th most relevant word in the dictionary):

CLIP feature vector cosine similarity with that image & English dictionary words:

lcd 0.2685186813767101
cover 0.2656788641452026
duration 0.2610448567100487
banner 0.2610146163956426
ebook 0.2607660721001148
pixels 0.2596127825121441
dvd 0.25817185406107845
poster 0.258116692755925
gif 0.25807298663266715
generated 0.25786197122020243

Is it possible to steer it away from CLIP's generated feature vector as you permute from the base?

gmaxwell commented 3 years ago

impose a constraint that would get rid of the high frequencies in the image, such as an L1 norm in the DCT domain.

Adding an L1 DCT penalty should result in much more natural images generally, not just getting rid of the HF noise. Just reiterating this good suggestion.
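(A minimal sketch of what such a penalty could look like on an HxWx3 float image, using SciPy's 2-D DCT; the weight and function names are illustrative and not taken from any posted attack code. Since the DCT is linear, the same term is easy to express in an autodiff framework and add directly to a gradient-based loss.)

import numpy as np
from scipy.fftpack import dct

def dct2(x):
    # 2-D type-II DCT with orthonormal scaling
    return dct(dct(x, axis=0, norm="ortho"), axis=1, norm="ortho")

def l1_dct_penalty(adv, orig, weight=1e-3):
    """L1 norm of the perturbation in the DCT domain.

    High-frequency noise shows up as many non-zero DCT coefficients, so adding
    weight * l1_dct_penalty(adv, orig) to the attack loss pushes the optimizer
    toward smoother, more natural-looking perturbations."""
    diff = adv.astype(np.float64) - orig.astype(np.float64)
    return weight * sum(np.abs(dct2(diff[..., ch])).sum() for ch in range(diff.shape[-1]))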

unrealwill commented 3 years ago

Is it possible to steer it away from CLIP's generated feature vector as you permute from the base?

@yeldarby Yes, you can steer it away easily; just add a term to the loss function (but obviously you will need to compute CLIP every iteration, so it will take more time to converge).
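(A rough sketch of such a term, assuming OpenAI's clip package; the "generated" prompt, the weight, and how the term gets wired into the collision loss are all placeholders.)

import clip   # https://github.com/openai/CLIP
import torch

device = "cpu"
clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)

with torch.no_grad():
    generated_vec = clip_model.encode_text(clip.tokenize(["generated"]).to(device))
    generated_vec = generated_vec / generated_vec.norm(dim=-1, keepdim=True)

def clip_penalty(image_batch: torch.Tensor, weight: float = 0.1) -> torch.Tensor:
    """Cosine similarity between the candidate image and CLIP's embedding of
    the word 'generated'. Added to the collision loss, it steers the search
    away from images CLIP flags as synthetic, at the cost of one extra CLIP
    forward pass per iteration. image_batch must be CLIP-preprocessed
    (224x224, normalized)."""
    feat = clip_model.encode_image(image_batch)
    feat = feat / feat.norm(dim=-1, keepdim=True)
    return weight * (feat @ generated_vec.T).squeeze()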

The L1 DCT penalty is probably a good suggestion, but you can probably also use the discriminator of a GAN network to maintain naturalness.

To give an idea of the time complexity: my picture above was generated in less than 5 minutes using only a laptop CPU and non-optimized code, starting from a given image with basic gradient descent and clipping the image to its [-1,1] domain at each iteration (not even lbfgs yet). Optimized code would probably run ~100x faster, so there is plenty of room to add quality.

Sorry for the NSFW, but it was to show that you can produce a soft-porn image of any given hash, one that a user can know is perfectly legal and have legitimate reasons to store, but where a manual reviewer will have a much harder time knowing the age of the person involved.

Put yourself in the shoes of a manual reviewer presented with a nude picture that "matches CSAM" and ask yourself whether, from a grainy picture that looks like a home-made scanned slide, you can determine if the girl is underage or not and whether you should report the picture or not.

Stereo101 commented 3 years ago

I'm convinced you can take almost any image and smoothly transition it to very close to any other image while retaining its original perceptual hash value. The properties of what a perceptual hash needs to do practically guarantee it.

If you think of the space of all possible images of a given height / width, the perceptual hash should divide that space up into clustered continuous chunks where locality is determined by tiny edits. An obvious way to define the space is as one dimension for each color of each pixel with values from 0 to 255. The issue is that as we increase the number of dimensions, these clusters of identical hashes are more forced to stretch to fill that space.

To get some intuition on this: consider as you increase dimensions, the "volume" of an n-dimensional sphere divided by the "volume" of the corresponding n-dimensional cube approaches 0. This is concerning because the sphere is the best possible packing you can get, in other words the ability of well packed objects to fill space gets worse as dimensions go up. Consider that even a tiny 360 by 360 image would represent a huge 388,800 dimensional space. Instead of getting nice clusters, we're getting a kind of interwoven series of sponges for each hash value which completely fills the space, which is admittedly hard to visualize.

Not being able to tightly pack these hash values together defeats the main goals of the perceptual hash (or any perceptual hash?) to have similar images give similar values. I'm about a third of the way done with turning a picture of samwise into a bowl of frootloops with the identical perceptual hash.
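(The packing intuition is easy to check numerically; this quick calculation is an illustration, not from the comment above: the volume of the unit n-ball divided by the volume of its enclosing cube collapses toward zero almost immediately as n grows.)

import math

def ball_to_cube_ratio(n: int) -> float:
    # volume of the radius-1 n-ball, pi^(n/2) / Gamma(n/2 + 1),
    # over the volume of the enclosing side-2 cube, 2^n
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) / 2 ** n

for n in (2, 3, 10, 50, 100):
    print(n, ball_to_cube_ratio(n))
# 2   ~0.785
# 3   ~0.524
# 10  ~0.0025
# 50  ~1.5e-28
# 100 ~1.9e-70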

mcdallas commented 3 years ago

@unrealwill I used the code in your gist and swapped the actual model and hashing function but got nowhere near the performance described. Any chance you could share the updated code?

yeldarby commented 3 years ago

Is it possible to steer it away from CLIP's generated feature vector as you permute from the base?

@yeldarby Yes, you can steer it away easily; just add a term to the loss function (but obviously you will need to compute CLIP every iteration, so it will take more time to converge).

@unrealwill can you maintain the same NeuralHash while doing so though?

anishathalye commented 3 years ago

@mcdallas Here is an example implementation that I hacked together in a couple hours: https://github.com/anishathalye/neural-hash-collider It doesn't have any particularly fancy features, just uses vanilla gradient descent, and the loss currently doesn't have a term for distance from starting image, but it seems to work all right on some examples I've tried so far. It seems a bit sensitive to some parameter choices.

For example, starting from this cat picture I found online, the program finds a mutation that collides with the dog in the original post above, taking about 2.5 minutes on a 2015-era CPU:

[image: adv.png]

$ python3 nnhash.py adv.png
59a34eabe31910abfb06f308
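(For readers who just want the general shape of these gradient-descent attacks without digging through the linked repos, here is a schematic sketch. Everything in it is an assumption for illustration: it presumes a differentiable PyTorch port of the extracted model is available as neuralhash_model and the 96x128 seed matrix as seed, and it glosses over the exact bit ordering of nnhash.py's hex output. The real attack code linked above differs in the details, e.g. it can also penalize distance from the starting image.)

import torch

def hash_bits_from_hex(hex_hash: str) -> torch.Tensor:
    """Target hash as a vector of +/-1 signs, one per bit (96 bits)."""
    bits = bin(int(hex_hash, 16))[2:].zfill(96)
    return torch.tensor([1.0 if b == "1" else -1.0 for b in bits])

def find_preimage(x0: torch.Tensor, target_hex: str, steps: int = 2000, lr: float = 1e-2):
    """Nudge image x0 (shape 1x3x360x360, values in [-1, 1]) until the sign
    pattern of seed @ features matches the target hash. Schematic only."""
    target = hash_bits_from_hex(target_hex)
    x = x0.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        feats = neuralhash_model(x).flatten()                      # 128-dim descriptor (assumed)
        logits = torch.as_tensor(seed, dtype=feats.dtype) @ feats  # 96 pre-sign values
        loss = torch.relu(0.1 - logits * target).sum()             # push every bit to the right side of 0
        if loss.item() == 0:                                       # all 96 bits match, with margin
            break
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(-1.0, 1.0)                                    # stay in the model's input domain
    return x.detach()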
thecadams commented 3 years ago

It's all well and good for apple to run images through CLIP and discard the generated ones, but consider @KuromeSan's suggestion:

add borders, about 1000px larger than the main image, fill them with as much entropy as you can.

CLIP would surely identify this modified image as generated.

gmaxwell commented 3 years ago

@thecadams It's absolutely trivial for someone with interdicted images to reliably evade the hash match with trivial modifications, even simpler and less visually obnoxious than that. So their threat model must not assume that the child porn peepers are actively trying to evade detection.

Given that, one might wonder why they used this vulnerable ML hash instead of just sha256ing a downsampled and quantized copy of the decompressed image-- getting a slight amount of robustness to scaling/recompression while avoiding any real risk of generating second preimages. It's hard to resist being uncharitable and suggest that someone might have been justifying headcount for a low-value, high-buzzword-factor enhancement that ended up adding a vulnerability.
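(For comparison, that simpler alternative is only a few lines; the 64x64 size and 4-bit quantization below are arbitrary illustrative choices, not anything Apple does.)

import hashlib
from PIL import Image

def naive_robust_hash(path: str) -> str:
    """sha256 of a downsampled, coarsely quantized copy of the decoded image.

    Tolerates some rescaling/recompression, but because the output is a
    cryptographic hash of the quantized pixels there is no gradient to follow
    and no practical way to craft second preimages."""
    img = Image.open(path).convert("L").resize((64, 64))
    quantized = bytes(p // 16 for p in img.tobytes())   # keep 4 bits per pixel
    return hashlib.sha256(quantized).hexdigest()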

SkyVelleity commented 3 years ago

Yes! I can confirm that both images generate the exact same hashes on my iPhone.

Sorry if I've missed something (I'm a bit of a brainlet), but this makes it sound like you're hashing on your iPhone somehow, rather than using the extracted model on a computer. Is there currently any known way of generating hashes directly on a (presumably jailbroken) iPhone using Apple's own implementation? Sorry if this is a dumb question, but I'd be interested in tinkering around with the device-side hashing implementation if it's already there somewhere.

ProfFan commented 3 years ago

Yes! I can confirm that both images generate the exact same hashes on my iPhone.

Sorry if I've missed something (I'm a bit of a brainlet), but this makes it sound like you're hashing on your iPhone somehow, rather than using the extracted model on a computer. Is there currently any known way of generating hashes directly on a (presumably jailbroken) iPhone using Apple's own implementation? Sorry if this is a dumb question, but I'd be interested in tinkering around with the device-side hashing implementation if it's already there somewhere.

https://github.com/KhaosT/nhcalc

You can port the code to iOS, it's trivial.

thepwrtank18 commented 3 years ago

"check this cute dog photo, set it as your wallpaper" "ok" 5 hours later "why am I in jail"

really hope the above doesn't happen and Apple can actually moderate

unrealwill commented 3 years ago

@mcdallas I used the code in your gist and swapped the actual model and hashing function but got nowhere near the performance described. Any chance you could share the updated code?

Your wish is my command:

https://gist.github.com/unrealwill/d64d653a7626b825ef332aa3b2aa1a43

Above is a script to generate collisions. It has some additional improvements over my previous, more general script.

Most notably: a SciPy optimizer, applied to the real weights, and an optional alternative dual loss function.

Some images may take longer to converge, but others can be really quick.
It takes between 10 sec and 10 minutes using CPU only.

dangerousones commented 3 years ago

Jesus

DominikDeak commented 3 years ago

"check this cute dog photo, set it as your wallpaper" "ok" 5 hours later "why am I in jail"

really hope the above doesn't happen and Apple can actually moderate

I would not trust my personal freedom to be moderated by a corporate entity.

thepwrtank18 commented 3 years ago

"check this cute dog photo, set it as your wallpaper" "ok" 5 hours later "why am I in jail" really hope the above doesn't happen and Apple can actually moderate

I would not trust my personal freedom to be moderated by a corporate entity.

This. This entire thing shouldn't happen in the first place.

dud1337 commented 3 years ago

"check this cute dog photo, set it as your wallpaper" "ok" 5 hours later "why am I in jail" really hope the above doesn't happen and Apple can actually moderate

Oh good heavens. If you would bother to read all of the documentation that Apple has produced about the system, then you would know that a) these generated collisions are useless because you need the hashes to generate them and the CSAM hashes aren't available to users and b) flagged accounts are moderated.

In discussion earlier, someone devious pointed out it would be relatively easy to find sample CSAM on a low-level scumbag fest via your local dorknet. Find one common enough that it is probably known by the fuzz, then get the hash from that.

gmaxwell commented 3 years ago

Hi all. I can now generate second-preimage images that don't look like a glitch-art horror show.

They still aren't perfect, but I'm totally new to all these tools and only spent a bit over an hour on it. It took me more time trying to get onnx/python/tensorflow working than I spent breaking the hash, so I didn't get around to trying anything too fancy. I found essentially everything I tried more or less worked, so I'm confident that with some more intelligence much better can be done.

The way I got this result is to apply the same iteration as @anishathalye: taking the derivative of the apple hash function and applying it to the image, but I also perform noise shaping by using a gaussian high-pass filter on the error signal relative to the original image and then feeding that back in. The appearance is pretty sensitive to the feedback schedule so I twiddled until I got a good result. In this case I started with an initially high feedback decreasing hyperbolically, and then once it hit a hash hamming distance of zero I had it switch to a control loop that tries to exponentially increase feedback while the distance is zero and decrease feedback while it is above zero. Then I picked the diff=0 image that had the lower PSNR vs the input (which wasn't the last one, because none of my objectives were MSE).

Maybe now that I've shown it can be done, someone who knows these tools will do it better. I'm sure bolting on a proper optimizer would at least make it require less fiddling to get an okay result, and something smarter than a dumb high-pass filter will assuredly get better result even with this simple noise shaping approach. The next noise-feedback I would want to try is to apply an 8x8 jpeg-like dct on the noise, quantize it and round down the coefficients, and feed that back in.
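(A very rough sketch of just the noise-shaping step described above, for an HxWx3 float image in [-1, 1]. The adaptive feedback schedule, which is the fiddly part, is reduced to a single constant here, and hash_grad is assumed to come from autodiff on a differentiable port of the model.)

import numpy as np
from scipy.ndimage import gaussian_filter

def highpass(err, sigma=2.0):
    # high-frequency component of the error: error minus its spatial Gaussian blur
    return err - gaussian_filter(err, sigma=(sigma, sigma, 0))

def noise_shaped_step(x, x_orig, hash_grad, step=1e-2, feedback=0.5):
    """One update: follow the hash-matching gradient, then pull the
    high-frequency part of the accumulated error back toward the original,
    which is what keeps the result from turning into glitch art."""
    x = x - step * hash_grad           # ordinary gradient step on the hash loss
    err = x - x_orig                   # total perturbation so far
    x = x - feedback * highpass(err)   # bleed off its high-frequency content
    return np.clip(x, -1.0, 1.0)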

[image: /tmp/a.png]

$ python3 ./nnhash.py ./model.onnx ./neuralhash_128x96_seed1.dat /tmp/a.png
59a34eebe31910abfb06f308

(matches the dog at the top of the thread.)

Input image:

[image: input photo]

Feel free to use this image in further examples. I was tired of unimaginative people who thought images of dogs were an irrelevant consideration, and my partner volunteered a mid-80s picture of herself.

unrealwill commented 3 years ago

@gmaxwell my version (lost in the thread above) ^^ has the SciPy L-BFGS-B optimizer working: https://gist.github.com/unrealwill/d64d653a7626b825ef332aa3b2aa1a43 It should help you get familiarized with the tooling.

yeldarby commented 3 years ago

@gmaxwell - that version appears genuine to CLIP. Similarity to generated is only 0.19051728 (top matches: costume, britney, dress, hs, costumes).

But the cosine similarity between your image and the original dog photo is only 0.4458430417488104 (compared to 0.7915248896442771 for this arbitrary image of a dog I pulled from Google).

If their server-side model truly is independent and has a different hash database, you'd need to create a single adversarial image that fools both. Not sure how easy this is; has anyone tried it or seen any studies?

hackerfactor commented 3 years ago

Feel free to use this image in further examples. I was tired of unimaginative people who thought images of dogs were an irrelevant consideration, and my partner volunteered a mid-80s picture of herself.

Be careful: your partner's childhood picture may become the new Lena.

yeldarby commented 3 years ago

I found two pairs of naturally occurring collisions in the ImageNet dataset. I created a repo to track these organic NeuralHash collisions (please do not submit adversarially generated pairs to this repo; only examples found in the wild).

I wrote up the findings here as well: https://blog.roboflow.com/nerualhash-collision/

dxoigmn commented 3 years ago

Has anyone tried creating a universal perturbation? That is, a perturbation that, when added to a non-trivial number of images, causes this model to hash them all to the same value? That would be an interesting datapoint to have from a robustness perspective.

The other interesting data point would be an image that hashes to some interesting value. I had initially tried generating an image that would hash to the zero vector, but had trouble flipping some of the bits. I also tried things like 0xdeadc0de and 0xbaadbeef with no luck. Would be interesting to understand this! I imagine seed0 may have something to do with this, but I haven't really given it much analysis.
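(Nobody appears to have published code for this; a rough sketch of what the search could look like, reusing the hypothetical neuralhash_model, seed and hash_bits_from_hex placeholders from the gradient-descent sketch earlier in the thread, might be:)

import torch

def find_universal_delta(images: torch.Tensor, target_hex: str,
                         steps: int = 5000, lr: float = 1e-2, budget: float = 0.1):
    """Search for a single perturbation that pushes a whole batch of images
    (N x 3 x 360 x 360, values in [-1, 1]) toward the same target hash."""
    target = hash_bits_from_hex(target_hex)
    delta = torch.zeros_like(images[:1], requires_grad=True)         # one delta shared by the batch
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        feats = neuralhash_model((images + delta).clamp(-1, 1))      # (N, 128) descriptors (assumed)
        logits = feats @ torch.as_tensor(seed, dtype=feats.dtype).T  # (N, 96) pre-sign values
        loss = torch.relu(0.1 - logits * target).sum()               # all bits, all images
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-budget, budget)                            # keep the perturbation small
    return delta.detach()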

gmaxwell commented 3 years ago

Be careful: your partner's childhood picture may become the new Lena.

I'm not concerned. It's obviously a bit different. In Apple's case... literally: these images have a single bit difference in their neuralhashes:

[images: the two near-collision pictures]

$ ./nnhash.py model.onnx neuralhash_128x96_seed1.dat 130190095-1c83424f-fb50-4586-a9bf-f659d8094250.png
32dac883f7b91bbf45a48696
$ ./nnhash.py model.onnx neuralhash_128x96_seed1.dat 130190121-5f452c06-e9a8-4290-9b8f-d300ead94c13.png
32dac883f7b91bbf45a48296

but had trouble flipping some of the bits.

Indeed, they're far from uniformly easy to flip. From my experience producing collisions, I expect that no single universal perturbation exists, but suspect there are small numbers of distinct clusters of images where it's easier to produce collisions between them (or apply a specific adversarial perturbation to them), but harder across groups. I say this based on the fact that there are some image pairs where I've found it extremely easy to produce collisions between them, and other pairs where I can get them as close as 1 bit different but not colliding (as above).

If I were trying to produce many adversarial images I'd want to try to be flexible with my starting images.

gmaxwell commented 3 years ago

I found that if I sample a region around the current working image and average the gradients before taking an update, in addition to the noise shaping, I get a less noisy and better looking result. The results also appear to be more robust-- they keep their hashes better even with compression noise etc.

So I updated my earlier example preimage of the dog hash; the girl/lena near collision was already using these techniques (which is why those looked better).

One thing I thought I should make clear: I created this image without ever using the dog image (or even downloading it, until I fetched it for comparison after the fact). I used only the dog image's hash (and the original picture of the girl, of course).
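(The gradient-averaging trick is simple to express; this sketch is illustrative, with hash_loss_grad standing in for whatever function returns the gradient of the hash-matching loss, and the sample count and radius picked arbitrarily.)

import numpy as np

def averaged_gradient(x, hash_loss_grad, n_samples=8, radius=0.02):
    """Average the loss gradient over a small neighbourhood of the current
    image instead of evaluating it at a single point. Gradients that only
    exist at one exact pixel configuration tend to cancel out, so the updates
    favour perturbations that survive small changes such as recompression."""
    grads = [hash_loss_grad(np.clip(x + np.random.uniform(-radius, radius, x.shape), -1.0, 1.0))
             for _ in range(n_samples)]
    return np.mean(grads, axis=0)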

[images: the updated preimage and the dog]

$ ./nnhash.py model.onnx neuralhash_128x96_seed1.dat 129860794-e7eb0132-d929-4c9d-b92e-4e4faba9e849.png
59a34eabe31910abfb06f308
$ ./nnhash.py model.onnx neuralhash_128x96_seed1.dat 130296784-bdf329d0-06f6-454e-ba87-d83b010614a7.png
59a34eabe31910abfb06f308
GiverofMemory commented 3 years ago

This is not only a collision, it is a pre-image, which breaks the algorithm even more.

Collison: Find two random images with the same hash.

Pre-image: Find an image with the same hash as a known, given image.

I don't see how that differs at all. Basically to match two random image hashes you have to have both as known given images before hashing them.

GiverofMemory commented 3 years ago

I'm convinced you can take almost any image and smoothly transition it to very close to any other image while retaining its original perceptual hash value. The properties of what a perceptual hash needs to do practically guarantee it.

If you think of the space of all possible images of a given height / width, the perceptual hash should divide that space up into clustered continuous chunks where locality is determined by tiny edits. An obvious way to define the space is as one dimension for each color of each pixel with values from 0 to 255. The issue is that as we increase the number of dimensions, these clusters of identical hashes are more forced to stretch to fill that space.

To get some intuition on this: consider as you increase dimensions, the "volume" of an n-dimensional sphere divided by the "volume" of the corresponding n-dimensional cube approaches 0. This is concerning because the sphere is the best possible packing you can get, in other words the ability of well packed objects to fill space gets worse as dimensions go up. Consider that even a tiny 360 by 360 image would represent a huge 388,800 dimensional space. Instead of getting nice clusters, we're getting a kind of interwoven series of sponges for each hash value which completely fills the space, which is admittedly hard to visualize.

Not being able to tightly pack these hash values together defeats the main goals of the perceptual hash (or any perceptual hash?) to have similar images give similar values. I'm about a third of the way done with turning a picture of samwise into a bowl of frootloops with the identical perceptual hash.

So one attack vector for a Perp of Mass Deception (PMD) would be to create actual CSAM, then create an identically hashing image that looks like something totally different, and spread that second image widely to create an infinite stream of false positives. Basically, if this is done for each CSAM image produced, it renders this detection method totally moot.

And... looks like it was done https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issues/1#issuecomment-902977931 , https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issues/1#issuecomment-901518138 , https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issues/1#issuecomment-901769661

Actually, according to that second linked post above, a typical computer could produce dozens of random matching images per hour. So you could easily create a dozen false-positive images for every CSAM image you produce, making this detection method by Apple totally useless and costing Apple tons of money to hire people to look for matches.

Another attack vector is that the PMD modulates their CSAM images multiple times so each CSAM image has multiple hashes. That's another way to throw a wrench in this detection method and make everyone's job much harder. Each image could have infinite perturbations, making the database balloon to an impossibly large size from just perturbations of the same CSAM image. Also, the hashes stored on the Apple device are much larger than the actual neural hashes and so would take up an enormous amount of space on each device.

gmaxwell commented 3 years ago

This is not only a collision, it is a pre-image, which breaks the algorithm even more. Collison: Find two random images with the same hash. Pre-image: Find an image with the same hash as a known, given image.

I don't see how that differs at all. Basically to match two random image hashes you have to have both as known given images before hashing them.

Because this 'hash' function is entirely differentiable you can easily modify existing images to match an arbitrary hash without ever having seen the other image that the arbitrary hash matches yourself. There may not even be another image that exists.

Collisions modify both images (I gave an example with the lena/girl pair above), so they're fundamentally easier to compute. Not only do you have to have access to both images, you have to modify them. If only collisions were possible then I couldn't make false matches against their CSAM database except by distributing images that get included with the database. Since preimages are possible, I can construct matches if someone gives me a list of hashes which are likely to be in it.

So your disruptive attacker that distributes images all over the place doesn't need to create or distribute abuse images, though the attack you describe would also work.

Perhaps a better version is this: Imagine you are an authoritarian regime that dislikes certain ideological factions or ethnicities. You identify images which may be possessed by participants in those groups. You then obtain child porn images (very easy as an authoritarian regime, you even command the creation of new images if it'll work better) and then modify the child porn images to have hashes matching the political images that you're targeting. You place these hashes in your child abuse image database and also forward on the abuse images to other governments and NGOs and they will enter them into their databases with plausible deniability (or not even knowing at all), mooting any requirement that multiple reporting groups must have the hashes in question.

This attack can still work even when there is a second non-public perceptual hash, assuming that it's also of a flawed design like Apple's published hash function: as a state actor you have access to this second hash function (as you needed to generate the hashes, as parties are not sending child porn to apple). So you can generate images which match with both functions.

GiverofMemory commented 3 years ago

This is not only a collision, it is a pre-image, which breaks the algorithm even more. Collison: Find two random images with the same hash. Pre-image: Find an image with the same hash as a known, given image.

I don't see how that differs at all. Basically to match two random image hashes you have to have both as known given images before hashing them.

Because this 'hash' function is entirely differentiable you can easily modify existing images to match an arbitrary hash without ever having seen the other image that the arbitrary hash matches yourself. There may not even be another image that exists.

Collisions modify both images (I gave an example with the lena/girl pair above), so they're fundamentally easier to compute. Not only do you have to have access to both images, you have to modify them. If only collisions were possible then I couldn't make false matches against their CSAM database except by distributing images that get included with the database. Since preimages are possible, I can construct matches if someone gives me a list of hashes which are likely to be in it.

So your disruptive attacker that distributes images all over the place doesn't need to create or distribute abuse images, though the attack you describe would also work.

Perhaps a better version is this: Imagine you are an authoritarian regime that dislikes certain ideological factions or ethnicities. You identify images which may be possessed by participants in those groups. You then obtain child porn images (very easy as an authoritarian regime, you even command the creation of new images if it'll work better) and then modify the child porn images to have hashes matching the political images that you're targeting. You place these hashes in your child abuse image database and also forward on the abuse images to other governments and NGOs and they will enter them into their databases with plausible deniability (or not even knowing at all), mooting any requirement that multiple reporting groups must have the hashes in question.

This attack can still work even when there is a second non-public perceptual hash, assuming that it's also of a flawed design like Apple's published hash function: as a state actor you have access to this second hash function (as you needed to generate the hashes, as parties are not sending child porn to apple). So you can generate images which match with both functions.

That doesn't change the fact that it is just as easy to "find" two images that match hashes as to "find" an image whose hash matches the hash of a known image. As people have shown above it takes just minutes to do either.

Perhaps a better version is this: Imagine you are an authoritarian regime that dislikes certain ideological factions or ethnicities. You identify images which may be possessed by participants in those groups. You then obtain child porn images (very easy as an authoritarian regime, you even command the creation of new images if it'll work better) and then modify the child porn images to have hashes matching the political images that you're targeting.

That is a very strong attack by a government on its people, if nothing else a way to get people onto a government database like America's version of a "suspected terrorist" list.

gmaxwell commented 3 years ago

That doesn't change the fact that it is just as easy to "find" two images that match hashes as to "find" an image whose hash matches the hash of a known image. As people have shown above it takes just minutes to do either.

It's easier to find two images that match than it is to find an image that hashes to a known hash, even in this case. But this is always the case: even for a black-box arbitrary hash function, finding a collision takes roughly the square root of the work of finding a preimage, even with a totally generic attack, due to the birthday paradox.

Both attacks (preimage and collision) are easy with this hash function because it has a fundamentally weak design. For stronger hash functions like, say md5, one can now compute collisions easily but a second preimage is still unachievable. The poster you were responding to was emphasizing the point that the attack was a preimage attack rather than a collision because we all expect collision attacks to be much easier, but they're less useful for the attacker because the attacker has to influence both inputs, rather than just one.

GiverofMemory commented 3 years ago

It's easier to find two images that match than it is to find an image that hashes to a known hash, even in this case.

I just don't see it because one hash always has to be held constant when comparing it to another. You can vary both, or you can vary one twice as many times.

It depends on the type of attack, and when it is brute force - just varying inputs and comparing outputs, you can vary one or both for the same cost.

gmaxwell commented 3 years ago

I just don't see it because one hash always has to be held constant when comparing it to another. You can vary both, or you can vary one twice as many times.

It's a surprising fact. The reason that in a generic attack a collision is fundamentally easier is because the number of things you match against increases.

Imagine that you want to find some x such that H(x) matches either H(y) or H(z). Clearly that's easier than finding something that matches H(y) alone because you get two chances instead of one: it takes half as many comparisons on average.

So when you want to find x,y such that H(x)==H(y) you can alternate between varying x and varying y, and each time you compare against all the previously computed values, not just the most recent. Each operation the chance of success increases because you're comparing with more and more alternatives. This takes roughly the square root of the number of operations needed when changing only one input. This is due to the same mathematical fact that makes it so a room with surprisingly few people is likely to have at least two people sharing a birthday: when there are 23 people the odds of two sharing the same birthday are >50% even though there are 365 days in a year.

There are fancy algorithms that eliminate the storage and search costs for comparing all the prior hashes, by using a little more computation. A good place to read about memory efficient collision algorithms is the wikipedia article on cycle detection-- because if you choose the next value to try based on the hash of the prior value, finding a collision is equivalent to finding a cycle in the graph created by feeding the output of the hash back into itself.

That applies to generic attacks that work even without any knowledge of the internal structure. With the apple hash function the hash is differentiable, so we're not limited to generic attacks. In this case, it's easier to find a collision because we're actually not limited to updating one image at a time. Instead, we can construct a function that computes the difference between two images' hashes and use autodiff to find the partial derivatives of the difference output with respect to all of the pixels at once and modify both images in the same step. The extra dimensions give the attack a lot more degrees of freedom.
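(The gap is easy to see with a toy experiment on a deliberately weakened hash, sha256 truncated to 20 bits, which has nothing to do with NeuralHash itself: a generic collision search succeeds after roughly sqrt(2^20) ≈ 1,000 attempts, while a generic preimage search needs on the order of 2^20 ≈ 1,000,000.)

import hashlib
import itertools

BITS = 20

def toy_hash(data: bytes) -> int:
    # deliberately weak: sha256 truncated to BITS bits
    return int.from_bytes(hashlib.sha256(data).digest(), "big") >> (256 - BITS)

def collision_attempts() -> int:
    # birthday attack: hash fresh inputs until any two collide
    seen = set()
    for i in itertools.count(1):
        h = toy_hash(b"c%d" % i)
        if h in seen:
            return i
        seen.add(h)

def preimage_attempts(target: int) -> int:
    # preimage attack: hash fresh inputs until one hits the target
    for i in itertools.count(1):
        if toy_hash(b"p%d" % i) == target:
            return i

print("collision after", collision_attempts(), "hashes")                  # typically ~1,000
print("preimage after", preimage_attempts(toy_hash(b"hello")), "hashes")  # typically ~1,000,000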

gmaxwell commented 3 years ago

Here are some more good looking examples: (prior examples here)

Altering Lena to match barbara:

[images: modified Lena and Barbara]

$ ./nnhash.py ./model.onnx ./neuralhash_128x96_seed1.dat 130310372-d6b2c633-c5d3-4b3e-bc12-d3b2f3104ec6.png
a426dae78cc63799d01adc32
$ ./nnhash.py ./model.onnx ./neuralhash_128x96_seed1.dat 130310383-9fcc80ba-117e-4383-9177-0df24bc99f9a.png
a426dae78cc63799d01adc32
tomhicks commented 3 years ago

Hey @gmaxwell

What happens if you resize the image down to say 1/4 then upsample back to the original size, then run the hash again? Do they still collide?

gmaxwell commented 3 years ago

What happens if you resize the image down to say 1/4 then upsample back to the original size, then run the hash again? Do they still collide?

They're still fairly close, e.g. using the lena/barbara above:

$ convert -scale 90x90 130310372-d6b2c633-c5d3-4b3e-bc12-d3b2f3104ec6.png x.png ; convert -scale 360x360 x.png 1.png
$ convert -scale 90x90 130310383-9fcc80ba-117e-4383-9177-0df24bc99f9a.png x.png ; convert -scale 360x360 x.png 2.png
$ ./nnhash.py ./model.onnx ./neuralhash_128x96_seed1.dat 1.png
a426dae78cc23399501acc32
$ ./nnhash.py ./model.onnx ./neuralhash_128x96_seed1.dat 2.png
a42692e78ce22b99d11ace32

I suspect the property could be met by making the search work that way. E.g. stick an upsampler on the computation graph for the model so that it works with a 90x90 image, and do the search there. Then upsample it and use it as a starting point for the full res search (or jointly test the full res and down/up path).

Is there some particular reason that this property is important? I could imagine that doing the search in a multiscale way might result in better looking images... but is there something about the apple setup that works this way?
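(One way to read the upsampler idea, as a sketch: wrap a hypothetical differentiable port of the model, here called neuralhash_model, so the optimizer only ever touches a 90x90 image that is bilinearly upsampled to 360x360 inside the graph; any collision found is then robust to that down/up round trip by construction.)

import torch
import torch.nn.functional as F

class UpsampledHashModel(torch.nn.Module):
    def __init__(self, base_model: torch.nn.Module):
        super().__init__()
        self.base_model = base_model   # assumed differentiable NeuralHash port

    def forward(self, x_small: torch.Tensor) -> torch.Tensor:
        # optimize a 90x90 image; the model always sees its 360x360 upsampling
        x_full = F.interpolate(x_small, size=(360, 360), mode="bilinear", align_corners=False)
        return self.base_model(x_full)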

gmaxwell commented 3 years ago

I won't be posting any more preimages for the moment. I've come to learn that Apple has begun responding to this issue by telling journalists that they will deploy a different version of the hash function.

Given Apple's consistently dishonest conduct on the subject, I'm concerned that they'll simply add the examples here to their training set to make sure they fix those, without resolving the fundamental weaknesses of the approach, or that they'll use improvements in the hashing function to obscure the gross recklessness of their whole proposal. I don't want to be complicit in improving a system with such a potential for human rights abuses.

I'd like to encourage people to read some of my posts on the Apple proposal to scan user's data which were made prior to the hash function being available. I'm doubtful they'll meaningfully fix the hash function-- this entire approach is flawed-- but even if they do, it hardly improves the ethics of the system at all. In my view the gross vulnerability of the hash function is mostly relevant because it speaks to a pattern of incompetence and a failure to adequately consider attacks and their consequences.

And this post written after: