AsuharietYgvar / AppleNeuralHash2ONNX

Convert Apple NeuralHash model for CSAM Detection to ONNX.
Apache License 2.0
1.53k stars · 131 forks

Working Collision? #1

Open dxoigmn opened 2 years ago

dxoigmn commented 2 years ago

Can you verify that these two images collide? (Attached: beagle360.png and collision.png.)

Here's what I see from following your directions:

$ python3 nnhash.py NeuralHash/model.onnx neuralhash_128x96_seed1.dat beagle360.png
59a34eabe31910abfb06f308
$ python3 nnhash.py NeuralHash/model.onnx neuralhash_128x96_seed1.dat collision.png
59a34eabe31910abfb06f308
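
For anyone reproducing this: the two 96-bit hashes can be compared bit by bit with a few lines of plain Python (a quick sketch, no dependencies beyond the standard library):

```python
# Count differing bits between two 24-character (96-bit) NeuralHash hex strings.
def hamming_distance_hex(h1: str, h2: str) -> int:
    return bin(int(h1, 16) ^ int(h2, 16)).count("1")

print(hamming_distance_hex("59a34eabe31910abfb06f308",
                           "59a34eabe31910abfb06f308"))  # 0 -> identical hashes
```
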
AsuharietYgvar commented 2 years ago

Yes! I can confirm that both images generate the exact same hashes on my iPhone. And they are identical to what you generated here.

prusnak commented 2 years ago

@dxoigmn Can you generate an image for any given hash (preimage attack) or do you need access to the source image first (second preimage attack)?

fuomag9 commented 2 years ago

@dxoigmn Can you generate an image for any given hash (preimage attack) or do you need access to the source image first (second preimage attack)?

If I'm not mistaken a preimage attack was done here Edit: this should be the working script (I haven't tested it)

tmechen commented 2 years ago

If I'm not mistaken a preimage attack was done here

hmmm "This is so fake it's not even funny. These are just images generated by the model from https://thisartworkdoesnotexist.com . It's hilarious to see so many people falling for it here" (in the comments)

lericson commented 2 years ago

@fuomag9 Interesting, but one ethical question remains -- how did they obtain the CSAM hashes? I was under the impression that the NeuralHash outputs of the NCMEC images were not readily available. This strongly suggests that the authors must have obtained child pornography, hashed it, and then generated spoofs.

lericson commented 2 years ago

"This is so fake it's not even funny. These are just images generated by the model from https://thisartworkdoesnotexist.com . It's hilarious to see so many people falling for it here"

Or they used images generated by that site as the starting point. As I noted in my previous comment, it's impossible to know without having the NeuralHash NCMEC database.

erlenmayr commented 2 years ago

This is not only a collision, it is a pre-image, which breaks the algorithm even more.

Collision: Find two random images with the same hash.

Pre-image: Find an image with the same hash as a known, given image.

fuomag9 commented 2 years ago

@fuomag9 Interesting, but one ethical question remains -- how did they obtain the CSAM hashes? I was under the impression that the NeuralHash outputs of the NCMEC images were not readily available. This strongly suggests that the authors must have obtained child pornography, hashed it, and then generated spoofs.

If I'm not mistaken the DB with the hashes is stored locally and you can extract it from the iOS 15 beta

Edit: the hashes are stored locally but not in a way that makes them recoverable to the end user, see below

nicolaskopp commented 2 years ago

Holy Shit.

CRTified commented 2 years ago

@erlenmayr No, this is likely a second preimage attack. In a preimage attack you're just given the hash, not the image.

gillescoolen commented 2 years ago

@fuomag9 so the list of known CP hashes is shipped on every device? Isn't this a huge security issue?

Edit: this was untrue. See this comment.

fuomag9 commented 2 years ago

@fuomag9 so the list of known CP hashes is shipped on every device? Isn't this a huge security issue?

This is from Apple's PDF on the technical details of their implementation. Feel free to correct me, but from my understanding the blinded hash table is the CSAM hash DB.

[screenshot from the Apple PDF referenced above]

Edit: I was wrong and we cannot extract them

gillescoolen commented 2 years ago

Wow. Couldn't they have hashed the image locally, sent it to the server and compared it there?

My bad. The hashes on the device have gone through a blinding process, so my train of thought was not correct. Does anyone know why they chose to compare it on the client instead of on the server, seeing as it is only run on images uploaded to iCloud?

fuomag9 commented 2 years ago

https://twitter.com/angelwolf71885/status/1427922279778881538?s=20 this seems interesting though, a CSAM sample hash seems to exist?

dxoigmn commented 2 years ago

@dxoigmn Can you generate an image for any given hash (preimage attack) or do you need access to the source image first (second preimage attack)?

Not sure. My sense is that seed1 seems to be mixing bits from the output of the model. But we already know from the literature that it is very likely possible. At least, I am reasonably confident one could generate a noisy gray image that outputs some desired hash value.

@fuomag9 Interesting, but one ethical question remains -- how did they obtain the CSAM hashes? I was under the impression that the NeuralHash outputs of the NCMEC images were not readily available. This strongly suggests that the authors must have obtained child pornography, hashed it, and then generated spoofs.

No one has obtained the CSAM hashes, AFAIK. This repo is just a neural network that takes an input and produces a hash. It's as if someone released a new hash algorithm. (What you do with those hashes is a different story and the comparison of the hash with CSAM hashes is a whole other system.) This just shows that an image of a dog has the same hash value as a noisy gray image. That is, ask the model for the hash of the dog image, then ask the model how to change a gray image to make it output the same hash as the dog image. So it's a second-preimage, which we know has to be possible (by the pigeonhole principle); but it's really just a matter of feasibility. The interesting thing about neural network models (when compared to cryptographic hashes) is that they give you a gradient which tells you how you can change stuff to optimize some objective. This is the basis of deep dream, adversarial examples, and even just plain old training of neural network models.
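
A rough sketch of that gradient idea, with heavy caveats: it assumes `model` is a differentiable PyTorch port of the NeuralHash network (e.g. obtained with onnx2torch), `seed` is the 96x128 matrix from neuralhash_128x96_seed1.dat loaded as a tensor, and `target_image` is the preprocessed dog image. All three names are placeholders and the loop is illustrative, not a working exploit.

```python
import torch

def hash_logits(features: torch.Tensor, seed: torch.Tensor) -> torch.Tensor:
    # The sign of each of the 96 entries is one hash bit.
    return seed @ features.flatten()

# Target bit pattern: the signs produced by the dog image.
target = hash_logits(model(target_image), seed).sign().detach()

# Start from a flat gray image; inputs are assumed to live in [-1, 1].
x = torch.zeros((1, 3, 360, 360), requires_grad=True)
opt = torch.optim.Adam([x], lr=1e-2)

for step in range(2000):
    opt.zero_grad()
    logits = hash_logits(model(x), seed)
    # Hinge loss: push every logit to the target side of zero with a small margin.
    loss = torch.relu(0.1 - target * logits).sum()
    loss.backward()
    opt.step()
    with torch.no_grad():
        x.clamp_(-1.0, 1.0)
    if loss.item() == 0:
        break  # all 96 bits now agree with the target hash
```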

nicolaskopp commented 2 years ago

@tmechen

If I'm not mistaken a preimage attack was done here

hmmm "This is so fake it's not even funny. These are just images generated by the model from https://thisartworkdoesnotexist.com . It's hilarious to see so many people falling for it here" (in the comments)

From what I understood, the story the HN comment is referring to is AI-generated porn, which can match as "real" porn. Aka the "send dunes" story: https://petapixel.com/2017/12/20/uk-police-porn-spotting-ai-gets-confused-desert-photos/

This issue here, however, is a hash collision. This is huge, if confirmed. Really ugly stuff.

jakob11git commented 2 years ago

Does anyone know why they chose to compare it on the client instead of on the server, seeing as it is only run on images uploaded to iCloud?

If they compare on the server then Apple is able to know the outcome of the comparison even if there is only a single match. The idea is that Apple can only know that a certain user has matching images in their iCloud account as soon as a certain threshold number of matches is reached. That's my understanding as a non-expert in this field.

Nicceboy commented 2 years ago

Does anyone know why they chose to compare it on the client instead of on the server, seeing as it is only run on images uploaded to iCloud?

If they compare on the server then Apple is able to know the outcome of the comparison even if there is only a single match. The idea is that Apple can only know that a certain user has matching images in their iCloud account as soon as a certain threshold number of matches is reached. That's my understanding as a non-expert in this field.

I think they also inject false positives into the results on the device side, so they don't know if there is only a single match. They know only when the threshold is reached. It was in the PSI paper.
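
For intuition only, here is a toy of the threshold mechanism (not Apple's actual construction; their PSI paper describes the real one): the decryption secret is split with Shamir secret sharing, so it can only be reconstructed once the threshold number of genuine matches is available, and synthetic vouchers contribute no usable share. The parameters below are illustrative.

```python
import random

P = 2**127 - 1  # a Mersenne prime; a toy finite field

def make_shares(secret: int, threshold: int, n: int):
    # Random polynomial of degree threshold-1 with constant term = secret.
    coeffs = [secret] + [random.randrange(P) for _ in range(threshold - 1)]
    f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

secret = random.randrange(P)
shares = make_shares(secret, threshold=30, n=100)  # one share per genuine match
print(reconstruct(shares[:30]) == secret)  # True: threshold reached
print(reconstruct(shares[:29]) == secret)  # False (overwhelmingly likely): below threshold
```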

Nicceboy commented 2 years ago

This issue here, however, is a hash collision. This is huge, if confirmed. Really ugly stuff.

Every practical hashing algorithm is many-to-one. It is only a matter of time before collisions appear. This repo contains the model from an older release, and we don't really know how they have tweaked the parameters on Apple's side or even how much the huge data set improves the accuracy. The NN has many layers, so training is rather important. The goal of the algorithm is to seek matches in a specific kind of material (CSAM), and that is what is used for training. Can we expect that it calculates hashes with similar accuracy for all kinds of images? (E.g. is it equally hard to make a collision for CSAM material as for the picture of the dog?)

nicolaskopp commented 2 years ago

This issue here, however, is a hash collision. This is huge, if confirmed. Really ugly stuff.

Every practical hashing algorithm is many-to-one. It is only a matter of time before collisions appear. This repo contains the model from an older release, and we don't really know how they have tweaked the parameters on Apple's side or even how much the huge data set improves the accuracy. The NN has many layers, so training is rather important.

Let me quote from this article by someone who can explain this better than me:

https://www.hackerfactor.com/blog/index.php?/archives/929-One-Bad-Apple.html

In the six years that I've been using these hashes at FotoForensics, I've only matched 5 of these 3 million MD5 hashes. (They really are not that useful.) In addition, one of them was definitely a false-positive. (The false-positive was a fully clothed man holding a monkey -- I think it's a rhesus macaque. No children, no nudity.)

and:

According to NCMEC, I submitted 608 reports to NCMEC in 2019, and 523 reports in 2020. In those same years, Apple submitted 205 and 265 reports (respectively). It isn't that Apple doesn't receive more pictures than my service, or that they don't have more CP than I receive. Rather, it's that they don't seem to notice and therefore, don't report.

Let that sink in.

Now back to our issue:

While hash detection is a really bad idea in general for such purposes, the specific implementation from Apple does not matter, because it is closed source. You literally don't know what Apple is doing, with what data, and what result comes out of it.

Even if Apple's specific algorithm is the best in the world and does not have these drawbacks: you would not know. You would have to live in constant fear of your phone "thinking" you may be doing something wrong. That's scary.

Nicceboy commented 2 years ago

Even if Apple's specific algorithm is the best in the world and does not have these drawbacks: you would not know. You would have to live in constant fear of your phone "thinking" you may be doing something wrong. That's scary.

That is very true. We can either give total trust or none at all to closed systems.

weskerfoot commented 2 years ago

@fuomag9 so the list of known CP hashes is shipped on every device? Isn't this a huge security issue?

It's not. It's encrypted using elliptic-curve cryptography first. They keep the actual CSAM NeuralHash DB on their server.

Why do they do that? A) To prevent reversing the hashes, as mentioned. B) So they can derive the encryption keys for the image locally, which is what the blinded hash table is also used for. I'm leaving out a lot of details that are mentioned in the whitepaper they released, so read that.
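
For intuition, a toy sketch of the blinding step, very loosely modelled on the DDH-style construction in Apple's write-up and definitely not the real protocol: the server exponentiates each database hash with a secret, ships only the blinded table, and the two sides derive the same key exactly when the device's image hash corresponds to a database entry. The modulus, bucketing, and key derivation below are all illustrative.

```python
import hashlib
import secrets

P = 2**255 - 19  # toy prime modulus

def h2g(data: bytes) -> int:
    # Toy "hash to group element".
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % P

def bucket(h: bytes) -> int:
    # The table position is derived from the hash, not the hash itself.
    return int.from_bytes(hashlib.sha256(b"bucket|" + h).digest()[:2], "big")

# --- Server, once: blind every database hash with a secret exponent s ---
s = secrets.randbelow(P - 2) + 2
database = [b"neuralhash-of-known-image-1", b"neuralhash-of-known-image-2"]
blinded_table = {bucket(x): pow(h2g(x), s, P) for x in database}  # shipped to devices

# --- Device: derive a key from whatever blinded entry sits at its hash's bucket ---
image_hash = b"neuralhash-of-known-image-2"      # pretend NeuralHash output
r = secrets.randbelow(P - 2) + 2
entry = blinded_table[bucket(image_hash)]        # the device never sees raw DB hashes
device_key = hashlib.sha256(str(pow(entry, r, P)).encode()).digest()
header = pow(h2g(image_hash), r, P)              # sent to the server with the voucher

# --- Server: recompute the key from the header using its secret ---
server_key = hashlib.sha256(str(pow(header, s, P)).encode()).digest()
print(device_key == server_key)  # True only if image_hash corresponds to the DB entry
```

The real construction wraps this in the threshold scheme and synthetic vouchers discussed above; the whitepaper has the details.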

gillescoolen commented 2 years ago

@weskerfoot Yeah I edited my other comment with an explanation of the blinded hash.

LiEnby commented 2 years ago

All we need now is a hash of what the iPhone considers 'CSAM', and then we can start making images collide with it. If we could somehow make collisions with arbitrary content (e.g. a funny meme that someone would be likely to save to their phone), that would be great.

I'm not sure how to get a hash without having such a file to hash, though...

Oh, just curious: since the hashing is done on the client side, would it be possible to tell iCloud that the hash matched every time? And if so, what's stopping you from just flooding it with random shit?

weskerfoot commented 2 years ago

Oh, just curious: since the hashing is done on the client side, would it be possible to tell iCloud that the hash matched every time? And if so, what's stopping you from just flooding it with random shit?

In fact it already does that by design, in order to obscure how many matching images there are before the threshold is crossed, at which point it can decrypt all matching images.

So if you can figure out a way to generate synthetic matches on demand, you could make it think there are lots of matches, but it would soon discover they're "fake" once the threshold is crossed, since it wouldn't be able to decrypt them all. Not sure what it would do if you did that repeatedly, maybe it would cause issues.

Edit: to be clear, the inner encryption key is associated with the NeuralHash. So if you have a false positive NeuralHash output, it would trigger manual review, but you would need to actually have that, which is why they keep them a secret.

WriteCodeEveryday commented 2 years ago

I wonder if this code can be ported so you can generate the collision on an Android device...

LiEnby commented 2 years ago

Can we take any given image and then make its hash totally different, despite the image looking basically identical? Just curious if this even works for the one thing it's supposed to do.. haha

fuomag9 commented 2 years ago

taking a given image and then make its hash totally different, despite the image looking basically identical. can we do that? just curious if this even works for the one thing it's supposed to do..

cropping the image seems to work since the algorithm (at least what we have now) is vulnerable to that

LiEnby commented 2 years ago

cropping the image seems to work since the algorithm (at least what we have now) is vulnerable to that

Ah yes, because pedos are so sophisticated as to use end-to-end encryption these days that we need to backdoor everyone's phones, because think of the children, but they're NOT sophisticated enough to ..checks notes.. crop images.

lericson commented 2 years ago

The cropping sensitivity is likely due to aliasing in convolutional neural networks. [1] is a starting point into that line of research. By starting point I really mean: read the related works and ignore the actual paper.

[1] https://openaccess.thecvf.com/content/CVPR2021/html/Chaman_Truly_Shift-Invariant_Convolutional_Neural_Networks_CVPR_2021_paper.html

LiEnby commented 2 years ago

The cropping sensitivity is likely due to aliasing in convolutional neural networks. [1] is a starting point into that line of research.

So, you're saying if you just alias the fuck out of the image it won't be detected? Can this thing even handle basic shit like JPEG-ing a PNG?

feralmarmot commented 2 years ago

taking a given image and then make its hash totally different, despite the image looking basically identical. can we do that? just curious if this even works for the one thing it's supposed to do..

cropping the image seems to work since the algorithm (at least what we have now) is vulnerable to that

Technically though, the cropped image could end up with a colliding hash with another CSAM image, couldn't it?

LiEnby commented 2 years ago

Technically though, the cropped image could end up with a colliding hash with another CSAM image, couldn't it?

I guess it's possible (though, so could cropping any image, right?). What are the chances of that happening? Can someone who's smarter than me calculate the odds? :?

fuomag9 commented 2 years ago

Technically though, the cropped image could end up with a colliding hash with another CSAM image, couldn't it?

I guess it's possible (though, so could cropping any image, right?). What are the chances of that happening? Can someone who's smarter than me calculate the odds? :?

An illegal image containing something like a vase that gets cropped down to just the vase (which by itself is not illegal) and then gets sent to you as an innocuous image could be considered illegal, since it'd match part of a bigger picture?

LiEnby commented 2 years ago

An illegal image containing something like a vase that gets cropped down to just the vase (which by itself is not illegal) and then gets sent to you as an innocuous image could be considered illegal, since it'd match part of a bigger picture?

Now imagine another scenario: say you happen to have the same/similar-looking vase, and happen to put it on a similar/same table with a similar/same-looking background and took a photo of it. Would your completely unrelated but similar-looking image now also be flagged?

deavmi commented 2 years ago

Apple had good intentions but Jesus, this is not the way. I prefer how things currently are. Not this. False positives are a real thing, and also I don't want a fucking Bitcoin-mining AI cringe daemon running amok chowing through my battery.

gmaxwell commented 2 years ago

A goal for someone producing a second-preimage-image is to construct one that appears to be some arbitrary innocuous image. An actual attacker wouldn't use an innocuous image, but would likely instead use nude photographs -- as they would be more likely to be confused for true positives for longer. A random innocuous image would be a reasonable POC that it could also be done with nudes.

Comments above that the target database would be needed by an attacker are mistaken in my view. The whole point of the database they're claiming to use is that it contains millions of highly circulated abuse images. An attacker could probably scan through darknet sites for a few hours and find a collection of suitable images. They don't have to know for sure exactly what images are in the database: any widely circulated child abuse image is very likely to be in the database. If they obtain, say, 50 of them and use them to create 50 matching second-preimage-images, then it should also be likely to get the 30 or so hits required to trigger the Apple infrastructure.

A real attacker looking to frame someone isn't going to worry too much that possessing the images to create an attack is a crime -- so is the framing itself. That fact will just stand in the way of researchers producing a POC to show that the system is broken and endangers the public.

I think it's important to understand that even though second-preimage-image attacks exist, concealing the database does not meaningfully protect the users -- as real attackers won't mind handling widely circulated child porn. The database encryption serves only to protect Apple and its (partially undisclosed!) list sources from accountability, and from criticism-via-POC by researchers who aren't out to break the law.

Nicceboy commented 2 years ago

Now imagine another scenario: say you happen to have the same/similar-looking vase, and happen to put it on a similar/same table with a similar/same-looking background and took a photo of it. Would your completely unrelated but similar-looking image now also be flagged?

I have been under the impression that CSAM material was used during development to make the model more accurate for that kind of material, so as to rule this kind of abuse out. Is that just false information, or a misunderstanding on my part? Otherwise this is just a general perceptual hashing function whose purpose can be changed at any point, and it will indeed be less accurate.

lericson commented 2 years ago

@gmaxwell, you have misunderstood how the system works. If you generate 30 matching NeuralHashes, that just means that the 30 images are then matched on Apple's end with a different system. If that also says the images match, only then is it escalated to something actionable.

LiEnby commented 2 years ago

Hm, does anyone know if it counts anime loli/shota content as CSAM? Because then there's a legal way to get hashes: just get someone from Japan to hash the files for you and send the hashes back. Easy.

It's legal over there, so no laws broken there, and assuming it's not where you live... you're only getting hashes, not the real files, so it would be totally legal.. right?

I mean it's ethically questionable, maybe (I guess it depends where you stand on that issue), but it should be legal, right?? I dunno, I'm not a lawyer >_<

fuomag9 commented 2 years ago

Hm, does anyone know if it counts anime loli/shota content as CSAM? Because then there's a legal way to get hashes: just get someone from Japan to hash the files for you and send the hashes back. Easy.

It's legal over there, so no laws broken there, and assuming it's not where you live... you're only getting hashes, not the real files, so it would be totally legal.. right?

I mean it's ethically questionable, maybe (I guess it depends where you stand on that issue), but it should be legal, right?? I dunno, I'm not a lawyer >_<

That would surely be an interesting question if Apple starts adding more databases or expands the system to other countries.

cmsj commented 2 years ago

That would surely be an interesting question if Apple starts adding more databases or expands the system to other countries.

FWIW, Apple clarified recently that hashes in their CSAM database would need to be sourced from two separate countries.

judge2020 commented 2 years ago

@cmsj for reference:

https://www.theverge.com/2021/8/13/22623859/apple-icloud-photos-csam-scanning-security-multiple-jurisdictions-safeguard

And the hashes are indeed included in the iOS source image and can't be updated remotely without an iOS update.

Laim commented 2 years ago

Hm, does anyone know if it counts anime loli/shota content as CSAM? Because then there's a legal way to get hashes: just get someone from Japan to hash the files for you and send the hashes back. Easy. It's legal over there, so no laws broken there, and assuming it's not where you live... you're only getting hashes, not the real files, so it would be totally legal.. right? I mean it's ethically questionable, maybe (I guess it depends where you stand on that issue), but it should be legal, right?? I dunno, I'm not a lawyer >_<

That would surely be an interesting question if Apple starts adding more databases or expands the system to other countries.

It would be interesting to know what they're considering CP: does it all go under US law, or will it be broken down per country? Countries have different definitions of what is and isn't CP in some cases, such as loli.

hackerfactor commented 2 years ago

taking a given image and then make its hash totally different, despite the image looking basically identical. can we do that? just curious if this even works for the one thing it's supposed to do..

cropping the image seems to work since the algorithm (at least what we have now) is vulnerable to that

Has anyone worked out the percentage of cropping needed to avoid a match using NeuralHash? With PhotoDNA, it's about 2% off the width or height.

LiEnby commented 2 years ago

Has anyone worked out the percentage of cropping needed to avoid a match using NeuralHash? With PhotoDNA, it's about 2% off the width or height.

I tried cropping; it seems to only change the hash by 1 or 2 bits. Is that really enough?
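
A rough way to measure that systematically, sketched under the assumption of a helper neuralhash(img) adapted from the repo's nnhash.py (PIL image in, 24-character hex hash out; the helper name is hypothetical):

```python
from PIL import Image

def bits_changed(h1: str, h2: str) -> int:
    return bin(int(h1, 16) ^ int(h2, 16)).count("1")

img = Image.open("beagle360.png").convert("RGB")
reference = neuralhash(img)              # hypothetical helper wrapping nnhash.py
w, h = img.size
for pct in range(1, 21):
    dx, dy = int(w * pct / 100), int(h * pct / 100)
    cropped = img.crop((dx, dy, w, h))   # crop pct% off the left and top edges
    print(pct, bits_changed(reference, neuralhash(cropped)))
```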

gmaxwell commented 2 years ago

that just means that the 30 images are then matched on Apple's end with a different system.

Apple has stated that they are reviewed by a human at that point.

From a legal perspective, a human review prior to reporting is absolutely required to prevent the subsequent search by an agent of the government from being a Fourth Amendment violation (see e.g. US v. Miller (6th Cir. 2020)).

For PR reasons, Apple has claimed that the human review protects people against governments secretly expanding the scope of the databases without Apple's knowledge.

Because possession of child porn images is a strict liability crime, Apple could not perform a second-pass match by comparing against the actual image. They could use another, different fingerprint as a prefilter before human review -- but if that fingerprint isn't similarly constructed, the tolerance to resizing will be lost, and they might as well have just used SHA-256 over the decoded pixels in the first step and completely escaped the attack described in this thread.
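
A small illustration of that trade-off (PIL plus hashlib, using the file name from earlier in the thread): a cryptographic hash over the decoded pixels cannot be steered by gradients, but a one-pixel resize already changes the digest completely, which is exactly the robustness the perceptual hash is supposed to buy.

```python
import hashlib
from PIL import Image

def pixel_sha256(img: Image.Image) -> str:
    # Hash the raw decoded RGB bytes, not the compressed file.
    return hashlib.sha256(img.convert("RGB").tobytes()).hexdigest()

img = Image.open("beagle360.png")
print(pixel_sha256(img))
print(pixel_sha256(img.resize((img.width - 1, img.height))))  # completely different digest
```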

hackerfactor commented 2 years ago

From a legal perspective, a human review prior to reporting is absolutely required to prevent the subsequent search by an agent of the government from being a Fourth Amendment violation (see e.g. US v. Miller (6th Cir. 2020)).

When reporting to NCMEC's CyberTipline, they have a checkbox: have you reviewed it? If you say "no", then NCMEC's staff will review it. If you say "yes", then they may still review it, but it's not a guarantee.

Also, NCMEC forwards reports to the appropriate ICAC, LEO, or other enforcement organization. The report includes a copy of the picture and the recipient enforcement group reviews it.

yeldarby commented 2 years ago

It seems like it would be much harder to create a collision that also passes a sanity test like running it through OpenAI's CLIP model and verifying that the image is indeed plausibly CSAM.

I tested it out, and CLIP identifies the generated image above as generated (in fact, out of the top 10,000 words in the English language, "generated" is the closest match, followed by IR, computed, lcd, tile, and canvas): https://blog.roboflow.com/apples-csam-neuralhash-collision/
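
For reference, a minimal sketch of that kind of CLIP sanity check, assuming the open-source openai/CLIP package is installed; the label list and model choice are illustrative only.

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

labels = ["a photograph of a person", "a dog", "random noise", "generated abstract art"]
image = preprocess(Image.open("collision.png")).unsqueeze(0).to(device)
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()[0]

for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
```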