CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

What will happen when artists apply the same invisible watermark to their works? #127

Open Yosshi999 opened 2 years ago

Yosshi999 commented 2 years ago

I hear some artists hate having their drawings used in ML datasets (especially for training generative models like this one!). One option they have is to mimic a machine-generated image by embedding the same invisible watermark that stable-diffusion puts on its outputs. What do you think about this strategy? If many artists do that, we can no longer pick out the outputs of stable-diffusion when crawling (but I think that is a minor issue). I want to hear everyone's opinions.
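For context, the repo's scripts/txt2img.py stamps every generated image with the string "StableDiffusionV1" via the invisible-watermark package (DWT-DCT embedding). A minimal sketch of how an artist could apply the identical watermark to their own work, assuming that package is installed (the filenames are placeholders):

```python
# Sketch: applying stable-diffusion's own invisible watermark to an artwork,
# mirroring the put_watermark() helper in scripts/txt2img.py.
# Requires: pip install invisible-watermark opencv-python
import cv2
from imwatermark import WatermarkEncoder

encoder = WatermarkEncoder()
# The exact payload txt2img.py embeds in every generated image.
encoder.set_watermark('bytes', "StableDiffusionV1".encode('utf-8'))

bgr = cv2.imread('my_artwork.png')    # placeholder filename; OpenCV reads BGR
bgr = encoder.encode(bgr, 'dwtDct')   # DWT-DCT embedding, same method the repo uses
cv2.imwrite('my_artwork_watermarked.png', bgr)
```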

fat-tire commented 2 years ago

I think the effort to watermark the image, while noble, won't be very successful, simply because of all the screenshotting, transcoding, and so forth that images go through. I don't know an effective way to avoid stable-diffusion cannibalizing its own output except through top-notch sanitizing and filtering of datasets.
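For what it's worth, that filtering could start from the same package: decode the DWT-DCT payload and drop anything that matches. A rough sketch, assuming the watermark actually survived intact (the file list and helper name are illustrative, not from the repo):

```python
# Sketch: filtering crawled images by decoding the invisible watermark.
# 136 bits = len("StableDiffusionV1") * 8.
import cv2
from imwatermark import WatermarkDecoder

decoder = WatermarkDecoder('bytes', 136)

def is_stable_diffusion_output(path):
    """True if the DWT-DCT payload decodes to the known watermark string."""
    bgr = cv2.imread(path)
    if bgr is None:
        return False
    payload = decoder.decode(bgr, 'dwtDct')
    try:
        return payload.decode('utf-8') == "StableDiffusionV1"
    except UnicodeDecodeError:
        return False  # random bits from an unwatermarked image

crawled = ['img_001.png', 'img_002.jpg']  # placeholder crawl results
dataset = [p for p in crawled if not is_stable_diffusion_output(p)]
```

Which is exactly the weak point: one screenshot or aggressive re-encode and the payload is gone, so this check only catches the lazy cases.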

This will be a problem with GPT as well.

Eventually I can imagine the opposite of your concern-- untoward characters wanting to sneak their logos, colors, etc. into the NN training data-- perhaps for embedding ads into generated output or for poisoning/diluting like-images to water them down so they don't render accurately.

Does this foreshadow an SEO-for-images arms race?

Yosshi999 commented 2 years ago

> untoward characters wanting to sneak their logos, colors, etc. into the NN training data

Interesting. So in the near future there will be a huge number of meaningless drawings carrying ads or logos (on Pinterest or somewhere?) trying to inject themselves into training datasets, and those will annoy us the way SEO-ed websites do?

fat-tire commented 2 years ago

I don't know if it will happen, but if spammers start dumping images of their corporate logo everywhere and labeling them "balloon", that might confuse a network trained on the data enough that generated images of balloons eventually start looking like the logo. Obviously this would be a drop in the ocean of training data -- but that is the current aim of SEO, and it works with some success. There might be a lot of arms races ahead as far as keeping datasets from being poisoned... (This is all apart from the dog-eating-its-own-tail issue that Karpathy mentioned.)