Yosshi999 opened 2 years ago
I think the effort to watermark the images, while noble, won't be very successful, simply because of all the screenshotting, transcoding, and so forth. I don't know of an effective way to keep stable-diffusion from cannibalizing its own output except top-notch sanitizing and filtering of datasets.
This will be a problem with GPT as well.
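To illustrate why naive watermarks don't survive screenshotting and transcoding, here is a toy sketch (this is *not* the DWT-DCT scheme stable-diffusion's invisible-watermark library actually uses; the pixel values and quantization step are invented for illustration): embed a payload in the least-significant bits of pixel values, then simulate a lossy re-encode by quantizing, and the payload is destroyed.

```python
# Toy illustration: a fragile LSB watermark vs. lossy transcoding.
# NOT the scheme stable-diffusion actually uses -- it just shows why
# naive pixel-level marks die under re-encoding.

def embed_lsb(pixels, bits):
    """Store one payload bit in the least-significant bit of each pixel."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

def extract_lsb(pixels, n):
    """Read the payload back out of the first n pixels."""
    return [p & 1 for p in pixels[:n]]

def lossy_transcode(pixels, step=8):
    """Crude stand-in for JPEG re-encoding or screenshotting:
    quantize each pixel value to the nearest multiple of `step`."""
    return [min(255, round(p / step) * step) for p in pixels]

payload = [1, 0, 1, 1, 0, 0, 1, 0]
image = [120, 33, 78, 200, 145, 90, 61, 17]  # made-up 8-pixel "image"

marked = embed_lsb(image, payload)
print(extract_lsb(marked, 8) == payload)        # True: survives a clean copy

recompressed = lossy_transcode(marked)
print(extract_lsb(recompressed, 8) == payload)  # False: payload is gone
```

Real invisible watermarks embed in frequency-domain coefficients to be more robust than this, but the same arms-race logic applies: every extra round of cropping, scaling, and recompression erodes the mark.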
Eventually I can imagine the opposite of your concern-- untoward characters wanting to sneak their logos, colors, etc. into the NN training data-- perhaps to embed ads in generated output, or to poison/dilute similar images so they no longer render accurately.
Does this foreshadow an SEO-for-images arms race?
> untoward characters wanting to sneak their logos, colors, etc. into the NN training data
Interesting. So in the near future there will be a huge number of meaningless drawings with ads or logos (on Pinterest or somewhere?) posted to inject themselves into training datasets, and they will annoy us the way SEO-ed websites do?
I don't know if it will happen, but if spammers start dumping images of their corporate logo everywhere and label them "balloon", this might confuse a network trained on that data enough that generated balloon images eventually start looking like the logo. Obviously this would be a drop in the ocean of data-- but that is the current aim of SEO, and it works with some success. There might be a lot of arms races over keeping datasets from being poisoned... (this is all apart from the dog-eating-its-own-tail issue that Karpathy mentioned.)
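The mislabeled-logo scenario above can be sketched with a toy nearest-centroid "model" (all feature vectors and counts here are made up for illustration; a real diffusion model is vastly more complex, but the class-prototype drift is the same idea): flooding the "balloon" label with logo-like examples pulls the learned prototype toward the logo.

```python
# Toy sketch of label poisoning on 2-D features.
# All coordinates and counts are invented for illustration only.

def centroid(points):
    """Mean of a list of 2-D points -- our stand-in for 'what the model
    learns the class looks like'."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

# Honest "balloon" images cluster around (1.0, 1.0) in feature space.
clean_balloons = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2)]

# Spammer uploads of a corporate logo, mislabeled "balloon",
# clustering around (5.0, 5.0). Sheer volume is the attack.
poisoned = [(5.0, 5.0), (5.1, 4.9), (4.9, 5.1)] * 10

clean_centroid = centroid(clean_balloons)
poisoned_centroid = centroid(clean_balloons + poisoned)

logo = (5.0, 5.0)
# After poisoning, the "balloon" prototype sits closer to the logo
# than to where the honest balloon cluster was.
print(dist2(poisoned_centroid, logo) < dist2(poisoned_centroid, clean_centroid))  # True
```

In this toy the poison isn't remotely "a drop in the ocean", which is exactly the point of contention: at web scale the attacker needs enormous volume or a very rare label to move the prototype this far.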
I hear some artists hate having their drawings used in ML datasets (especially for training generative models like this one!). One option they have is to mimic a machine-generated image by applying the same watermark that stable-diffusion puts on its outputs. What do you think about this strategy? If many artists do that, we can no longer filter out the outputs of stable-diffusion by their watermark when crawling (though I think that is a minor issue). I want to hear everyone's opinions.