CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

Training AI models using images of artists without consent to mimic them. #500

Open gaming-se opened 1 year ago

gaming-se commented 1 year ago

Hello Stable Diffusion community. I would like to point out a problem that happened recently. Specifically, someone used the Stable Diffusion tools to train a model on images of a single artist in order to mimic that artist's style and create images similar to the artist's work, but the artist never agreed to it.

I like Stable Diffusion and I see great use for the tools based on it, but there is a difference between using the tools responsibly, with the ethical standard of trying not to harm people, and the way others use them: pursuing goals whose results matter more to them than the people they might hurt. This specific case, training a model to recreate an artist's style, is in my opinion the extreme version of unethical use of the tools. Some users of such models argue that it's an AI, that it just learned how to create images, and that learning in itself should be allowed because learning is not covered by copyright. But does the concept of learning apply to an algorithm that uses only a single source of input, without any other influence? To an AI that applies the creation process with high precision? That memorizes things in a way humans usually cannot? I do not think so.

From my current knowledge (simplified): this AI uses pixel space as input, and the trained model gains decision weights based on pixel positions and pixel values; that way it is capable of mimicking, directly or indirectly.

To mimic another artist as a human, you would need to hold all of the artist's images in your memory, and to do it perfectly you would need a flawless, high-class visual memory and a huge amount of time. An AI needs none of that: it is a computer program, and its "visual memory" is saved in the weights of the model. Comparing it to human standards of learning would be wrong, because even if you ask artists to recreate an image they drew themselves, their results would differ somewhat (depending on the complexity of the image), while the AI can recreate an image it generated before exactly, provided the randomization factors (seeds) are known.

Even though that sounds powerful, in my opinion artists will remain at an advantage in the art world, mainly because the most deterministic language for describing how a picture should look is the drawing process itself. But that takes time and effort. The AI, on the other hand, can mass-produce art that appears similar to the images its model was trained on. It is not a full replacement for the artist, but with some luck it can produce acceptable pieces for very little effort, and that does not feel right. While the artist invested a huge amount of time, effort, and skill to build an internal model (which drives the decisions about color, brush, proportions, and so on), the artist's images are used without consent to create a replica of that internal decision model, which is then used to create art that competes against the artist. This kind of problem needs more awareness and some kind of handling.
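The point about seeds can be illustrated with a toy sampler. This is a hypothetical stand-in, not the actual Stable Diffusion sampler: the only thing it shows is that when every source of randomness is derived from the seed, a known seed reproduces the output bit for bit.

```python
import numpy as np

def toy_generate(seed):
    # Stand-in for an image sampler: all "randomness" comes from the seed,
    # so the generation process is fully deterministic given that seed.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((4, 4))  # toy 4x4 "image"

a = toy_generate(42)
b = toy_generate(42)
c = toy_generate(43)

print(np.array_equal(a, b))  # True: same seed, identical output
print(np.array_equal(a, c))  # False: different seed, different output
```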

(By the way, the problem persists even if the inputs come from multiple artists but there is a switch that selects the specific weights of one artist.)

In my opinion, only images from artists who want to participate in AI training should be included (opt-in).

I like AI, but misusing it can cause damage that makes life hard for artists, some of whom are struggling already.

I chose to reach out here because, in my opinion, you have the most influence over how the tools should be used. It's advanced technology, but with great power comes great responsibility.

*Something I also want to mention: if you know who the artist or the model creator is, please don't mention them, because it leads to harassment on both sides. And please don't try to shift the discussion with something like whataboutism. I'm sure this community is better than that and educated enough to see straight through whataboutist argumentation.

Kind regards

IceMetalPunk commented 1 year ago

Two things.

  1. Humans are capable of copying each other's styles very well, and as such, this issue was settled (at least from a legal standpoint) long before generative AIs existed. It was brought before various courts, and the rulings were explicit: you can copyright specific works of art, but you cannot copyright styles of art. Anyone is free to study a particular artist's repertoire in order to mimic their style, and it is NOT a violation of copyright, because it is not considered stealing or unethical, since the actual piece of work you end up creating is novel. This applies to humans, and there's no good reason it shouldn't also apply to AIs.

  2. While the input space of a diffusion model does include the pixel values of the training images, that is not the same as saying it learns to "copy" those pixels. It learns aspects of the training images at an abstract level, the same way an art student might study an artist's work to learn new painting techniques or, as previously mentioned, to learn how to paint like that artist. The fact that the AIs can do this "better than humans" is irrelevant; otherwise the argument reduces to "all of this is fine to do, as long as the entity doing it isn't too smart". If a human prodigy were born who could match the artistic learning and generation ability of the best AI, saying they would be an exception to the existing precedent decisions because they're "too good at it" would be ridiculous. Therefore, it's equally ridiculous to use that logic to say AIs should be an exception to the precedents because "they're too good at it".

TL;DR: There is no copying of individual works here (and, in the exceptional times when there is copying due to overfitting, THAT is certainly copyright infringement and it's up to the user to reject those generations). You cannot legally steal a style, and these models are analogous to very smart art students learning techniques and taking inspiration; therefore, calling it stealing or unethical is inaccurate and disingenuous.

gaming-se commented 1 year ago

"you can copyright specific works of art, but you cannot copyright styles of art. Anyone is free to study a particular artist's repertoire in order to mimic their style, and it is NOT a violation of copyright" quote from IceMetalPunk Dec 29, 2022

It definitely applies to humans, but the denoising process is something different: it is a software process, not a human artistic process, and it is an algorithmic, repeatable one. And there is more. For example, say you train your model to generate Spider-Man. One special thing about Spider-Man is the logo on his suit. This logo can or may be copyrighted (I tried to look it up; there is a trademark entry, but I could not find the image itself to compare). Now the fun part: if someone extracts the vectors that lead to the denoising of the logo, how high is the chance that a court will say your model has copyrighted material stored in it? Just because the model stores the data in an unorganized way does not mean the model has no information stored in it, especially if the vector is a few bytes and the generated output is many bytes larger.

"While the input space of a diffusion model does include the pixel values of the training images, that is not the same as saying it learns to "copy" those pixels. " quote from IceMetalPunk Dec 29, 2022

That is an argument I have seen a few times, but only from people who did not look at the whole latent diffusion chart. It's like describing the start of something and then ending abruptly. There is a whole process after the latent-space part: the denoising, a stage that runs back out through pixel space, where it literally copies the pixels that were stored in the model; which pixel to choose is decided through vectors.
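For orientation, the stage under dispute here — where processing leaves latent space and re-enters pixel space — is the autoencoder's decoder in a latent diffusion pipeline. Below is a deliberately minimal sketch: the real system uses a VAE and a U-Net denoiser, while here fixed random linear maps and a toy shrink loop merely mark where each space begins and ends. All names are illustrative, not the repository's API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the latent-diffusion stages.
PIX, LAT = 16, 4
W_enc = rng.standard_normal((LAT, PIX)) * 0.1   # pixel space -> latent space (toy "encoder")
W_dec = rng.standard_normal((PIX, LAT)) * 0.1   # latent space -> pixel space (toy "decoder")

def denoise(z, steps=10):
    # Placeholder for the iterative denoising loop, which in latent
    # diffusion operates entirely in latent space.
    for _ in range(steps):
        z = z * 0.9
    return z

z = rng.standard_normal(LAT)   # text-to-image starts from latent noise
z = denoise(z)                 # all diffusion steps happen in latent space
image = W_dec @ z              # only here does the result re-enter pixel space

print(image.shape)             # (16,): a pixel-space output
```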

"It learns aspects of the training images at an abstract level, the same way an art student might study an artist's work to learn new painting techniques [...] The fact that the AIs can do this "better than humans" is irrelevant; [...] [AI shuld not] be an exception to the existing precedent decisions [...] [Regardless of differences to human artists] " quote from IceMetalPunk Dec 29, 2022

I would kind of agree if there were not a pixel space whose data is stored in an unorganized way: fractions of images stored in it, but scattered so thoroughly that a human cannot read them without tools, and even tools cannot fully rebuild the information without external knowledge of the vectors.

As a programmer, I would advise all artists to put a logo into specific objects in their images, in a way that the logo survives being scaled down a few times. If someone then prompts an image containing those objects, they would end up with copyrighted material in it. But that is for the future… there is a whole past of images that do not contain such a logo.

IceMetalPunk commented 1 year ago

"While the input space of a diffusion model does include the pixel values of the training images, that is not the same as saying it learns to "copy" those pixels. " quote from IceMetalPunk Dec 29, 2022

That is an argument I have seen a few times, but only from people who did not look at the whole latent diffusion chart. It's like describing the start of something and then ending abruptly. There is a whole process after the latent-space part: the denoising, a stage that runs back out through pixel space, where it literally copies the pixels that were stored in the model; which pixel to choose is decided through vectors.

"It learns aspects of the training images at an abstract level, the same way an art student might study an artist's work to learn new painting techniques [...] The fact that the AIs can do this "better than humans" is irrelevant; [...] [AI shuld not] be an exception to the existing precedent decisions [...] [Regardless of differences to human artists] " quote from IceMetalPunk Dec 29, 2022

I would kind of agree if there were not a pixel space whose data is stored in an unorganized way: fractions of images stored in it, but scattered so thoroughly that a human cannot read them without tools, and even tools cannot fully rebuild the information without external knowledge of the vectors.

...no. Again, that is NOT how these systems work. The diffusion process is performed by a neural network as well. It does NOT store pixel information, it does NOT store specific images. It's trained on pairs of images and slightly noisier versions of the same image, and it learns HOW noise affects an image. The output of the diffuser is literally NOT the image at all, it's the predicted noise pattern, which then gets subtracted from the image for the next step. No part of these models, neither the CLIP embeddings nor the diffusion networks, record or copy pixel data. That's just not how they work at a fundamental level.
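The training objective described here can be sketched numerically. Assuming a standard DDPM-style forward process (a simplification of what latent diffusion actually implements), the network's training target is the noise itself, and subtracting a perfect noise prediction inverts the forward step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": the model never stores this array; during training it only
# sees noisy versions of it and must predict the noise that was added.
x0 = rng.standard_normal((8, 8))

alpha_bar = 0.7                    # cumulative noise level at some step t
eps = rng.standard_normal((8, 8))  # the Gaussian noise added at this step

# Forward process: produce a noisier version of the image.
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

# The denoiser's output is a noise estimate, not the image. If a
# (hypothetical) perfect denoiser returned exactly eps, the clean image
# is recovered by inverting the forward process -- the "subtraction"
# described in the comment above:
eps_pred = eps                     # stand-in for a perfect network output
x0_est = (x_t - np.sqrt(1.0 - alpha_bar) * eps_pred) / np.sqrt(alpha_bar)

print(np.allclose(x0_est, x0))     # True: subtracting the predicted noise recovers the image
```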

"you can copyright specific works of art, but you cannot copyright styles of art. Anyone is free to study a particular artist's repertoire in order to mimic their style, and it is NOT a violation of copyright" quote from IceMetalPunk Dec 29, 2022

It definitely applies to humans, but the denoising process is something different: it is a software process, not a human artistic process, and it is an algorithmic, repeatable one. And there is more. For example, say you train your model to generate Spider-Man. One special thing about Spider-Man is the logo on his suit. This logo can or may be copyrighted (I tried to look it up; there is a trademark entry, but I could not find the image itself to compare). Now the fun part: if someone extracts the vectors that lead to the denoising of the logo, how high is the chance that a court will say your model has copyrighted material stored in it? Just because the model stores the data in an unorganized way does not mean the model has no information stored in it, especially if the vector is a few bytes and the generated output is many bytes larger.

There is no "vector that leads to the denoising of the logo". There are vectors which, when used to process specific starting noise patterns, may result in the logo. Those are two very different concepts. No pixel data about the logo are stored in the vectors. All those vectors represent, in an abstract way, would be "if the starting noise looks this way, then the noise that was added probably looks like this." To say this is the same as being a "vector that leads to the denoising of the Spider-Man logo" is like saying a chisel is a "tool that leads to carving the David statue" and therefore chisels are to blame for someone copying David.

As a programmer, I would advise all artists to put a logo into specific objects in their images, in a way that the logo survives being scaled down a few times. If someone then prompts an image containing those objects, they would end up with copyrighted material in it. But that is for the future… there is a whole past of images that do not contain such a logo.

Or, just copyright the image they made. As I said, if the AI actually does generate a copy of the work, it is copyright infringement already. If you need a special watermark to be forced into the generation to recognize it as yours, then the generation without that watermark wasn't yours to begin with.

gaming-se commented 1 year ago

"The diffusion process is performed by a neural network as well." quote from IceMetalPunk Jan 2, 2023

Nobody said that it's not the case. Nobody doubts that.

"It does NOT store pixel information, it does NOT store specific images." quote from IceMetalPunk Jan 2, 2023

I disagree with that one. Just because the information is not stored in an ordered way, where you could list through its elements, does not mean it is not stored. The best example: models that never had a logo in their training data will not spontaneously generate that logo on the objects it would have been placed on. On the other hand, create objects with a logo infused into them, train the model, and try to generate those objects; surprisingly, you get the logos back.
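The logo claim can be illustrated with a deliberately trivial "model": the per-pixel mean of a training set. Real diffusion models are vastly more complex and do not work this way, but the toy shows the general statistical point being argued: structure present in every training image survives into what is learned, while uncorrelated content averages away.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training set: random 8x8 "images" that all share a fixed 2x2 "logo"
# patch in the top-left corner (a stand-in for a watermark in every image).
logo = np.array([[1.0, -1.0], [-1.0, 1.0]])
images = rng.standard_normal((100, 8, 8))
images[:, :2, :2] = logo

# A trivially simple "model": the per-pixel mean of the training set.
model = images.mean(axis=0)

print(np.allclose(model[:2, :2], logo))  # True: the shared logo persists
print(np.abs(model[2:, 2:]).max())       # small: uncorrelated content averages toward 0
```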

"No pixel data about the logo are stored in the vectors. All those vectors represent, in an abstract way, would be "if the starting noise looks this way, then the noise that was added probably looks like this." To say this is the same as being a "vector that leads to the denoising of the Spider-Man logo" is like saying a chisel is a "tool that leads to carving the David statue" and therefore chisels are to blame for someone copying David." quote from IceMetalPunk Jan 2, 2023

Hmm, I would rather compare it to a rubber stamp (the denoising network) and an ink pad (the noise): depending on where the ink sits on the pad, the stamp gets inked and produces an image corresponding to the ink input. From time to time there are people who carve logos into the rubber stamps they use, and when the ink touches that part, it reproduces the logo. Nobody blames the rubber stamp for its functionality, and nobody blames the ink pad for its existence, but the ones who engraved the logo into the stamp have to take responsibility for what they are doing if they mass-produce the logo in the images they generate with the stamp.

@IceMetalPunk, may I ask you to prove that you know what you are talking about? The best way I can think of: tell me where the calls before and after denoising happen for text-to-image generation — the file name and line number in this master branch where processing leaves the latent diffusion space and enters pixel space.

" If you need a special watermark to be forced into the generation to recognize it as yours, then the generation without that watermark wasn't yours to begin with." quote from IceMetalPunk Jan 2, 2023

I am missing the evidence and reasoning in your argumentation; you provided a claim, nothing more.