Keytoyze / Mug-Diffusion

High-quality and controllable charting AI for rhythm games, modified from Stable Diffusion
MIT License
186 stars, 14 forks

Question about the training data #9

Closed: Trumpet63 closed this issue 1 year ago

Trumpet63 commented 1 year ago

I'm going to try really hard not to be the fun-police.

When you say "Thank all the Charters / Mappers in the community. It's you who endowed MuG Diffusion with intelligence.", the worst case I can imagine is that you just downloaded several thousand StepMania .sm files and MP3s and used them to train the model.

Could I trouble you to explain more specifically what data you used, where you got it from, and under what permissions you used them?

Keytoyze commented 1 year ago

I downloaded around 30k charts from osu and malody (I will publish the list this week). As for permission, it's impossible to ask so many mappers. To be honest, this problem is very complicated because there is currently no explicit regulation of AI training datasets. I think the trained model weights and AI-created charts are in the public domain and are not owned by me.

Trumpet63 commented 1 year ago

You're correct. The legal system (regardless of which country) is still trying to figure out the implications of modern AI. I think it's great that you're trying to do a good thing for the community by releasing the weights into the public domain. I have a lot of respect for the work you've put in.

That being said, my personal feeling is this project is... questionable. I suppose it's up to the community at large to decide how they feel about it, and I'm not a stepartist, but here's my two cents:

PC rhythm games are already notorious for using pirated music. Then, in games like Osu, permission is given by music artists exclusively to Osu and only under the condition that Osu doesn't charge money for their game. Presumably stepartists are contributing under those same assumptions.

Given that the weights are in the public domain, it's entirely possible that tomorrow someone will release a game, charge money for it, and use this ML model on the backend - which in a way is circumventing the wishes of the authors of the training data.

Maybe if you wanted to prevent that specific possibility you could modify the license to be non-commercial? Even so I would still say the whole project is iffy. Totally up to you.

Keytoyze commented 1 year ago

Thank you very much for your understanding. I agree with your opinion about the license problem, and I have modified README.md to declare that the model weights are non-commercial (3612b74).

Additionally, my motivation for this project is to explore whether a machine can understand music and to satisfy my curiosity, rather than to earn anything or violate charters' rights. I am glad to reduce any negative effects of this project, but I think that even if I hadn't created it, someone would eventually train a similar AI, since AIGC is the future trend (to the best of my knowledge, at least five people have recently been trying to create a charting AI). Feel free to give me more advice to make things better.

Trumpet63 commented 1 year ago

I don't have any more comments for now. This was a productive conversation, thank you :)

Keytoyze commented 1 year ago


I published the dataset here, and also in commit c2903d4.