Lightning-Universe / lightning-flash

Your PyTorch AI Factory - Flash enables you to easily configure and run complex AI recipes for over 15 tasks across 7 data domains
https://lightning-flash.readthedocs.io
Apache License 2.0
1.74k stars, 212 forks

Task for prompt-guided generative art #903

Closed. dmarx closed this issue 2 years ago

dmarx commented 3 years ago

🚀 Feature

Implement a VQGAN+CLIP system using Flash instrumentation. Example baseline colab: https://github.com/justinjohn0306/VQGAN-CLIP

Motivation

OpenAI's release of CLIP sparked a surge of interest in AI art generation. The AI art community has not yet embraced version-controlled tooling, and the space is flooded with variations on the same Google Colab notebook. The community would benefit from improved, opinionated tooling, and I believe Lightning Flash could be a good fit.

Pitch

Add a demo to the docs showing how to implement VQGAN-CLIP image generation from a text prompt using lightning/flash tooling.

Alternatives

https://github.com/justinjohn0306/VQGAN-CLIP

Additional context

I want to get involved in pytorch-lightning development. This system has several moving parts, most of which need to be modular. I believe implementing this demo will be a good way for me to take a tour of flash's functionality and test the bounds of what the current set of implemented tasks can achieve. I may ultimately implement this as a new task, but I'm starting from the assumption that I can achieve this using existing tooling. At the very least, I think the WIP data pipeline API will be useful for orchestrating the various components of this system.
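
For anyone unfamiliar with the approach, here is a minimal sketch of the prompt-guided optimization loop that VQGAN+CLIP-style generation is built around: keep the models fixed and nudge a latent so the encoded image moves closer to the prompt's embedding. This is not Flash code and it uses no real models. `toy_generator` and `toy_image_encoder` are hypothetical stand-ins for a VQGAN decoder and CLIP's image encoder, the text embedding is a hard-coded vector, and gradients are estimated with finite differences where a real implementation would presumably use autograd.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def toy_generator(latent):
    """Hypothetical stand-in for a VQGAN decoder: latent -> 'image'."""
    return [2.0 * z for z in latent]


def toy_image_encoder(image):
    """Hypothetical stand-in for CLIP's image encoder: 'image' -> embedding."""
    return list(image)


def prompt_score(latent, generator, image_encoder, text_embedding):
    """How well the generated 'image' matches the prompt embedding."""
    return cosine_similarity(image_encoder(generator(latent)), text_embedding)


def optimize_latent(latent, generator, image_encoder, text_embedding,
                    steps=200, lr=0.1, eps=1e-4):
    """Gradient ascent on the latent, using forward finite differences."""
    latent = list(latent)
    for _ in range(steps):
        base = prompt_score(latent, generator, image_encoder, text_embedding)
        grad = []
        for i in range(len(latent)):
            bumped = list(latent)
            bumped[i] += eps
            bumped_score = prompt_score(
                bumped, generator, image_encoder, text_embedding
            )
            grad.append((bumped_score - base) / eps)
        # Move the latent in the direction that raises the similarity.
        latent = [z + lr * g for z, g in zip(latent, grad)]
    return latent


# Assumed toy inputs: a fixed "prompt" embedding and a starting latent.
text_embedding = [1.0, 0.0, 0.0]
start = [1.0, 1.0, 1.0]

before = prompt_score(start, toy_generator, toy_image_encoder, text_embedding)
final = optimize_latent(start, toy_generator, toy_image_encoder, text_embedding)
after = prompt_score(final, toy_generator, toy_image_encoder, text_embedding)
print(f"similarity before: {before:.3f}, after: {after:.3f}")
```

The generator and encoder are passed in as plain callables, which is the kind of modularity mentioned above: each piece could in principle be swapped out without touching the loop.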

dmarx commented 3 years ago

Please assign to me and remove the help wanted tag :)

dmarx commented 3 years ago

Yeah... just barely getting started and I'm pretty sure this is going to be a new task. Thinking it would go under a new "multimodal" task datatype subfolder? A "multimodal" subfolder could be home to tasks like:

ethanwharris commented 3 years ago

Definitely room here for a new group of tasks. Perhaps the existing audio tasks would fall into this category as well since they deal with text / image data too. @dmarx Assigning you to the issue :smiley: We should start by figuring out how we want the data loading to work and also looking for a good PyTorch framework we could integrate to provide the architectures, loss functions, etc.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.