Task for prompt-guided generative art

dmarx commented 3 years ago

🚀 Feature

Implement a VQGAN+CLIP system using flash instrumentation. Example baseline Colab: https://github.com/justinjohn0306/VQGAN-CLIP

Motivation

OpenAI's release of CLIP sparked a surge of interest in AI art generation. The AI art community has not yet embraced version-controlled tooling, and the space is flooded with variations on the same Google Colab notebook. The community would benefit from improved, opinionated tooling, and I believe Lightning Flash could be a good fit.

Pitch

Add a demo to the docs showing how to implement VQGAN-CLIP image generation from a text prompt using lightning/flash tooling.

utilize v2 pipeline to serialize processing
leverage existing tasks as much as possible (ImageClassifier?, TextClassifier?, ImageEmbedder?, StyleTransfer? ... yeah, maybe this is a new task?)
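
For orientation, the core of a VQGAN+CLIP system is an optimization loop: a latent code is decoded to an image, the image is embedded, and the latent is nudged so that the image embedding moves toward the embedding of the text prompt. The sketch below is a dependency-free toy of that loop only, not flash code. `decode`, `embed_image`, and the hard-coded `prompt_embedding` are hypothetical stand-ins for the VQGAN decoder, the CLIP image encoder, and the CLIP text embedding, and finite differences stand in for autograd.

```python
import math

# Hypothetical stand-in for the CLIP image encoder: a fixed 2x3 linear map.
_W = [[0.5, -0.2, 0.1],
      [0.3, 0.8, -0.5]]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def decode(z):
    """Hypothetical stand-in for the VQGAN decoder: latent -> 'image'."""
    return [math.tanh(v) for v in z]


def embed_image(image):
    """Hypothetical stand-in for the CLIP image encoder."""
    return [sum(w * p for w, p in zip(row, image)) for row in _W]


def loss(z, prompt_embedding):
    """1 - cosine similarity between the decoded image and the prompt."""
    return 1.0 - cosine(embed_image(decode(z)), prompt_embedding)


def grad(z, prompt_embedding, eps=1e-5):
    """Central finite differences; a real system would use autograd."""
    g = []
    for i in range(len(z)):
        up = list(z)
        down = list(z)
        up[i] += eps
        down[i] -= eps
        g.append((loss(up, prompt_embedding) - loss(down, prompt_embedding)) / (2 * eps))
    return g


def optimize(z, prompt_embedding, steps=200, lr=0.1):
    """Move the latent toward the prompt, keeping only steps that lower the loss."""
    for _ in range(steps):
        g = grad(z, prompt_embedding)
        candidate = [v - lr * gi for v, gi in zip(z, g)]
        if loss(candidate, prompt_embedding) < loss(z, prompt_embedding):
            z = candidate
        else:
            lr *= 0.5  # step was too large for this toy; shrink it
    return z


if __name__ == "__main__":
    prompt_embedding = [1.0, 0.5]  # pretend text embedding of the prompt
    z0 = [0.1, -0.3, 0.2]
    z1 = optimize(z0, prompt_embedding)
    print(loss(z0, prompt_embedding), loss(z1, prompt_embedding))
```

In a real implementation the stand-ins would presumably be pretrained modules, and the accept-or-shrink step would be replaced by a gradient-based optimizer such as Adam, which is what the baseline notebooks typically use.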

Alternatives

https://github.com/justinjohn0306/VQGAN-CLIP

Additional context

I want to get involved in pytorch-lightning development. This system has several moving parts, most of which need to be modular. I believe implementing this demo will be a good way for me to take a tour of flash's functionality and test the bounds of what the current set of implemented tasks can achieve. I may ultimately implement this as a new task, but I'm starting from the assumption that I can achieve this using existing tooling. At the very least, I think the WIP data pipeline API will be useful for orchestrating the various components of this system.

dmarx commented 3 years ago

Please assign to me and remove the help wanted tag :)

dmarx commented 3 years ago

Yeah... just barely getting started and I'm pretty sure this is going to be a new task. Thinking it would go under a new "multimodal" task datatype subfolder? A "multimodal" subfolder could be home to tasks like:

Image QA
Image Captioning
(Zero-shot) Image Classification
Image-text similarity (e.g. CLIP)
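
To make the last two items concrete: with a CLIP-style model, zero-shot classification reduces to image-text similarity. The image and one text prompt per label are embedded into the same space, and the label whose prompt is most similar to the image wins. A minimal sketch, assuming the embeddings are already computed; the toy vectors, the function name `zero_shot_classify`, and the `temperature` default are illustrative assumptions, not flash or CLIP APIs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, label_embeddings, temperature=0.07):
    """Return {label: probability} from image-text cosine similarities.

    label_embeddings maps each label prompt to its (precomputed) text embedding.
    temperature is a hypothetical scaling knob chosen for illustration.
    """
    labels = list(label_embeddings)
    logits = [cosine(image_embedding, label_embeddings[label]) / temperature
              for label in labels]
    return dict(zip(labels, softmax(logits)))


if __name__ == "__main__":
    # Toy embeddings standing in for real encoder outputs.
    image = [0.9, 0.1, 0.0]
    prompts = {
        "a photo of a cat": [1.0, 0.0, 0.0],
        "a photo of a dog": [0.0, 1.0, 0.0],
    }
    print(zero_shot_classify(image, prompts))
```

Seen this way, one similarity-scoring component could plausibly back several of the tasks listed above, which is one argument for grouping them in a shared subfolder.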

ethanwharris commented 3 years ago

Definitely room here for a new group of tasks. Perhaps the existing audio tasks would fall into this category as well since they deal with text / image data too. @dmarx Assigning you to the issue :smiley: We should start by figuring out how we want the data loading to work and also looking for a good PyTorch framework we could integrate to provide the architectures, loss functions, etc.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Lightning-Universe / lightning-flash