Closed dmarx closed 2 years ago
Please assign to me and remove the help wanted tag :)
Yeah... just barely getting started and I'm pretty sure this is going to be a new task. Thinking it would go under a new "multimodal" task datatype subfolder? A "multimodal" subfolder could be home to tasks like:
Definitely room here for a new group of tasks. Perhaps the existing audio tasks would fall into this category as well, since they deal with text/image data too. @dmarx Assigning you to the issue :smiley: We should start by figuring out how we want the data loading to work, and by looking for a good PyTorch framework we could integrate to provide the architectures, loss functions, etc.
🚀 Feature
Implement a VQGAN+CLIP system using flash instrumentation. Example baseline Colab: https://github.com/justinjohn0306/VQGAN-CLIP
Motivation
OpenAI's release of CLIP sparked a surge of interest in AI art generation. The AI art community has not yet embraced version-controlled tooling, and the space is flooded with variations on the same Google Colab notebook. The community would benefit from improved, opinionated tooling, and I believe Lightning Flash could be a good fit.
Pitch
Add a demo to the docs showing how to implement VQGAN-CLIP image generation from a text prompt using lightning/flash tooling.
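For anyone unfamiliar with the technique, here is a rough, dependency-free sketch of the optimization loop at the heart of VQGAN-CLIP: keep the text embedding fixed and nudge a latent so the embedding of the decoded image moves toward it. Everything in it is a hypothetical stand-in, not Flash, VQGAN, or CLIP API: `decode` plays the role of the VQGAN decoder, `encode_image` the CLIP image encoder, `text_embedding` the output of the CLIP text encoder, and finite differences replace autograd.

```python
import math


def decode(latent):
    """Toy stand-in for a VQGAN decoder: latent vector -> tiny 'image'."""
    z0, z1 = latent
    return [z0, z1, z0 + z1]


def encode_image(image):
    """Toy stand-in for a CLIP image encoder: 'image' -> embedding."""
    p0, p1, p2 = image
    return [p0 + 0.5 * p2, p1 - 0.5 * p2]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def loss_fn(latent, text_embedding):
    """Loss is low when the decoded image's embedding points along the text embedding."""
    image_embedding = encode_image(decode(latent))
    return 1.0 - cosine_similarity(image_embedding, text_embedding)


def numerical_gradient(latent, text_embedding, eps=1e-5):
    """Central finite differences; a real implementation would use autograd."""
    grad = []
    for i in range(len(latent)):
        plus, minus = list(latent), list(latent)
        plus[i] += eps
        minus[i] -= eps
        diff = loss_fn(plus, text_embedding) - loss_fn(minus, text_embedding)
        grad.append(diff / (2 * eps))
    return grad


def optimize_latent(latent, text_embedding, steps=300, lr=0.2):
    """Plain gradient descent on the latent; the encoders stay frozen."""
    latent = list(latent)
    history = [loss_fn(latent, text_embedding)]
    for _ in range(steps):
        grad = numerical_gradient(latent, text_embedding)
        latent = [z - lr * g for z, g in zip(latent, grad)]
        history.append(loss_fn(latent, text_embedding))
    return latent, history


# Hypothetical values: a fixed "prompt" embedding and a starting latent.
text_embedding = [0.0, 1.0]
initial_latent = [1.0, 0.0]
final_latent, history = optimize_latent(initial_latent, text_embedding)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

In the real system the latent would be the VQGAN code grid, the loss would usually be averaged over augmented crops of the decoded image, and an optimizer such as Adam would replace the hand-rolled update. Only the latent is trained, which is part of why it is unclear whether an existing task fits.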
Alternatives
https://github.com/justinjohn0306/VQGAN-CLIP
Additional context
I want to get involved in pytorch-lightning development. This system has several moving parts, most of which need to be modular. I believe implementing this demo will be a good way for me to take a tour of flash's functionality and test the bounds of what the current set of implemented tasks can achieve. I may ultimately implement this as a new task, but I'm starting from the assumption that I can achieve this using existing tooling. At the very least, I think the WIP data pipeline API will be useful for orchestrating the various components of this system.