Open pietrobolcato opened 3 years ago
Hi there, thanks!
Yes, realtime is definitely on my todo list. I think it is feasible, although you'll need a big fat GPU.
Currently, 1920x1080 rendering for my 1024px networks runs at ~15 fps on my 2x1080Ti setup. Including calculating all the latents, noise, and modulations at the start, the whole process takes roughly twice the duration of the rendered output.
However, there are quite a few speedups that should be fairly easy to pick up, plus a couple of more involved ones on the horizon that should help a lot as well.
In any case, in terms of StyleGAN inference speed, I think there's more than enough headroom to get realtime framerates even when calculating latents and noise on the fly.
What I'm a little less certain of are the options in terms of realtime onsets, chroma, and RMS. These are such standard use cases I'd assume that there are nice algorithms for them, although the question remains as to how good they look.
I also know that spleeter is more than fast enough to split in realtime, so I think that would be a great option to alleviate any quality issues of calculating musical features in realtime (it's much easier to get nice drum onsets if the only thing in the audio is the drums!)
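To make the onset/chroma/RMS discussion above concrete, here is a minimal numpy-only sketch of two of those features: an RMS envelope and a spectral-flux onset strength. This is a generic illustration, not code from this repo; chroma is omitted because it needs a pitch-class filterbank (e.g. librosa's `chroma_cqt`), and all names here are hypothetical.

```python
# Hypothetical sketch: RMS envelope and spectral-flux onset strength
# computed framewise over a mono audio buffer with plain numpy.
import numpy as np

SR = 44100    # sample rate
FRAME = 2048  # analysis window length
HOP = 512     # hop between windows

def frame_signal(y: np.ndarray, frame: int = FRAME, hop: int = HOP) -> np.ndarray:
    """Slice a mono signal into overlapping frames, shape (n_frames, frame)."""
    n = 1 + max(0, len(y) - frame) // hop
    idx = np.arange(frame)[None, :] + hop * np.arange(n)[:, None]
    return y[idx]

def rms_envelope(y: np.ndarray) -> np.ndarray:
    """Per-frame root-mean-square loudness."""
    return np.sqrt(np.mean(frame_signal(y) ** 2, axis=1))

def onset_strength(y: np.ndarray) -> np.ndarray:
    """Spectral flux: positive magnitude change between consecutive frames."""
    mags = np.abs(np.fft.rfft(frame_signal(y) * np.hanning(FRAME), axis=1))
    return np.maximum(np.diff(mags, axis=0), 0.0).sum(axis=1)

# A single click halfway through one second of silence should
# produce a clear onset peak near the middle of the envelope.
y = np.zeros(SR, dtype=np.float64)
y[SR // 2] = 1.0
env = onset_strength(y)
```

Both functions only need the most recent window of samples, so they can run on a rolling buffer in real time; the question raised above is whether such simple framewise versions look as good as the offline ones.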
I'm not sure when I'll have time to work on things (as I'm doing this in my free time next to a full-time master's + a part-time job), but hopefully I'll slowly knock things off the list over the course of the next months/year.
I've been looking everywhere for this exact concept and this is the only mention of it that I've found. Stoked that it's actually possible. Any chance it'll become compatible with StyleGan3?
My new PC with a 3060 shows up next week, so I have yet to play around with it, but stoked to hear that this is on the way.
Is there a way to map a MIDI controller to parameters within the software, or to run it within Ableton as an M4L device? If so, I feel like this is a true hidden gem and could open the door to the next era of A/V sets.
Interested too, will keep an eye on this thread!
I've developed a real-time version using Apache TVM for a client's project.
It's possible to run 3 screens of 1920x1080 interactively on a single RTX 3090. So running a single screen should be possible on a 3060.
I'll be integrating similar capabilities into Maua soon. Barebones real-time support (for StyleGAN1, 2, and 3) should be available within a few months, although I'm not sure how to go about TVM support.
@lowlypalace @notlittlebrain @seicaratteri
Hi @JCBrouwer, was wondering if there is any update on this? The project looks amazing, by the way! Thanks!
Hey, thanks for your interest haha. It's been harder for me to work on this lately as I no longer have local access to my GPU machine, and doing real-time stuff over a web connection isn't the greatest.
I do have the main script for rendering straight to the screen available here. It's just doing a random interpolation, but the idea is to add live audio analysis into the forward function.
I'm nearing the end of a big overhaul of the audio-reactive framework based on my master's thesis. This will include the real-time audio analysis part as well. If you're feeling adventurous though, you can try to add it into the script above yourself.
With a 30 series GPU, TVM shouldn't be needed to get decent framerates (maybe at slightly smaller resolutions).
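The "add live audio analysis into the forward function" idea above could look something like the sketch below: each render step maps the current audio level to a point on a spherical interpolation between two latent endpoints. The generator call is a placeholder comment (`G` is an assumed StyleGAN-style model, not part of this repo), and the 512-dim latent size is just the usual StyleGAN default.

```python
# Hypothetical sketch: audio level (0..1) drives a spherical
# interpolation (slerp) between two fixed latent endpoints each frame.
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Spherical interpolation between latents a and b at position t."""
    a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    if omega < 1e-8:  # endpoints (nearly) parallel: nothing to rotate
        return a
    so = np.sin(omega)
    return np.sin((1 - t) * omega) / so * a + np.sin(t * omega) / so * b

rng = np.random.default_rng(0)
z0, z1 = rng.standard_normal(512), rng.standard_normal(512)

def forward(audio_level: float) -> np.ndarray:
    """One render step: the audio level picks the point on the latent arc."""
    z = slerp(z0, z1, float(np.clip(audio_level, 0.0, 1.0)))
    # frame = G.synthesis(G.mapping(z))  # <- real StyleGAN call would go here
    return z
```

Silence (level 0) holds the first endpoint and full level reaches the second, so loudness sweeps the image along the arc; any of the features above (RMS, onset strength) could supply the level.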
Hey, @JCBrouwer I wanted to follow up to see if you've had a chance to implement any of these. Thanks!
Have you given the gpu2gl script a try? What kind of frame rates were you getting for the random interpolation?
A lot of the changes I've made are tied up in a private branch for a client project I'm working on at the moment. These are mainly just API things though, all the underlying functionality is already available.
For example, GPU-accelerated audio feature calculation can be found here and audio-reactive latent sequence functions can be found here. These functions can be called directly in the forward function of the real-time module in gpu2gl.
This only leaves loading an audio stream, which is really dependent on your specific audio setup.
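For the audio-stream piece mentioned above, one common pattern is a capture callback that feeds a ring buffer, which the render loop then reads its analysis window from. The ring buffer below is plain numpy; the commented-out hookup assumes the `sounddevice` library, which is just one option among many capture APIs and is not prescribed by this project.

```python
# Sketch: a fixed-size ring buffer for live audio, filled by a capture
# callback and read by the render loop one analysis window at a time.
import numpy as np

class RingBuffer:
    """Fixed-size mono sample buffer; latest(n) returns the newest n samples."""
    def __init__(self, size: int):
        self.buf = np.zeros(size, dtype=np.float32)
        self.pos = 0  # next write position

    def push(self, block: np.ndarray) -> None:
        for x in np.asarray(block, dtype=np.float32).ravel():
            self.buf[self.pos] = x
            self.pos = (self.pos + 1) % len(self.buf)

    def latest(self, n: int) -> np.ndarray:
        idx = (self.pos - n + np.arange(n)) % len(self.buf)
        return self.buf[idx]

ring = RingBuffer(44100)  # one second of audio at 44.1 kHz

# Hypothetical hookup via the sounddevice package:
# import sounddevice as sd
# def callback(indata, frames, time, status):
#     ring.push(indata[:, 0])            # mono-ize channel 0
# with sd.InputStream(samplerate=44100, channels=1, callback=callback):
#     while True:
#         window = ring.latest(2048)     # analyze + render one frame
```

Because the callback and the render loop only exchange data through the buffer, the audio backend can be swapped (JACK, PyAudio, an OS loopback device) without touching the analysis or rendering code.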
Hello! :) First of all, I wanted to congratulate you on your work, it's incredible! 🥇
I was wondering whether, in your opinion, it would be possible to extend your work to generate the visuals in real time. This would mean using a stream of audio data rather than pre-rendered files. Do you think that would be at all realistic?
Thank you in advance! Keep up the great work! :)