Can you figure out a way to beat our current metrics for the low-level pipeline of 0.456 (PixCorr) and 0.493 (SSIM) for subject 1? Use any method you can think of to try to improve on the current approach.
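For reference, those two numbers are computed roughly like this. A minimal sketch, assuming float images already resized and scaled to [0, 1]; the exact preprocessing (resizing, grayscale conversion for SSIM) should match whatever our eval script actually does:

```python
import numpy as np
from scipy.stats import pearsonr
from skimage.color import rgb2gray
from skimage.metrics import structural_similarity as ssim


def pixcorr(gt: np.ndarray, rec: np.ndarray) -> float:
    """Pixel-wise Pearson correlation between ground truth and reconstruction."""
    return pearsonr(gt.reshape(-1), rec.reshape(-1))[0]


def ssim_score(gt: np.ndarray, rec: np.ndarray) -> float:
    """SSIM on grayscale versions of the two images (float images in [0, 1])."""
    return ssim(rgb2gray(gt), rgb2gray(rec), data_range=1.0)


def evaluate(gt_images, rec_images):
    """Average both metrics over paired ground-truth / reconstructed test images."""
    pix = float(np.mean([pixcorr(g, r) for g, r in zip(gt_images, rec_images)]))
    ss = float(np.mean([ssim_score(g, r) for g, r in zip(gt_images, rec_images)]))
    return pix, ss
```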
Maybe mapping to a different embedding space than Stable Diffusion's variational autoencoder? Or adopting a novel training strategy? You could even consider a ControlNet approach with multi-token textual inversion (let me know in advance if you go down that path).
One possibility: Brain-Diffuser has a low-level pipeline that maps to a VDVAE pretrained on ImageNet-64. There is a newer VDVAE trained on ImageNet-256 that might work better: https://github.com/ericl122333/latent-vae
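Whichever VAE we target, the regression side would look roughly like Brain-Diffuser's low-level recipe (regress voxels onto flattened latents, then decode the predictions). A sketch, where `voxels_train`/`latents_train` stand in for our actual fMRI data and flattened latents from whichever checkpoint we pick, and the regularization strength is just a placeholder to tune:

```python
import numpy as np
from sklearn.linear_model import Ridge


def fit_low_level_mapping(voxels_train: np.ndarray,   # (n_trials, n_voxels)
                          latents_train: np.ndarray,  # (n_trials, latent_dim), flattened VAE latents
                          alpha: float = 5e4) -> Ridge:
    """Ridge-regress fMRI voxels onto flattened latents of whichever VAE we map to."""
    reg = Ridge(alpha=alpha, fit_intercept=True)
    reg.fit(voxels_train, latents_train)
    return reg


def predict_latents(reg: Ridge, voxels_test: np.ndarray) -> np.ndarray:
    """Predicted latents then go through the VAE's decoder to produce the low-level reconstructions."""
    return reg.predict(voxels_test)
```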
Hey Paul, I'll be working on this issue. I'm currently looking into using a ControlNet in the perceptual pipeline, like the one used in CMVDM. Will keep you updated.
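For the ControlNet route, the plumbing via diffusers would look roughly like the sketch below. This is not the actual CMVDM setup; the checkpoint names, conditioning type, prompt, and conditioning image are all placeholders for whatever our pipeline produces:

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Placeholders: a canny ControlNet and SD 1.5 stand in for whichever
# ControlNet/backbone we actually train or fine-tune for brain data.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

prompt = "a photograph"                             # placeholder: caption from the semantic/high-level pipeline
structure_image = Image.open("lowlevel_recon.png")  # placeholder: low-level reconstruction as the spatial condition

image = pipe(prompt, image=structure_image, num_inference_steps=30).images[0]
image.save("controlnet_recon.png")
```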