Extend MindEye to fMRI-to-text COCO caption reconstruction

Feed MindEye's predicted CLIP ViT embeddings to a pretrained BLIP-2 model and you should be able to get fMRI-to-text captions. This would require retraining MindEye to output 257x1024 embeddings rather than 257x768 (see Atom_101's explanation here: https://discord.com/channels/1025299671226265621/1072330931546882128/1112967458643521597)

(It takes about 8 hours to retrain MindEye on a single subject using a single A100 40GB GPU.)
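
To make the 257x768 vs. 257x1024 change concrete, here is a rough sketch of the shape arithmetic. It assumes the 257 tokens come from a ViT-L/14 at 224x224 input (256 patch tokens plus 1 class token); the function names are made up for illustration and are not part of the MindEye codebase.

```python
# Sketch only: shape bookkeeping for the retraining target, under the
# assumption of a ViT with 14x14 patches on a 224x224 image.

def clip_vit_token_count(image_size: int = 224, patch_size: int = 14) -> int:
    """Number of tokens a ViT emits: one per patch plus one class token."""
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + 1


def target_shape(embed_dim: int) -> tuple:
    """Per-sample embedding shape MindEye would have to predict."""
    return (clip_vit_token_count(), embed_dim)


def values_per_sample(embed_dim: int) -> int:
    """Total number of values predicted per fMRI sample."""
    tokens, dim = target_shape(embed_dim)
    return tokens * dim


current = target_shape(768)    # shape MindEye is trained on now
needed = target_shape(1024)    # shape mentioned above for the BLIP-2 route

print(current, values_per_sample(768))
print(needed, values_per_sample(1024))
```

The token count stays the same, so only the last dimension of the output (and anything downstream that depends on it) should need to change; each sample grows by a factor of 1024/768, about 1.33x.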

MedARC-AI / fMRI-reconstruction-NSD