ME2 Synthetic Retrieval

Brain retrieval currently uses the original training samples as the retrieval library. This has two limitations:

1. We are limited to images for which we have matched fMRI recordings.
2. We are limited to samples the model explicitly saw during training, which may be inflating retrieval results.
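A minimal sketch of top-1 brain retrieval, to show why the contents of the library matter: each image embedding is a query, and it counts as a hit only when its own brain sample is the most similar entry in the library. It assumes precomputed embeddings, pairing by index (`image_embs[i]` belongs with `brain_library[i]`, and any extra library entries act as distractors), and cosine similarity as the score. The function names are hypothetical, and this is not necessarily how ME2's evaluation is implemented.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brain_retrieval_top1(image_embs: Sequence[Vector], brain_library: Sequence[Vector]) -> float:
    """Fraction of image queries whose most similar library entry is their own paired brain sample.

    Assumes image_embs[i] is paired with brain_library[i]; entries beyond
    len(image_embs) are unpaired distractors.
    """
    hits = 0
    for i, query in enumerate(image_embs):
        scores = [cosine(query, candidate) for candidate in brain_library]
        best = max(range(len(scores)), key=scores.__getitem__)  # first index on ties
        hits += best == i
    return hits / len(image_embs)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    library = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]  # last entry is a distractor
    print(brain_retrieval_top1(images, library))  # 1.0
```

Accuracy here depends entirely on which distractors sit in the library, which is why a library made only of samples the model was trained on can overstate performance.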

To address this, we want to run all of the images in subject 1's training set through GNet to generate synthetic data: predicted patterns of "fake" brain activity for each image. We can then pass these synthetic samples through ME2 inference to build a brain retrieval library that solves the first issue, since the library no longer depends on recorded fMRI.
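The steps above can be sketched as a small function. `gnet_predict` and `me2_encode` are hypothetical callables standing in for GNet inference and ME2 inference (their real interfaces are not described in this issue), and images are keyed by an ID so each synthetic library entry can be traced back to its source image.

```python
from typing import Callable, Dict, Hashable, List, Mapping

Vector = List[float]


def build_synthetic_library(
    images: Mapping[Hashable, object],
    gnet_predict: Callable[[object], Vector],
    me2_encode: Callable[[Vector], Vector],
) -> Dict[Hashable, Vector]:
    """Map each image ID to an ME2 embedding of GNet-predicted (synthetic) brain activity.

    gnet_predict and me2_encode are hypothetical stand-ins; no recorded fMRI is used.
    """
    library: Dict[Hashable, Vector] = {}
    for image_id, image in images.items():
        synthetic_voxels = gnet_predict(image)            # predicted "fake" brain activity
        library[image_id] = me2_encode(synthetic_voxels)  # brain-side embedding for retrieval
    return library


if __name__ == "__main__":
    # Toy stand-ins only, to show the data flow; real models would replace these.
    toy_images = {"img_a": [0.5, 1.0], "img_b": [3.0, 0.0]}
    toy_gnet = lambda img: [2 * x for x in img]
    toy_me2 = lambda vox: [sum(vox), max(vox)]
    print(build_synthetic_library(toy_images, toy_gnet, toy_me2))
    # {'img_a': [3.0, 2.0], 'img_b': [6.0, 6.0]}
```

Because the function only needs images, the same loop works for any image set, not just images with matched fMRI.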

Part 2 of this task is to repeat the same process with a library that is independent of the subject-specific training data, made up of images not seen during training. For this we have a few options: a different section of COCO that is not part of NSD; NSD training images that were only seen by a different subject (be careful if using a multi-subject model: the library must not overlap with any of its training data, otherwise we are back to problem #2 above); or synthetic data from completely out-of-distribution images, such as LAION-5B.
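For this second part, the key constraint is that no library image may appear in the training data of any subject the model was trained on. A minimal sketch of that filter, assuming per-subject sets of training image IDs are available; the function name, image IDs, and subject labels below are illustrative and not taken from the NSD metadata.

```python
from typing import Dict, Iterable, List, Set


def held_out_images(
    candidate_ids: Iterable[str],
    train_ids_by_subject: Dict[str, Set[str]],
    model_subjects: Iterable[str],
) -> List[str]:
    """Keep candidate image IDs that no training subject of the model has seen."""
    seen: Set[str] = set()
    for subject in model_subjects:
        seen |= train_ids_by_subject.get(subject, set())  # union of all training images
    return [image_id for image_id in candidate_ids if image_id not in seen]


if __name__ == "__main__":
    train = {"subj01": {"a", "b"}, "subj02": {"b", "c"}, "subj05": {"d"}}
    pool = ["a", "b", "c", "d", "e"]
    print(held_out_images(pool, train, ["subj01"]))            # ['c', 'd', 'e']
    print(held_out_images(pool, train, ["subj01", "subj02"]))  # ['d', 'e']
```

With a subject-1-only model, images seen only by subject 2 remain valid candidates; once subject 2's data is also in training, they drop out, which is the overlap the multi-subject caveat warns about.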

MedARC-AI / MindEye_Imagery #12