BrainVis: Exploring the Bridge between Brain and Visual Signals via Image Reconstruction [Link to paper]

Framework

Abstract: Analyzing and reconstructing visual stimuli from brain signals effectively advances our understanding of the human visual system. However, EEG signals are complex and contain significant noise, leading to substantial limitations in existing approaches to visual stimuli reconstruction from EEG. These limitations include difficulties in aligning EEG embeddings with fine-grained semantic information and a heavy reliance on additional large-scale datasets for training. To address these challenges, we propose a novel approach called BrainVis. This approach introduces a self-supervised paradigm to learn EEG time-domain features and incorporates frequency-domain features to enhance EEG representations. We also propose a multi-modal alignment method called semantic interpolation to achieve fine-grained semantic reconstruction. Additionally, we employ cascaded diffusion models to reconstruct images. Using only 9.1% of the training data required by previous mask modeling works, our proposed BrainVis outperforms state-of-the-art methods in both semantic fidelity reconstruction and generation quality.

Results

We provide more results here.

Preparation

Environment

We recommend installing 64-bit Python 3.8 and PyTorch 1.12.0. On a CUDA GPU machine, the following will do the trick:

pip install numpy==1.26.0
pip install ftfy==6.2.0
pip install omegaconf==2.3.0
pip install einops==0.8.0
pip install torchmetrics==1.4.0.post0
pip install pytorch-lightning==2.3.3
pip install transformers==4.42.4
pip install kornia==0.7.3
pip install diffusers==0.29.2

All testing and development was done on an A100 GPU.
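To confirm that the environment matches the pinned versions, a small check along the following lines may help. This is a minimal sketch, not part of the repository: the function name `check_pins` is hypothetical, and the version table is copied from the pip commands above.

```python
from importlib import metadata

# Pinned versions copied from the pip commands above.
PINNED = {
    "numpy": "1.26.0",
    "ftfy": "6.2.0",
    "omegaconf": "2.3.0",
    "einops": "0.8.0",
    "torchmetrics": "1.4.0.post0",
    "pytorch-lightning": "2.3.3",
    "transformers": "4.42.4",
    "kornia": "0.7.3",
    "diffusers": "0.29.2",
}


def check_pins(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for each mismatch.

    `found` is None when the package is not installed.
    (Hypothetical helper, not provided by BrainVis.)
    """
    problems = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    # Print one line per package whose installed version differs from the pin.
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed package is installed at the pinned version. PyTorch is not included because only the recommended version is stated above, not an install command.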

Create paths

python create_path.py

Download required files

CLIP. Place the "clip" folder in this project.
Pre-trained Stable Diffusion model v1-5-pruned-emaonly. Place "v1-5-pruned-emaonly.ckpt" in the path "/pretrained_model".
EEG-Image pairs dataset. Place "block_splits_by_image_all.pth", "block_splits_by_image_single.pth" and "eeg_5_95_std.pth" in the path "/data/EEG".
A copy of required ImageNet subset. Unzip it to path "/data/image".
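Before moving on, it can be useful to verify that everything listed above is in place. The sketch below assumes the quoted paths are relative to the project root (an assumption; the README writes them with a leading slash), and the helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# Locations taken from the list above, interpreted relative to the
# project root. That interpretation is an assumption.
REQUIRED = [
    "clip",
    "pretrained_model/v1-5-pruned-emaonly.ckpt",
    "data/EEG/block_splits_by_image_all.pth",
    "data/EEG/block_splits_by_image_single.pth",
    "data/EEG/eeg_5_95_std.pth",
    "data/image",
]


def missing_files(root, required=REQUIRED):
    """Return the required paths that do not exist under `root`.

    (Hypothetical helper, not provided by BrainVis.)
    """
    root = Path(root)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the project root; prints nothing if all files are present.
    for rel in missing_files("."):
        print(f"missing: {rel}")
```

This only checks that the files and folders exist; it does not validate their contents.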

Obtain the training data required for the alignment process

python imageBLIPtoCLIP.py
python imageLabeltoCLIP.py

Train the model

Run train_freqencoder.py to train the frequency encoder.
Run main.py to pre-train the time encoder.
Comment out "trainer.pretrain()" on line 59 of main.py, and uncomment "trainer.finetune()" on line 61. Run main.py to fine-tune the time encoder.
Change "_all" to "_single" on line 14 of datautils.py, and change "default=0" on line 19 to any number from 1 to 6 to select a single subject. Comment out line 61 of main.py and uncomment "trainer.finetune_timefreq()" on line 64. Run main.py to integrate the time and frequency models.
Comment out line 64 of main.py, and uncomment "trainer.finetune_CLIP()" on line 65. Run main.py to conduct cross-modal EEG alignment.
Set "train_mode=" to "False" on line 56 of main.py and run main.py again to save the alignment results for reconstruction.
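The steps above switch between training stages by commenting and uncommenting calls in main.py. As an illustration only, the same sequence could be driven by a stage name. The sketch below is not how the repository works: the stage names, `run_stage`, and `parse_stage` are hypothetical, and only the trainer method names are taken from the steps above. It also does not cover the edits to datautils.py or the "train_mode=" setting.

```python
import argparse

# Hypothetical stage names mapped to the trainer methods quoted in the
# steps above, in the order the steps run them.
STAGES = {
    "pretrain": "pretrain",
    "finetune": "finetune",
    "timefreq": "finetune_timefreq",
    "clip": "finetune_CLIP",
}


def run_stage(trainer, stage):
    """Call the trainer method that corresponds to `stage`."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; choose from {list(STAGES)}")
    return getattr(trainer, STAGES[stage])()


def parse_stage(argv=None):
    """Read a hypothetical --stage flag from the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--stage", choices=list(STAGES), required=True)
    return parser.parse_args(argv).stage
```

With something like this in place, `run_stage(trainer, parse_stage())` would replace the manual edits for the four training calls, assuming `trainer` is the object already constructed in main.py.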

Image Reconstruction

python cascade_diffusion.py

Results will be saved in the path "/picture-gene".

Broader Information

BrainVis builds upon several previous works:

Citation

@article{fu2023brainvis,
    title={BrainVis: Exploring the Bridge between Brain and Visual Signals via Image Reconstruction},
    author={Honghao Fu and Zhiqi Shen and Jing Jih Chin and Hao Wang},
    journal={arXiv preprint arXiv:2312.14871},
    year={2023}
}
