Abstract: Analyzing and reconstructing visual stimuli from brain signals effectively advances our understanding of the human visual system. However, EEG signals are complex and contain significant noise, which substantially limits existing approaches to reconstructing visual stimuli from EEG. These limitations include difficulty in aligning EEG embeddings with fine-grained semantic information and a heavy reliance on additional large-scale datasets for training. To address these challenges, we propose a novel approach called BrainVis. This approach introduces a self-supervised paradigm to learn EEG time-domain features and incorporates frequency-domain features to enhance EEG representations. We also propose a multi-modal alignment method called semantic interpolation to achieve fine-grained semantic reconstruction. Additionally, we employ cascaded diffusion models to reconstruct images. Using only 9.1% of the training data required by previous mask modeling works, our proposed BrainVis outperforms state-of-the-art methods in both semantic fidelity reconstruction and generation quality.
We provide more results here.
Environment
We recommend installing 64-bit Python 3.8 and PyTorch 1.12.0. On a CUDA GPU machine, the following will do the trick:
pip install numpy==1.26.0
pip install ftfy==6.2.0
pip install omegaconf==2.3.0
pip install einops==0.8.0
pip install torchmetrics==1.4.0.post0
pip install pytorch-lightning==2.3.3
pip install transformers==4.42.4
pip install kornia==0.7.3
pip install diffusers==0.29.2
We have done all testing and development using an A100 GPU.
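After installing, it can help to confirm that the environment matches the pinned versions. The following is a minimal sketch, not part of the repository: the `check_versions` helper and the `PINNED` table are hypothetical names, and the version strings are copied from the pip commands above.

```python
from importlib import metadata

# Pinned versions copied from the pip commands above.
PINNED = {
    "numpy": "1.26.0",
    "ftfy": "6.2.0",
    "omegaconf": "2.3.0",
    "einops": "0.8.0",
    "torchmetrics": "1.4.0.post0",
    "pytorch-lightning": "2.3.3",
    "transformers": "4.42.4",
    "kornia": "0.7.3",
    "diffusers": "0.29.2",
}


def check_versions(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for each mismatch.

    `found` is None when the package is not installed.
    `get_version` is injectable so the check can be exercised without
    touching the real environment.
    """
    problems = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    # Print one line per package that is missing or at a different version.
    for name, (expected, found) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Running the script prints nothing when every pinned package is present at the expected version.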
Create paths
python create_path.py
Download required files
Obtain the training data required for the alignment process
python imageBLIPtoCLIP.py
python imageLabeltoCLIP.py
Run train_freqencoder.py to train the frequency encoder.
Run main.py to pre-train the time encoder.
In main.py, uncomment "trainer.finetune()" on line 61. Run main.py to fine-tune the time encoder.
In datautils.py, change "default=0" to any number from 1 to 6 on line 19 to use a different single subject. Comment out line 61 in main.py and uncomment "trainer.finetune_timefreq()" on line 64. Run main.py to integrate the time and frequency models.
In main.py, uncomment "trainer.finetune_CLIP()" on line 65. Run main.py to conduct cross-modal EEG alignment.
Edit main.py and run it to save the alignment results for reconstruction.
python cascade_diffusion.py
Results will be saved in the path "/picture-gene".
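The training steps above switch stages by commenting and uncommenting calls in main.py. As an illustration only, the same sequence could be driven from the command line. This is a sketch under assumptions, not the repository's code: `StubTrainer`, `run_stage`, `parse_args`, the `--stage` and `--subject` flags, and the stage labels are all hypothetical, and the `pretrain` method name is assumed. Only `finetune`, `finetune_timefreq`, and `finetune_CLIP` are named in the steps above.

```python
import argparse

# Hypothetical stage labels mapped to trainer method names.
# "pretrain" is an assumed method name; the other three appear in the steps above.
STAGES = {
    "pretrain": "pretrain",
    "finetune": "finetune",
    "timefreq": "finetune_timefreq",
    "clip": "finetune_CLIP",
}


class StubTrainer:
    """Stand-in for the repository's trainer; it only records which stage ran."""

    def __init__(self):
        self.calls = []

    def pretrain(self):
        self.calls.append("pretrain")

    def finetune(self):
        self.calls.append("finetune")

    def finetune_timefreq(self):
        self.calls.append("finetune_timefreq")

    def finetune_CLIP(self):
        self.calls.append("finetune_CLIP")


def run_stage(trainer, stage):
    """Look up the method for `stage` and call it on `trainer`."""
    getattr(trainer, STAGES[stage])()


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Select one training stage.")
    parser.add_argument("--stage", choices=list(STAGES), required=True)
    # 0 mirrors the "default=0" mentioned above; 1 to 6 select a single subject.
    parser.add_argument("--subject", type=int, choices=range(0, 7), default=0)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    trainer = StubTrainer()
    run_stage(trainer, args.stage)
    print(f"subject={args.subject} ran={trainer.calls}")
```

With a dispatcher like this, each stage becomes one invocation with a different `--stage` value, so the order of the steps stays visible in the shell history instead of in edits to the source file.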
BrainVis builds upon several previous works.
@article{fu2023brainvis,
title={BrainVis: Exploring the Bridge between Brain and Visual Signals via Image Reconstruction},
author={Honghao Fu and Zhiqi Shen and Jing Jih Chin and Hao Wang},
journal={arXiv preprint arXiv:2312.14871},
year={2023}
}