Have you seen our latest work MindEye2? We share the dataset creation script for it here: https://github.com/MedARC-AI/MindEyeV2/blob/main/src/dataset_creation.ipynb
Generally speaking, though, the test set is all the shared1000 samples and the train/val sets are the non-shared1000 samples. The train/val split was random (roughly 10% of the train samples became the val set). All samples were shuffled after being allocated to train/val/test, so the sample numbering is arbitrary.
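For concreteness, a minimal sketch of that split logic (not the exact script); it assumes `image_id_per_trial` holds the 73k-image ID of every sample and `sharedix` holds the 1000 shared image IDs from nsd_expdesign.mat:

```python
import numpy as np

rng = np.random.default_rng(0)
is_shared = np.isin(image_id_per_trial, sharedix)

test_idx = np.where(is_shared)[0]                        # all shared1000 samples
trainval_idx = rng.permutation(np.where(~is_shared)[0])  # non-shared1000 samples
n_val = int(0.1 * len(trainval_idx))                     # ~10% of train -> val
val_idx, train_idx = trainval_idx[:n_val], trainval_idx[n_val:]
# each split is shuffled before writing, so the sample numbering is arbitrary
```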
Hope that helps!
Thank you for your assistance in answering my question!
Could you share the preprocessing script for MindEye1? I cannot reproduce the results by modifying the one for MindEye2.
That code is not in a state that would be usefully sharable, but in theory all you need is to download the NSD dataset and use the functions in this repo (https://github.com/tknapen/nsd_access) to extract the betas from the nsdgeneral ROI. The webdataset format used for MindEye1 just shuffles those betas + images into tar files split by train/val/test and applies voxelwise z-scoring based on the train split.
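A rough sketch of that pipeline, under stated assumptions: the nsdgeneral mask path, the session count, and the axis order returned by `read_betas` are assumptions, and `train_idx` is the train split as in the split sketch above:

```python
import numpy as np
import nibabel as nib
from nsd_access import NSDAccess

nsda = NSDAccess('/path/to/NSD')  # root of the downloaded NSD dataset

# nsdgeneral ROI mask for subj01 (adjust the path to your NSD layout)
roi = nib.load('/path/to/NSD/nsddata/ppdata/subj01/func1pt8mm/roi/'
               'nsdgeneral.nii.gz').get_fdata()
mask = roi > 0

betas = []
for session in range(1, 38):  # subj01: 37 public sessions (last 3 held out)
    b = nsda.read_betas('subj01', session,
                        data_type='betas_fithrf_GLMdenoise_RR',
                        data_format='func1pt8mm')
    # assumes b is (x, y, z, 750 trials); adjust if the axis order differs
    betas.append(np.asarray(b)[mask].T)  # -> (750, num_voxels)
betas = np.concatenate(betas).astype(np.float32)

# voxelwise z-scoring using statistics from the train split only
mu = betas[train_idx].mean(axis=0)
sd = betas[train_idx].std(axis=0)
betas = (betas - mu) / (sd + 1e-8)
```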
I see. In the script https://github.com/MedARC-AI/MindEyeV2/blob/main/src/dataset_creation.ipynb, you have `behav`, `past_behav`, `old_behav`, and `future_behav`. The voxel input for MindEye1 is of size (batch_size, 3, num_voxels); may I know which one I should choose for MindEye1's input, and what does the "3" correspond to?
The 3 corresponds to the 3 image repeats -- subjects saw each image 3 times across the scans.
You'd want to use `behav` to get the intended voxel indices if you were adopting the MindEye2 data-loading approach; note that MindEye1 and MindEye2 use different webdatasets and different data-loading approaches.
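As a rough sketch of that lookup, assuming `behav` is the tensor yielded by the ME2 dataloader and `voxels` is the preloaded betas array; the column positions are assumptions taken from a reading of the MindEyeV2 training code, so verify them against the repo:

```python
# column 0 and column 5 are assumed positions; check Train.py in MindEyeV2
image_idx = behav[:, 0, 0].long()  # index into the 73k NSD images
trial_idx = behav[:, 0, 5].long()  # row index into the betas array
voxel = voxels[trial_idx]          # -> (batch_size, num_voxels), no repeat axis
```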
Thanks so much for the reply!! I am using MindEye1's way of loading the dataset, but MindEye1's training pipeline accepts voxels of size (batch_size, 3, num_voxels), while `behav` generated by the MindEye2 script is of size (batch_size, num_voxels), so I am very confused.
Yeah, we grouped the repeats together for ME1 but not for ME2.
Why not just download the ME1 webdataset we provided if you are running ME1? The ME1 and ME2 datasets were not meant to be used interchangeably across the two papers; you'd need to make some code changes, which I can't commit time to help with, if you want to modify the ME1 code to work with ME2 data.
It's just that I need to use all the voxels of the fMRI data, so I need to rerun the preprocessing script. So the only thing I need to do now is to group `behav` by `image_idx`?
if you need to use all the voxels then you can do one of the following:
for ME1, if you don't mind that we don't include the last 3 sessions (held out for Algonauts), we already provide the whole-brain data in the tar files in the webdataset_avg_split folder (that's why these tar files are larger than those in webdataset_avg_new): https://huggingface.co/datasets/pscotti/naturalscenesdataset/tree/main/webdataset_avg_split/train
for ME2, create a new betas_all_subj01_fp32_renorm.hdf5 file by concatenating across voxels pulled from nsd_access (https://github.com/tknapen/nsd_access)
or yes, you could manually implement code to group the same repeats from `behav` (a rough sketch is below), but note you'd first need to preload all the samples in order to find the repeats, whereas `behav` defaults to just the batch size, not the full dataset
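To make the last two options concrete, a hypothetical sketch; it assumes `betas` is a (num_trials, num_voxels) whole-brain array, `image_idx_per_trial` maps each trial to its 73k image ID, and the `'betas'` HDF5 key matches what the ME2 code expects (verify against the repo):

```python
import h5py
import numpy as np

# ME2-style: write the whole-brain betas to an HDF5 file
with h5py.File('betas_all_subj01_fp32_renorm.hdf5', 'w') as f:
    f.create_dataset('betas', data=betas.astype(np.float32))

# ME1-style: group the 3 repeats of each image together
order = np.argsort(image_idx_per_trial, kind='stable')
starts = np.unique(image_idx_per_trial[order], return_index=True)[1][1:]
trials_by_image = np.split(order, starts)
# keep only images with all 3 repeats -> (num_images, 3, num_voxels)
voxels_3rep = np.stack([betas[t[:3]] for t in trials_by_image if len(t) >= 3])
```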
Oh, I have been using `webdataset_avg_split`, but why do different subjects have a different number of voxels? In my impression, the number of voxels should be the same if it is whole-brain data?
Hello, I am the person who raised this issue. You are probably wondering how the images are matched with the fMRI signals, right? That mapping is in the original data: nsddata/experiments/nsd/nsd_expdesign.mat (a short sketch of reading it follows below).
You should first check the NSD official data introduction at: https://cvnlab.slite.page/p/CT9Fwl4_hc/NSD-Data-Manual
I hope this can answer your question. If you have any more questions, please create a new issue for inquiry.
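A sketch of reading that mapping; the field names follow the NSD data manual, and the .mat indices are 1-based (hence the -1), but verify against your copy of the file:

```python
import numpy as np
from scipy.io import loadmat

exp = loadmat('nsddata/experiments/nsd/nsd_expdesign.mat')
masterordering = exp['masterordering'].squeeze() - 1  # trial -> subject 10k index
subjectim = exp['subjectim'] - 1                      # (8, 10000) -> 73k image IDs
sharedix = exp['sharedix'].squeeze() - 1              # the shared1000 73k IDs

subj = 0  # subj01
image_id_per_trial = subjectim[subj, masterordering]  # 73k ID for every trial
is_test = np.isin(image_id_per_trial, sharedix)       # shared1000 -> test set
```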
Hello, thank you for your excellent work. May I ask if it is possible to provide the code for processing the raw NSD data into 'webdataset_avg_split'? I know how to convert from volume to voxels, but I want to know how the training, validation, and test splits are made, and how each sample is named, e.g. 'sample00000300'. Looking forward to your reply.