bbaaii / DreamDiffusion

Implementation of “DreamDiffusion: Generating High-Quality Images from Brain EEG Signals”
MIT License
429 stars · 49 forks

About Pretraining EEG Dataset #9

Open leofan90 opened 9 months ago

leofan90 commented 9 months ago

Hello! I'm running EEG pretraining with the code stageA1_eeg_pretrain.py. I'm wondering whether you provide the pretraining EEG dataset? The path used in this code (../dreamdiffusion/datasets/mne_data/) is not included in the README.md file-path description. I'm looking forward to your reply. Thank you!

laddg95 commented 9 months ago

Have you resolved it?

Zoe-Wan commented 9 months ago

I have the same problem.

laddg95 commented 9 months ago

I know what's going on. The mne_data directory actually comes from the ..\datasets\eeg_5_95_std.pth file, but you need to extract the EEG data from that file, convert each tensor to an ndarray, and save each one as its own file. In my case, I named each file i.npy, where i is the index of the tensor, and now the program runs successfully. Note that each tensor has shape (128, L), where L varies between samples; the __getitem__ method of the eeg_pretrain_dataset class unifies L.
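If it helps, the steps above can be sketched roughly like this. The `'dataset'` key and the per-sample `'eeg'` field are assumptions about how eeg_5_95_std.pth is laid out after `torch.load`; inspect `data.keys()` on your copy first.

```python
import os

import numpy as np

def extract_eeg_to_npy(data: dict, out_dir: str) -> int:
    """Save every EEG sample in data['dataset'] as <index>.npy.

    Assumed layout (verify against your file): data['dataset'] is a
    list of per-sample dicts, each holding an 'eeg' tensor/array of
    shape (128, L), where L varies between samples.
    Returns the number of files written.
    """
    os.makedirs(out_dir, exist_ok=True)
    for i, sample in enumerate(data["dataset"]):
        # np.asarray handles both torch CPU tensors and plain ndarrays
        np.save(os.path.join(out_dir, f"{i}.npy"), np.asarray(sample["eeg"]))
    return len(data["dataset"])

# Typical use (paths as in this issue):
#   data = torch.load("../dreamdiffusion/datasets/eeg_5_95_std.pth")
#   extract_eeg_to_npy(data, "../dreamdiffusion/datasets/mne_data")
```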

Zoe-Wan commented 9 months ago

I extracted the numpy arrays from the pth file successfully! Thanks to @laddg95. But I still think this amount of data (11952 EEG records, each of size 128*500) is not enough for the pretraining task (the author said it should be around "120,000 EEG data samples from over 400 subjects with channel ranges from 30 to 128").

laddg95 commented 9 months ago

Your suspicion should be correct; after all, the pre-training data and test data should not be the same. Maybe the author wants us to use our own pre-training data. I noticed that in other issues people provided links to other data sources, which might be usable as pre-training data. I haven't had time to verify them yet. If your experiment succeeds, please reply here; I would be very grateful. Thanks!

moyu2023101 commented 9 months ago

@laddg95 Can the ..\datasets\eeg_5_95_std.pth file (or a similar pre-training EEG checkpoint) still be downloaded now? And in which directory is it located? I cannot find it. If you could provide me with a link, I would greatly appreciate it.

laddg95 commented 9 months ago

Please see the README section https://studentiunict-my.sharepoint.com/personal/concetto_spampinato_unict_it/_layouts/15/onedrive.aspx?ga=1&id=%2Fpersonal%2Fconcetto%5Fspampinato%5Funict%5Fit%2FDocuments%2Fsito%5FPeRCeiVe%2Fdatasets%2Feeg%5Fcvpr%5F2017%2Fdata

moyu2023101 commented 9 months ago

The problem has been resolved!!! Thanks to @laddg95!

miao-zhuyue commented 8 months ago

I know what's going on. The mne_data directory actually comes from the ..\datasets\eeg_5_95_std.pth file, but you need to extract the EEG data from that file, convert each tensor to an ndarray, and save each one as its own file. In my case, I named each file i.npy, where i is the index of the tensor, and now the program runs successfully. Note that each tensor has shape (128, L), where L varies between samples; the __getitem__ method of the eeg_pretrain_dataset class unifies L.

Thanks!!!!

Venchy-he commented 8 months ago

@Zoe-Wan Have you finished this experiment?

Tom345345 commented 8 months ago

@laddg95 Hello, did you run it successfully? I cannot find these files: "generation/checkpoint_best.ptn" and "eeg_pretain/checkpoint.pth". Can you share them? Thank you very much.

Zoe-Wan commented 8 months ago

@Zoe-Wan Have you finished this experiment?

Yes, I used the extracted data to pretrain, and I tried some of the parameters mentioned in the paper (such as mask ratio, patch size, etc.). I found that the higher the mask ratio, the better the reconstruction performance. I got an average correlation of 0.55 (using the eval method in the code) with mask ratio 0.75 and patch size 4. The reconstruction result is shown below. [Image: reconst-08-10-2023-15-59-56]

But unfortunately I do not have enough GPU memory (just like @SunshineXiang) to continue this work. And even if I had enough GPU memory, the provided pth dataset is for finetuning, so I still could not continue. Apparently, 11952 EEG records is not enough for pretraining: since the dataset is not split into train and test sets, overfitting on a small dataset does not make sense. So I still suggest that you find another dataset for the pretraining task (do not forget preprocessing).
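For anyone trying to reproduce the 0.55 figure: the "average correlation" is presumably a per-sample Pearson correlation between reconstructed and ground-truth EEG, averaged over the evaluation set. A minimal sketch of that kind of metric (not the repo's exact eval code, and the function name is mine):

```python
import numpy as np

def avg_correlation(recon: np.ndarray, target: np.ndarray) -> float:
    """Mean Pearson correlation between paired reconstructed and
    ground-truth EEG samples. Each sample is flattened so one
    coefficient is computed per pair, then the pairs are averaged."""
    corrs = [np.corrcoef(r.ravel(), t.ravel())[0, 1]
             for r, t in zip(recon, target)]
    return float(np.mean(corrs))
```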

miao-zhuyue commented 8 months ago

Last month I saw that you asked about the file pretrains/generation/checkpoint_bset.pth; did you solve that problem? Does this file need to be trained by yourself? If so, may I ask which file is used to train it?

Zoe-Wan commented 8 months ago

Last month I saw that you asked about the file pretrains/generation/checkpoint_bset.pth; did you solve that problem? Does this file need to be trained by yourself? If so, may I ask which file is used to train it?

Yeah, you have to train it yourself; the author has not shared the file yet. But as I said, I cannot finetune the model, so I do not have the ckpt file either. If you want to try it, you can organize the files as the README says. (This comment may also answer #5.)

Venchy-he commented 8 months ago

@Zoe-Wan Thank you very much!

Tom345345 commented 8 months ago

@Zoe-Wan Hello, I see from the above that you have successfully run the code. I am a novice. Do these two files (pretrains/generation/checkpoint_bset.pth and pretrains/eeg_pretain/checkpoint_bset.pth) really affect running the code? Could you share them? I would appreciate it. If anything I said is incorrect, please advise.

bbaaii commented 8 months ago

You can directly download this data from the MOABB project https://github.com/NeuroTechX/moabb and use their respective processing code for operations like filtering. Finally, save them as npy files.
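As a rough illustration of that filtering step: MOABB dataset classes return MNE Raw objects, and `raw.get_data()` yields a (channels, samples) array that can be band-passed before saving. The 5–95 Hz band below matches the name eeg_5_95_std.pth; the default sampling rate and filter order are my assumptions, not values from the repo:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_5_95(eeg: np.ndarray, fs: float = 1000.0) -> np.ndarray:
    """Zero-phase band-pass of a (channels, samples) EEG array to
    5-95 Hz. fs is the sampling rate in Hz; the 4th-order Butterworth
    design here is an arbitrary but common choice."""
    b, a = butter(4, [5.0, 95.0], btype="band", fs=fs)
    return filtfilt(b, a, eeg, axis=-1)

# e.g. with an MNE Raw object from a MOABB dataset:
#   np.save("mne_data/0.npy", bandpass_5_95(raw.get_data(), raw.info["sfreq"]))
```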

caltexs commented 7 months ago

Excuse my ignorance, but couldn't someone do all the training and then upload the ckpt file for the rest of us to use? Or why doesn't the author do that? Not a programmer, so sorry if this is a stupid question.

xiaozhongyaoyongan commented 6 months ago

I extracted the numpy arrays from the pth file successfully! Thanks to @laddg95. But I still think this amount of data (11952 EEG records, each of size 128*500) is not enough for the pretraining task (the author said it should be around "120,000 EEG data samples from over 400 subjects with channel ranges from 30 to 128").

Hello, I am researching this project now, but I do not know how to extract the numpy arrays from the pth file. Could you give me a little help? I would be very grateful!

NZW666666 commented 6 months ago

The path in this code (../dreamdiffusion/datasets/mne_data/) is not included in the README.md file-path description. Can anyone help? I don't quite understand what was said above. Thanks.

KrishKrosh commented 5 months ago

You can directly download this data from the MOABB project https://github.com/NeuroTechX/moabb and use their respective processing code for operations like filtering. Finally, save them as npy files.

From my understanding, the MOABB dataset has multiple different datasets and processing steps for different paradigms. Can you let us know what specific datasets, sampling rate, processing steps, and params you used?

chenjunxia06 commented 4 months ago

@Zoe-Wan Hello, how can I get the numpy arrays from the ..\datasets\eeg_5_95_std.pth file? I'm a newbie. Please give me your advice; I would be grateful.

qiaoyub commented 3 months ago

Hello, could you explain the specific process of extracting the data from the pth file and then running the EEG pre-training? I really need your help. Thank you very much!!

qiaoyub commented 3 months ago

The path in this code (../dreamdiffusion/datasets/mne_data/) is not included in the README.md file-path description. Can anyone help? I don't quite understand what was said above. Thanks.

Have you solved this problem yet?