garethjns / Kaggle-EEG

Seizure prediction from EEG data using machine learning. 3rd place solution for Kaggle/Uni Melbourne seizure prediction competition.

Running the code #13

Closed YasminAMassoud closed 4 years ago

YasminAMassoud commented 4 years ago

Hello

I am trying to run the code, but I currently have two issues:

1) I only have the original Kaggle data, so I'm not sure how to modify the code, as I don't have the new test set.
2) I don't have the file trainedModelsCompactTest.mat, so I get an error when running the code.

Thanks

garethjns commented 4 years ago

Hi Yasmin,

I'm afraid my memory of the Kaggle data is a bit hazy. I think a second test set was provided along with a list of leaks in the old test set that could be moved into the training set. This is what the copyTestLeakToTrain.m script does. I think you might be able to skip this step, and manually create the directory structure (as in the readme) but just using the original data.

The trainedModelsCompactTest.mat should be produced by train.m if you can get it to run with the data you have.

YasminAMassoud commented 4 years ago

@garethjns Thanks for your input on this issue. I have solved the trainedModelsCompactTest.mat problem now.

* For issue 2), I tried this and then got the error "Unable to read file singles1.mat". I believe this file is output by copyTestLeakToTrain.m. It would be helpful if you could advise me on how to obtain this file without using copyTestLeakToTrain.m.

* Also, the New folder will now contain files named, for example, pat1test_1_0.mat; they won't have _new in their names. Will this create a problem?

garethjns commented 4 years ago

If I remember correctly, each singles_n.mat file lists, for one subject, the files that are an independent 10 minutes of recording, rather than a 10 min segment of a consecutive 60 min recording (spread over 6 files). I think all of these will originally have been in the test set (which was supposed to consist only of independent 10 minute segments) before the leak was discovered. This would make sense if the singles.mat files are created by copyTestLeakToTrain.m.

I think if you just have the original set, it might be worth trying to modify the method that loads the training files. It has logic to load the grouped 10 min segments of the 60 min recordings, and logic to load the single files and mark them as single so they aren't concatenated together.
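As a rough illustration of that split (written in Python rather than the repository's MATLAB, and not taken from featuresObject.m), the sketch below separates a list of segment files into groups of six and a list of singles. The function name split_segments, the file names, and the assumption that consecutive segments of one recording sit next to each other in the input list are all hypothetical.

```python
def split_segments(files, singles, group_size=6):
    """Split an ordered list of segment files into groups and singles.

    files      -- ordered file names (hypothetical); consecutive segments of
                  one 60 min recording are assumed to be adjacent in the list
    singles    -- set of file names known to be independent 10 min recordings
    group_size -- number of 10 min segments per 60 min recording
    """
    # Singles are kept apart so they are never concatenated with neighbours.
    single_files = [f for f in files if f in singles]

    # Everything else is chunked into runs of group_size consecutive files.
    grouped = [f for f in files if f not in singles]
    groups = [grouped[i:i + group_size]
              for i in range(0, len(grouped), group_size)]
    return groups, single_files
```

With an empty singles set, every file in this sketch is treated as part of a 60 min group, which is the situation when only the original training data is available.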

This method is here: https://github.com/garethjns/Kaggle-EEG/blob/c8883b1b1371b89781b2f82f412559ddbca5f362/%40featuresObject/featuresObject.m#L441

It looks like it loads all the '*.mat' files from the directories it finds in paths.dataDir, so the _new in the file names shouldn't matter as long as paths.dataDir is pointing to the correct place.
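To show why the file names themselves should not matter under that kind of wildcard loading, here is a minimal Python sketch (again not the repository's MATLAB code). The function name list_mat_files and the directory layout are assumptions made for illustration only.

```python
from pathlib import Path


def list_mat_files(data_dir):
    """Return every .mat file found in the immediate subdirectories of data_dir.

    The match is on the .mat extension only, so a name with or without
    "_new" in it is picked up the same way.
    """
    root = Path(data_dir)
    return sorted(
        path
        for sub in root.iterdir()
        if sub.is_dir()
        for path in sub.glob("*.mat")
    )
```

Calling list_mat_files on a data directory returns the .mat files from each subdirectory and ignores anything with a different extension.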

I've found copies of the singles.mat files, but I don't have access to MATLAB so can't check their contents at the moment. I've pushed them to the root of the repo on this branch https://github.com/garethjns/Kaggle-EEG/tree/singles_files, but you may not need them. Actually, another thing worth trying is to use them but leave them empty (as there are no single files in your training set). That might mean you wouldn't need to modify the method that loads the files.

YasminAMassoud commented 4 years ago

Thanks to @garethjns, I managed to do it by editing copyTestLeakToTrain so that it produces the singles.mat files according to the new data description on the ecosystem website.