KullmannLab / pyecog2

PyEcog2 is a Python software package aimed at exploring, visualizing and analyzing (video) EEG telemetry data
GNU General Public License v3.0

NDF converter issues #18

Closed: mikailweston closed this issue 3 years ago

mikailweston commented 3 years ago

Leaving `fs` at `'auto'` does not work well for files with many bad messages or with missing data:

```
Progress: |--------------------------------------------------| 0.0% Complete
ERROR:root: >half messages detected as bad messages. Probably change fs from auto to the correct frequency
```

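With `fs` set to 1024 in `DataHandler.convert_ndf_directory_to_h5` instead:

```python
class DataHandler:

    def convert_ndf_directory_to_h5(self, ndf_dir,
                                    tids='all',
                                    save_dir='same_level',
                                    n_cores=9,
                                    fs=1024,
                                    glitch_detection=True,
                                    high_pass_filter=False,
                                    gui_object=False):
        ...  # method body omitted
```

results in this error:
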
```
Progress: |*-------------------------------------------------| 1.0% Complete
Something unexpected went wrong loading [97, 98] from F:/epilepsy files/Rat telemetry/Rig4/2021/ndf\M1611780237.ndf :
Traceback (most recent call last):
  File "C:\Users\mweston\Documents\GitHub\pyecog2\pyecog2\ndf_converter.py", line 647, in convert_ndf
    ndf.load(tids,
  File "C:\Users\mweston\Documents\GitHub\pyecog2\pyecog2\ndf_converter.py", line 469, in load
    self.correct_sampling_frequency()
  File "C:\Users\mweston\Documents\GitHub\pyecog2\pyecog2\ndf_converter.py", line 320, in correct_sampling_frequency
    regularised_time = np.linspace(0, self.file_length, num= self.file_length * self.tid_to_fs_dict[tid])
  File "<__array_function__ internals>", line 5, in linspace
  File "C:\Users\mweston\Anaconda3\envs\pyecog2\lib\site-packages\numpy-1.20.0rc1-py3.8-win-amd64.egg\numpy\core\function_base.py", line 120, in linspace
    num = operator.index(num)
TypeError: 'float' object cannot be interpreted as an integer
None
Looking for files: F:\epilepsy files\Rat telemetry\Rig4\2021\H5_Py2\210.97_98 \ *.h5
Traceback (most recent call last):
  File "C:\Users\mweston\Documents\GitHub\pyecog2\pyecog2\coding_tests\ProjectGUI.py", line 208, in update_project_settings
    self.project.add_animal(Animal(id=id,eeg_folder=eeg_dir,video_folder=video_dir))
  File "C:\Users\mweston\Documents\GitHub\pyecog2\pyecog2\ProjectClass.py", line 75, in __init__
    self.update_eeg_folder(eeg_folder)
  File "C:\Users\mweston\Documents\GitHub\pyecog2\pyecog2\ProjectClass.py", line 113, in update_eeg_folder
    create_metafile_from_h5(file,duration)
  File "C:\Users\mweston\Documents\GitHub\pyecog2\pyecog2\ProjectClass.py", line 18, in create_metafile_from_h5
    fs = fs_dict[int(h5_file.attributes['t_ids'][0])]
IndexError: index 0 is out of bounds for axis 0 with size 0
```

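Both tracebacks end in small, local errors. `np.linspace` requires an integer `num`, but `self.file_length * self.tid_to_fs_dict[tid]` evaluates to a float here; and `h5_file.attributes['t_ids'][0]` fails when the `t_ids` attribute is empty. A minimal sketch of guards for both, assuming `file_length` is in seconds and the sampling frequency in Hz. `regularised_time` and `first_tid_fs` are hypothetical helper names, not pyecog2's API, and this is not necessarily how the code was actually fixed:

```python
import numpy as np


def regularised_time(file_length, fs):
    """Evenly spaced time vector from 0 to file_length seconds.

    np.linspace needs an integer sample count, so the (possibly float)
    product file_length * fs is rounded and cast before the call.
    """
    num_samples = int(round(file_length * fs))
    return np.linspace(0, file_length, num=num_samples)


def first_tid_fs(t_ids, fs_dict):
    """Sampling frequency of the first transmitter id, or None if there is none.

    Returning None for an empty t_ids array avoids the IndexError above and
    lets the caller skip the file instead of crashing.
    """
    if len(t_ids) == 0:
        return None
    return fs_dict[int(t_ids[0])]
```

For example, `regularised_time(3600, 1024.0)` returns an array of 3,686,400 time points instead of raising `TypeError`.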
mfpleite commented 3 years ago

I couldn't replicate this on my computer: M1611780237.ndf seems to convert just fine, with the correct 1024 Hz sampling frequency... Could you create a minimal example so that I can replicate it myself?

mikailweston commented 3 years ago

I've shared a set of example files with you via OneDrive; check your email.

mfpleite commented 3 years ago

So, I got different error messages from yours... I only got the ">half messages detected as bad messages. Probably change fs from auto to the correct frequency" message, which is more of a warning than an error, so I changed the code to reflect that. I think in your case it appeared because your data is concentrated in a few minutes of the NDF file instead of spanning the whole hour.

On top of that, there was indeed a problem when converting NDFs that don't contain the requested transmitters: the converter was basically grabbing all the tids from that file. I corrected this, and now it just skips converting that file.

There are still some not-so-great things, which come from the fact that we have hard-coded a duration of 1 hr per file. Some of your files did not start one hour apart from each other and only had a few minutes of data, so the plot shows a flat line on top of the next file... This is not ideal for the other tools (e.g. it messes up the wavelets a bit), but I suppose it should not happen systematically. The 1 hr NDF duration is hard-coded deep in pyecog's history, so it would be too much work to solve properly at this precise moment. The binary files don't have this issue, because the correct duration is stored in their metadata. What I can try to do is correct this when I infer the metadata of the H5 file: if another file starts before the end of the hour, I reduce the duration in the metadata. Let's see if this solves most of the issues.
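The metadata correction described above (shorten a file's duration when the next file starts before its hour is up) can be sketched as follows. This is a minimal illustration, not pyecog2's implementation: `clip_durations` is a hypothetical name, and it assumes the file start times are already known in seconds (for example, parsed from the NDF file names) and sorted.

```python
def clip_durations(start_times, default_duration=3600.0):
    """Shorten each file's duration so it ends no later than the next file starts.

    start_times: file start times in seconds, assumed sorted ascending.
    Every file is assumed to last default_duration seconds (1 hr) unless
    the following file starts earlier than that.
    """
    durations = []
    for i, start in enumerate(start_times):
        duration = default_duration
        if i + 1 < len(start_times):
            # Time until the next file begins; cap the duration at that gap.
            gap = start_times[i + 1] - start
            duration = min(duration, gap)
        durations.append(duration)
    return durations
```

For example, `clip_durations([0, 600, 4200])` returns `[600, 3600.0, 3600.0]`: the first file is cut to 10 minutes because the next one starts at 600 s, while the last file keeps the default hour.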

mfpleite commented 3 years ago

OK, actually my past self had already implemented this XD; it was just the FileBuffer that wasn't using this information.

Just so it stays documented: the problem is that we cannot assume the clock used for the file names is perfectly in sync with the clocks of the different transmitters, so file duration * sampling frequency can differ from the number of data points in the file. The FileBuffer, however, concatenates all the data together and only flags this in the time arrays accompanying the data, which are mostly ignored by everything other than the plots (wavelets, feature extractor, etc.). This ensures there are no weird discontinuities in the data, and all imprecisions are on the scale of the mismatch between the clocks (very small).

The H5 files converted from NDF files break this, because they contain interpolated data to fill the whole hour (and there is no elegant way around this at the level of a single file, given possible missing data etc.). The solution will always have to come at the project level, by looking at inconsistent start times and durations of the H5 files.

Anyway, this seems satisfactorily closed to me. Let me know if you still have issues. Cheers!
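The FileBuffer behaviour described above (contiguous data, with clock mismatches visible only in the time array) can be illustrated with a small sketch. This is an assumption-laden illustration of the idea, not the actual FileBuffer code: `concatenate_with_times` is a hypothetical name, and it assumes each file's samples arrive as a 1-D NumPy array with a known wall-clock start time and a nominal sampling frequency.

```python
import numpy as np


def concatenate_with_times(chunks, start_times, fs):
    """Join per-file sample arrays into one contiguous array plus time stamps.

    chunks: list of 1-D sample arrays, one per file, in chronological order.
    start_times: wall-clock start time (s) of each file, e.g. from file names.
    fs: nominal sampling frequency in Hz.

    The data is joined with no padding or interpolation; each file's samples
    are time-stamped from that file's own start time, so any mismatch between
    the file-name clock and the transmitter clock shows up only as a jump or
    overlap in the returned time array.
    """
    data = np.concatenate(chunks)
    times = np.concatenate([
        start + np.arange(len(chunk)) / fs
        for chunk, start in zip(chunks, start_times)
    ])
    return data, times
```

For example, with `fs=4`, a 1 s file that delivered only 3 samples followed by a file starting at 1 s gives 7 contiguous samples, and the time array jumps from 0.5 to 1.0 at the boundary instead of stepping by 0.25. Tools that only read `data` see no discontinuity.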

mikailweston commented 3 years ago

OK, I suppose one thing to try would be to have multiple different sample rates in the same NDF file. I don't currently have any of those files to test, though.

mfpleite commented 3 years ago

If they are for different animals, there is no problem. For the same animal, it becomes quite an exotic case... In the future I might implement something to plot different data types at the same time (so that we could, for example, plot accelerometer data, EEG features, etc. alongside each other), but this is quite a bit of work, so I want to have a minimal working product with the classifier first.

mfpleite commented 3 years ago

@mikailweston can we close this issue?

mikailweston commented 3 years ago

Yes, saving settings in the project file works well. I just need to remember to save the project file after converting.