NeuroJSON / easyh5

EasyH5 Toolbox - An easy-to-use HDF5 data interface (loadh5 and saveh5) for MATLAB
BSD 3-Clause "New" or "Revised" License

Faster Saving #9

Closed: umair-hassan closed this issue 4 years ago

umair-hassan commented 4 years ago

Hi Fang,

I was saving a compound structure and found that your save function is slower than theirs: https://github.com/VisLab/EEG-HDF5-Tools/tree/master/matlab

whereas their loading function is slower than yours.

Could you look at their struct2h5 for MATLAB and try to incorporate it in your load class? The thing I love about your class is that it doesn't ask for a new name every time, whereas theirs asks for a new file name on every iteration.

Let me know if I can help.

Best regards, Umair

fangq commented 4 years ago

@umair-hassan, happy to look at that.

Could you look at their struct2h5 for MATLAB and try to incorporate it in your load class?

I assume you meant the saving function?

From a quick glance at the code, I can see that struct2h5 does not handle all MATLAB data types, but I am very curious how it performs for typical arrays. Can you provide a benchmark dataset that I can profile both codes against?

umair-hassan commented 4 years ago

Hi Fang,

Yes, it's true that they are not handling arrays. I have this struct that I want to save as h5 or any other format, but I need the save time to be as short as possible, say on the order of milliseconds.

The attached dataset is here: https://drive.google.com/file/d/1FDHguxVmMwRfMB1YXqKSTz31240wVJ29/view?usp=sharing

Best, Umair

fangq commented 4 years ago

@umair-hassan, can you let me know what command you called to write your data using struct2h5?

I called struct2h5('testspeed2.h5',b) and it produced a file of only 2.2 MB, while saving the same data using easyh5 created a file of 100 MB.

umair-hassan commented 4 years ago

@fangq, I also used struct2h5('test.h5', b); and yes, it's not saving the cell arrays nested inside the struct (which it should, in principle).

With easyh5, saving is faster when the samples are stored in a normal matrix; when they are divided into trials using cell arrays, it takes more time. What would be the best solution for me here?

Best regards

fangq commented 4 years ago

Then the timings you compared are not comparable, because the two codes save significantly different amounts of data (about 50x). I just inserted

    oid=0;
    return;

before this line to bypass saving cell data. With that change, saveh5 produces a 2.1 MB file, similar to struct2h5's output. A tic/toc comparison shows that saveh5 takes 0.07 s, while struct2h5 takes about twice as long, 0.15-0.16 s.

Feel free to give it a try.
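A like-for-like comparison of the kind described above can be sketched as follows. This is only a rough harness, not the code used in the thread: `timesave` is a hypothetical helper, the struct `b` is placeholder data, and the call forms `saveh5(b, fname)` and `struct2h5(fname, b)` are assumed from the usage quoted in this issue. It reports both the elapsed time and the size of the written file, so that a difference in the amount of data saved is visible next to the timing.

```matlab
% Sketch: compare two HDF5 writers on the same struct (time and file size).
b = struct('x', rand(1000, 100), 'y', rand(1, 5000));   % placeholder test data

% Only run the comparison if both toolboxes are on the MATLAB path.
if exist('saveh5', 'file') && exist('struct2h5', 'file')
    [t1, n1] = timesave(@(f) saveh5(b, f), [tempname '.h5'], 5);
    [t2, n2] = timesave(@(f) struct2h5(f, b), [tempname '.h5'], 5);
    fprintf('saveh5:    %.3f s, %d bytes\n', t1, n1);
    fprintf('struct2h5: %.3f s, %d bytes\n', t2, n2);
end

% Hypothetical helper: run savefun(fname) nrep times and return the median
% elapsed time plus the size in bytes of the file written by the last run.
function [tmedian, nbytes] = timesave(savefun, fname, nrep)
    t = zeros(nrep, 1);
    for i = 1:nrep
        if exist(fname, 'file')
            delete(fname);          % start from a clean file on every run
        end
        tic;
        savefun(fname);
        t(i) = toc;
    end
    tmedian = median(t);
    info = dir(fname);              % file metadata, including size in bytes
    nbytes = info.bytes;
    delete(fname);                  % clean up the temporary output
end
```

If the two byte counts differ by a large factor, the timings are measuring different workloads and should not be compared directly.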

umair-hassan commented 4 years ago

Yes, that is true. I tried it now and the results look similar.

If I save the same amount of data as in the structure I provided, but as a structure of doubles rather than cell arrays (the data I provided uses cell arrays), it takes significantly less time. Why is that? Is there any other method that would save structures containing doubles and structures containing cell arrays in the same amount of time?

Thanks!
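Since the thread reports that plain double arrays save faster than cell arrays, one possible workaround is to pack the trials into a single numeric array before saving. The sketch below is not taken from the thread: it assumes every trial has the same dimensions (which may not hold for the shared dataset), uses placeholder data, and the field name `data` is made up for illustration.

```matlab
% Placeholder data: 20 trials stored as a cell array of equal-sized matrices.
trials = cell(1, 20);
for k = 1:numel(trials)
    trials{k} = rand(64, 500);      % e.g. channels-by-samples (assumed layout)
end

% Verify that every trial has the same size before stacking.
sz = cellfun(@(c) size(c), trials, 'UniformOutput', false);
assert(all(cellfun(@(s) isequal(s, sz{1}), sz)), 'trials differ in size');

% Stack the trials along the third dimension into one numeric array.
packed = cat(3, trials{:});

% Trial k can be recovered by indexing the third dimension.
k = 7;
assert(isequal(packed(:, :, k), trials{k}));

% Save the packed array instead of the cell array (only if easyh5 is available).
b = struct('data', packed);
if exist('saveh5', 'file')
    saveh5(b, [tempname '.h5']);
end
```

If the trials have different lengths, this packing does not apply directly, and the cell array (or some padding scheme) would still be needed.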

fangq commented 4 years ago

If you can provide another test dataset, I will be happy to take another look.

Also, you can run

    profile on
    % command to save your data;
    profile viewer;

to see the difference in timing and pinpoint which part of the code is responsible for it.
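For a text summary instead of the interactive viewer, the profiler results can also be read with `profile('info')`. The following is a minimal sketch with a placeholder workload; in practice the loop would be replaced by the actual save command being investigated.

```matlab
profile on
% Placeholder workload; replace with the command that saves your data.
for k = 1:50
    m = magic(200); %#ok<NASGU>
end
profile off

% Collect the results and sort the profiled functions by total time.
p = profile('info');
ft = p.FunctionTable;
[~, order] = sort([ft.TotalTime], 'descend');

% Print the ten most expensive functions (fewer if less were recorded).
for i = order(1:min(10, numel(order)))
    fprintf('%-40s %8.4f s  %6d calls\n', ...
        ft(i).FunctionName, ft(i).TotalTime, ft(i).NumCalls);
end
```

Running this once for the cell-array version of the data and once for the plain-array version would show which functions account for the extra time.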

fangq commented 4 years ago

Hi @umair-hassan, if you see any additional cases where easyh5 is slower than the reference tool you use, please feel free to send me the test data. Otherwise, feel free to close this ticket.

umair-hassan commented 4 years ago

Hi @fangq, thanks for your help. Sorry, I was away and couldn't respond properly. I think I have found a solution!

fangq commented 4 years ago

Glad you found a solution. If you notice any inefficiency in easyh5, please feel free to reopen this ticket and send me your test code.