CINPLA / exdir

Directory structure standard for experimental pipelines.
http://exdir.rtfd.io
MIT License

Deeply nested paths #50

Closed: halvarsu closed this issue 5 years ago

halvarsu commented 6 years ago

Accessing data in deeply nested file structures takes a long time, especially on slow servers. In some cases, opening the group (line 439 below) takes longer than loading the data itself afterwards (line 450), as the following profile of read_analogsignal from the (outdated) exdirio for Neo shows.

Timer unit: 1e-06 s

Total time: 3.84142 s
File: [...]/exdirio.py
Function: read_analogsignal at line 438

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   438                                               def read_analogsignal(self, path, cascade=True, lazy=False):
   439         4    1752194.0 438048.5     45.6          group = self._exdir_directory[path]
   440         4     531153.0 132788.2     13.8          signal = group["data"]
   441         4         15.0      3.8      0.0          attrs = {'exdir_path': path}
   442         4     256822.0  64205.5      6.7          attrs.update(group.attrs.to_dict())
   443         4          7.0      1.8      0.0          if lazy:
   444                                                       ana = AnalogSignal([],
   445                                                                          lazy_shape=(signal.attrs["num_samples"],),
   446                                                                          units=signal.attrs["unit"],
   447                                                                          sampling_rate=group.attrs['sample_rate'],
   448                                                                          **attrs)
   449                                                   else:
   450         4     984231.0 246057.8     25.6              ana = AnalogSignal(signal.data,
   451         4     115694.0  28923.5      3.0                                 units=signal.attrs["unit"],
   452         4     150113.0  37528.2      3.9                                 sampling_rate=group.attrs['sample_rate'],
   453         4      51179.0  12794.8      1.3                                 **attrs)
   454         4          8.0      2.0      0.0          return ana

In this case the group structure is main.exdir/processing/electrophysiology/channel_group_2/LFP/LFP_timeseries_1/data, where the variable path contains all but the last of those terms.
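To illustrate why depth matters, here is a minimal sketch using only the Python standard library, not exdir itself. It assumes that resolving a nested path costs one filesystem check per level, and that the same path is requested several times. The names resolve, resolve_cached, and demo are hypothetical helpers for this illustration, not part of the exdir or Neo API.

```python
import tempfile
from functools import lru_cache
from pathlib import Path
from typing import Tuple

# Counts per-level directory checks (a stand-in for slow server round trips).
lookups = 0


def resolve(root: Path, path: str) -> Path:
    """Walk `path` one component at a time, checking the filesystem at each level."""
    global lookups
    current = root
    for part in path.strip("/").split("/"):
        current = current / part
        lookups += 1  # one filesystem check per nesting level
        if not current.is_dir():
            raise KeyError(path)
    return current


@lru_cache(maxsize=None)
def resolve_cached(root: Path, path: str) -> Path:
    """Same lookup, but repeated requests for one path reuse the first result."""
    return resolve(root, path)


def demo() -> Tuple[int, int]:
    """Return the number of directory checks without and with caching."""
    global lookups
    nested = "processing/electrophysiology/channel_group_2/LFP/LFP_timeseries_1"
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "main.exdir"
        (root / nested).mkdir(parents=True)

        lookups = 0
        for _ in range(4):  # four calls, matching the hit count in the profile
            resolve(root, nested)
        uncached = lookups

        lookups = 0
        resolve_cached.cache_clear()
        for _ in range(4):
            resolve_cached(root, nested)
        cached = lookups
    return uncached, cached


if __name__ == "__main__":
    print(demo())
```

Caching only helps when the same path is opened repeatedly, and a cached handle can go stale if the directory changes on disk. If each call targets a different group, holding on to a shared parent group and indexing from there would be the analogous trick.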

lepmik commented 5 years ago

Is this still relevant?

halvarsu commented 5 years ago

Unsure. It was mostly a problem on the slow nird-servers, where each lookup to get to the next level of nesting took a long time.

lepmik commented 5 years ago

OK, then I'll close this. In general we recommend using Git LFS: you can then download only the metadata to your local computer and avoid the slow reads.