McStasMcXtrace / iFit

a simple library to analyze data (with McCode and Phonons/DFT hooks). :warning: this project has been moved to https://gitlab.com/soleil-data-treatment/soleil-software-projects/remote-desktop
http://ifit.mccode.org

Loading: Lazy or compressed storage #201

Open farhi opened 4 years ago

farhi commented 4 years ago

It could be efficient to use either lazy loading (see #193) or in-memory compression with a fast compressor, such as:

The latter works with Matlab 2017. An adaptation to the older MEX functions may be needed for old Matlab versions (e.g. 2010a).

A quick test:

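% test matrix: 1000x1000 identity (mostly zeros), stored as full doubles
e=eye(1000); we = whos('e');
methods = {'zlib','gzip','lzip','lzma','lz4','lz4hc'};
for m=methods
  t0=clock;
  [ss, info]=zmat(e, 1, m{1});  % second argument 1 = compress
  dt = etime(clock, t0);        % elapsed time in seconds
  ws = whos('ss');
  % method name, time, ratio of original to compressed size in memory
  fprintf(1, '%10s %10.3f %10.3f\n', m{1}, dt, we.bytes/ws.bytes);
end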

Results are highly dependent on the initial data. Here we use a matrix with mostly zeros. Sparse storage would be a good solution as well.
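To put the sparse-storage remark in numbers, here is a minimal sketch that uses only built-in Matlab functions on the same kind of test matrix. The variable names are illustrative, and the ratio obtained depends on the platform and on how many non-zero elements the data holds.

```matlab
% Compare full and sparse storage of a mostly-zero matrix.
e  = eye(1000);          % full storage: 1000x1000 doubles
s  = sparse(e);          % sparse storage: only non-zero entries are kept
we = whos('e');
ws = whos('s');
ratio = we.bytes / ws.bytes;   % memory gain of sparse over full storage
fprintf(1, '%10s %10.3f\n', 'sparse', ratio);

% The conversion is lossless: full(s) restores the original matrix.
restored = full(s);
```

Unlike a compressed byte stream, a sparse matrix can still be indexed and used in arithmetic directly, but the gain vanishes as soon as the data is dense.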

    method   time (s) comp_ratio
      zlib      0.048    838.574
      gzip      0.052    839.102
      lzip      0.274   6488.240
      lzma      0.256   6514.658
       lz4      0.001    254.818
     lz4hc      0.002    254.834

With random data, the compression ratio is very poor (around 1). With organised data (for instance magic), it is pretty good. In all cases, the lz4 compressor is by far the fastest.
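The dependence on the input data can be checked without zmat. The sketch below is a stand-in, not the in-memory approach discussed here: it writes each matrix to a temporary file and compresses it with Matlab's built-in gzip function, so it only illustrates the trend in compression ratio, not the timings. The matrix sizes and variable names are arbitrary choices.

```matlab
% Compression ratio of random, structured and mostly-zero data,
% measured with the built-in file-based gzip (stand-in for zmat).
datasets = {rand(500), magic(500), eye(500)};
names    = {'rand', 'magic', 'eye'};
ratios   = zeros(1, numel(datasets));
for k = 1:numel(datasets)
  raw = [tempname '.bin'];
  fid = fopen(raw, 'w');
  fwrite(fid, datasets{k}, 'double');    % 8 bytes per element
  fclose(fid);
  gzip(raw);                             % creates [raw '.gz']
  packed    = dir([raw '.gz']);
  ratios(k) = numel(datasets{k}) * 8 / packed.bytes;
  fprintf(1, '%10s %10.3f\n', names{k}, ratios(k));
  % clean up temporary files
  if exist(raw, 'file'), delete(raw); end
  delete([raw '.gz']);
end
```

Random doubles should give a ratio close to 1, while the identity matrix compresses by orders of magnitude, in line with the observations above.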

This could be embedded into estruct/findfield. Its cached data can be used to identify large blocks and then compress them dynamically, either as an alias or as a new compressed object that implements the basic methods (subsref, subsasgn, ...).
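The "identify large blocks" step could look roughly like the sketch below. It does not use estruct/findfield or its cache. A plain struct stands in for the object, and both the field names and the size threshold are hypothetical.

```matlab
% Flag the fields of a structure that are large enough to be worth compressing.
s = struct('Signal', rand(200), 'Title', 'test', 'Monitor', ones(300));
threshold = 1e5;               % bytes; hypothetical cut-off
f = fieldnames(s);
large = {};
for k = 1:numel(f)
  v = s.(f{k});                % dynamic field access
  w = whos('v');               % memory footprint of this field
  if w.bytes > threshold
    large{end+1} = f{k};       % candidate for compression
  end
end
disp(large);
```

Each flagged field could then be replaced by its compressed form, with a wrapper object decompressing on access through subsref and recompressing on assignment through subsasgn.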

farhi commented 4 years ago

Combining with MappedTensor:

can be great :+1:

What could be done: