cctbx / dxtbx

Diffraction Experiment Toolbox
BSD 3-Clause "New" or "Revised" License
2 stars 18 forks source link

Overhead constructing a NeXus format class #276

Open rjgildea opened 3 years ago

rjgildea commented 3 years ago
$ cat time_format_class.py 
import time
import sys

import dxtbx.format

def run(filename):
    format_class = dxtbx.format.Registry.get_format_class_for_file(filename)
    format_instance = format_class(filename)
    N = 100
    t0 = time.perf_counter()
    for i in range(100):
        format_instance = format_class(filename)
    t1 = time.perf_counter()
    print(f"{t1 - t0:0.4f} seconds ({(t1 - t0)/N:0.4f} seconds per image)")

if __name__ == "__main__":
    run(sys.argv[1])
$ dials.python time_format_class.py $(dials.data get -q vmxi_thaumatin)/image_15799_master.h5
10.2865 seconds (0.1029 seconds per image)

A cbf format class for comparison:

$ dials.python time_format_class.py $(dials.data get -q x4wide)/X4_wide_M1S4_2_0001.cbf      
0.0410 seconds (0.0004 seconds per image)

This adds an overhead comparable in timescale to that required for the thresholding algorithm in spotfinding, and adds a not insignificant overhead to the per-image analysis run at DLS.

Note that in the context of per-image analysis the format class is currently constructed twice, which doubles the resulting overhead (see https://github.com/dials/dials/pull/1539).

rjgildea commented 3 years ago

The majority of the time appears to be in the disappear in the NXmxReader and ultimately NXmxEntry classes, e.g.

find_entries(): 0.0131
NXmxEntry(): 0.0648
NXmxReader(): 0.0782
FormatHDF5.__init__(): 0.0943

Profiling with pprofile seems to suggest a lot of time ultimately going in the h5py get() and __getitem__() methods.

~70% of the runtime is in local_visit in nexus.py. Called from line 96:

if isinstance(nx_file.get(key, getlink=True), h5py.ExternalLink):
Line #|      Hits|         Time| Time per hit|      %|Source code
(call)|     35900|      5.33487|  0.000148604| 14.68%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/group.py:300 get

Called from line 106:

k = nx_file[key]
Line #|      Hits|         Time| Time per hit|      %|Source code
(call)|     34700|      12.8531|  0.000370407| 35.38%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/group.py:279 __getitem__

hits is per 100 instantiations of the format class

h5py Group.__getitem__() itself shows:

   279|     40400|     0.081593|  2.01963e-06|  0.22%|    @with_phil
   280|         0|            0|            0|  0.00%|    def __getitem__(self, name):
   281|         0|            0|            0|  0.00%|        """ Open an object in the file """
   282|         0|            0|            0|  0.00%|
   283|     40400|     0.101013|  2.50032e-06|  0.28%|        if isinstance(name, h5r.Reference):
   284|         0|            0|            0|  0.00%|            oid = h5r.dereference(name, self.id)
   285|         0|            0|            0|  0.00%|            if oid is None:
   286|         0|            0|            0|  0.00%|                raise ValueError("Invalid HDF5 object reference")
   287|         0|            0|            0|  0.00%|        else:
   288|     40400|       2.5647|  6.34826e-05|  7.06%|            oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
(call)|     40400|     0.119282|  2.95254e-06|  0.33%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/base.py:305 id
(call)|     40400|     0.662101|  1.63886e-05|  1.82%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/base.py:177 _e
(call)|     40400|     0.116085|  2.87339e-06|  0.32%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/base.py:165 _lapl
(call)|     51100|     0.319665|  6.25567e-06|  0.88%|# <frozen importlib._bootstrap>:389 parent
(call)|     51100|     0.756264|  1.47997e-05|  2.08%|# <frozen importlib._bootstrap>:1017 _handle_fromlist
   289|         0|            0|            0|  0.00%|
   290|     40000|     0.225483|  5.63708e-06|  0.62%|        otype = h5i.get_type(oid)
   291|     40000|    0.0871985|  2.17996e-06|  0.24%|        if otype == h5i.GROUP:
   292|     11100|    0.0713341|   6.4265e-06|  0.20%|            return Group(oid)
(call)|     11100|     0.177862|  1.60236e-05|  0.49%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/group.py:33 __init__
   293|     28900|    0.0539398|  1.86643e-06|  0.15%|        elif otype == h5i.DATASET:
   294|     28900|     0.701477|  2.42725e-05|  1.93%|            return dataset.Dataset(oid, readonly=(self.file.mode == 'r'))
(call)|     28900|      4.56094|  0.000157818| 12.55%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/base.py:280 file
(call)|     28900|     0.563441|  1.94962e-05|  1.55%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/files.py:255 mode
(call)|     28900|      3.98227|  0.000137795| 10.96%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/dataset.py:580 __init__
   295|         0|            0|            0|  0.00%|        elif otype == h5i.DATATYPE:
   296|         0|            0|            0|  0.00%|            return datatype.Datatype(oid)
   297|         0|            0|            0|  0.00%|        else:
   298|         0|            0|            0|  0.00%|            raise TypeError("Unknown object type")
phyy-nx commented 3 years ago

I've profiled nexus.py before too and had similar conclusions. Most of the time is spent reading all the metadata from the h5 file. There are so many leaf nodes to examine to read flags, strings, and constants. I've wondered if re-writing nexus.py in c++ would be faster, but it would be a big undertaking.

dagewa commented 1 year ago

After those two merges, I see this now:

$ dials.python time_format_class.py $(dials.data get -q vmxi_thaumatin)/image_15799_master.h5
1.1481 seconds (0.0115 seconds per image)

vs

$ dials.python time_format_class.py $(dials.data get -q x4wide)/X4_wide_M1S4_2_0001.cbf
0.0257 seconds (0.0003 seconds per image)