rjgildea opened this issue 3 years ago
The majority of the time appears to disappear in the NXmxReader and ultimately NXmxEntry classes, e.g.:
find_entries(): 0.0131
NXmxEntry(): 0.0648
NXmxReader(): 0.0782
FormatHDF5.__init__(): 0.0943
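For context, cumulative per-constructor timings like those above can be collected with a small amount of stdlib-only instrumentation; the sketch below is illustrative (the commented-out wrapping of NXmxReader.__init__ is an example target, not how these particular numbers were obtained):

```python
# Illustrative sketch (stdlib only): accumulate wall-clock time per wrapped
# callable to produce a breakdown like the one above.
import functools
import time
from collections import defaultdict

totals = defaultdict(float)

def timed(label):
    """Decorator that adds each call's duration to totals[label]."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            t0 = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                totals[label] += time.perf_counter() - t0
        return wrapper
    return decorate

# e.g. NXmxReader.__init__ = timed("NXmxReader()")(NXmxReader.__init__)
# ... construct the FormatHDF5 instance, then:
# for label, t in sorted(totals.items(), key=lambda kv: kv[1]):
#     print(f"{label}: {t:.4f}")
```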
Profiling with pprofile suggests a lot of time ultimately going into the h5py get() and __getitem__() methods.
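For reference, a per-line profile like the extracts below can be produced with pprofile's Python API; this is a minimal sketch, assuming a dxtbx environment such as dials.python (the exact invocation used for these numbers isn't shown here):

```python
# Minimal sketch of producing a per-line pprofile report for format-class
# construction. dxtbx.load() and the file path are illustrative.
import pprofile
import dxtbx

profiler = pprofile.Profile()  # deterministic, line-granularity profiler
with profiler():
    dxtbx.load("image_15799_master.h5")  # triggers FormatHDF5.__init__ etc.
profiler.print_stats()                   # annotated per-line output
# profiler.dump_stats("pprofile.txt")    # or write the report to a file
```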
~70% of the runtime is in local_visit in nexus.py. Called from line 96:
if isinstance(nx_file.get(key, getlink=True), h5py.ExternalLink):
Line #| Hits| Time| Time per hit| %|Source code
(call)| 35900| 5.33487| 0.000148604| 14.68%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/group.py:300 get
Called from line 106:
k = nx_file[key]
Line #| Hits| Time| Time per hit| %|Source code
(call)| 34700| 12.8531| 0.000370407| 35.38%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/group.py:279 __getitem__
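In other words, the traversal touches every key twice: once to inspect the link type and once to open the object. A rough sketch of that pattern (simplified, not the actual dxtbx nexus.py code) looks like this:

```python
# Simplified sketch of a local_visit-style recursive traversal (not the
# actual dxtbx nexus.py implementation). Every key costs one
# Group.get(..., getlink=True) call and one Group.__getitem__ call; for
# datasets the latter constructs a full h5py Dataset object.
import h5py

def local_visit(nx_file, visitor, path=""):
    for key in nx_file.keys():
        full = f"{path}/{key}"
        # line 96 above: one get(..., getlink=True) per key
        if isinstance(nx_file.get(key, getlink=True), h5py.ExternalLink):
            continue  # e.g. skip links whose target file may be missing
        # line 106 above: one __getitem__ per key
        k = nx_file[key]
        visitor(full, k)
        if isinstance(k, h5py.Group):
            local_visit(k, visitor, full)
```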
h5py Group.__getitem__() itself shows:
279| 40400| 0.081593| 2.01963e-06| 0.22%| @with_phil
280| 0| 0| 0| 0.00%| def __getitem__(self, name):
281| 0| 0| 0| 0.00%| """ Open an object in the file """
282| 0| 0| 0| 0.00%|
283| 40400| 0.101013| 2.50032e-06| 0.28%| if isinstance(name, h5r.Reference):
284| 0| 0| 0| 0.00%| oid = h5r.dereference(name, self.id)
285| 0| 0| 0| 0.00%| if oid is None:
286| 0| 0| 0| 0.00%| raise ValueError("Invalid HDF5 object reference")
287| 0| 0| 0| 0.00%| else:
288| 40400| 2.5647| 6.34826e-05| 7.06%| oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
(call)| 40400| 0.119282| 2.95254e-06| 0.33%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/base.py:305 id
(call)| 40400| 0.662101| 1.63886e-05| 1.82%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/base.py:177 _e
(call)| 40400| 0.116085| 2.87339e-06| 0.32%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/base.py:165 _lapl
(call)| 51100| 0.319665| 6.25567e-06| 0.88%|# <frozen importlib._bootstrap>:389 parent
(call)| 51100| 0.756264| 1.47997e-05| 2.08%|# <frozen importlib._bootstrap>:1017 _handle_fromlist
289| 0| 0| 0| 0.00%|
290| 40000| 0.225483| 5.63708e-06| 0.62%| otype = h5i.get_type(oid)
291| 40000| 0.0871985| 2.17996e-06| 0.24%| if otype == h5i.GROUP:
292| 11100| 0.0713341| 6.4265e-06| 0.20%| return Group(oid)
(call)| 11100| 0.177862| 1.60236e-05| 0.49%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/group.py:33 __init__
293| 28900| 0.0539398| 1.86643e-06| 0.15%| elif otype == h5i.DATASET:
294| 28900| 0.701477| 2.42725e-05| 1.93%| return dataset.Dataset(oid, readonly=(self.file.mode == 'r'))
(call)| 28900| 4.56094| 0.000157818| 12.55%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/base.py:280 file
(call)| 28900| 0.563441| 1.94962e-05| 1.55%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/files.py:255 mode
(call)| 28900| 3.98227| 0.000137795| 10.96%|# /Users/rjgildea/software/cctbx_py3/conda_base/lib/python3.8/site-packages/h5py/_hl/dataset.py:580 __init__
295| 0| 0| 0| 0.00%| elif otype == h5i.DATATYPE:
296| 0| 0| 0| 0.00%| return datatype.Datatype(oid)
297| 0| 0| 0| 0.00%| else:
298| 0| 0| 0| 0.00%| raise TypeError("Unknown object type")
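Reading the listing above, the bulk of the cost is not the raw h5o.open call (line 288, ~7%) but the dataset.Dataset construction on line 294 together with its .file/.mode property lookups (~25% combined). A rough micro-benchmark (illustrative, not taken from this issue) shows the per-key object-construction overhead:

```python
# Rough micro-benchmark: enumerating names with Group.visit() is cheap,
# while opening each leaf via Group.__getitem__ builds a high-level
# Dataset object per key, which is where the time adds up.
import time
import h5py

with h5py.File("bench.h5", "w") as f:
    for i in range(5000):
        f.create_dataset(f"leaf_{i:04d}", data=i)  # many small leaf nodes

with h5py.File("bench.h5", "r") as f:
    names = []
    t0 = time.perf_counter()
    f.visit(names.append)            # walk the tree, names only
    t1 = time.perf_counter()
    for name in names:
        obj = f[name]                # one Group.__getitem__ per leaf
    t2 = time.perf_counter()
    print(f"visit (names only): {t1 - t0:.4f} s")
    print(f"per-key __getitem__: {t2 - t1:.4f} s")
```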
I've profiled nexus.py before too and reached similar conclusions. Most of the time is spent reading all of the metadata from the HDF5 file: there are a great many leaf nodes to examine in order to read flags, strings, and constants. I've wondered whether rewriting nexus.py in C++ would be faster, but it would be a big undertaking.
After those two merges, I see this now:
$ dials.python time_format_class.py $(dials.data get -q vmxi_thaumatin)/image_15799_master.h5
1.1481 seconds (0.0115 seconds per image)
vs. a CBF format class, for comparison:
$ dials.python time_format_class.py $(dials.data get -q x4wide)/X4_wide_M1S4_2_0001.cbf
0.0257 seconds (0.0003 seconds per image)
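The timing script itself isn't included in this issue; a hypothetical sketch of what a script along the lines of time_format_class.py might do (dxtbx.load() and the repeat count are assumptions) is:

```python
# Hypothetical sketch of a timing script along the lines of
# time_format_class.py (the real script is not shown in this issue):
# time repeated format-class construction for a given image file.
import sys
import time

import dxtbx  # assumes a dxtbx/DIALS environment


def main(path, repeats=100):
    t0 = time.perf_counter()
    for _ in range(repeats):
        dxtbx.load(path)  # format detection + FormatHDF5/FormatCBF __init__
    total = time.perf_counter() - t0
    print(f"{total:.4f} seconds ({total / repeats:.4f} seconds per construction)")


if __name__ == "__main__":
    main(sys.argv[1])
```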
This HDF5 format-class overhead is comparable in timescale to that required for the thresholding algorithm in spotfinding, and it adds a not insignificant overhead to the per-image analysis run at DLS.
Note that in the context of per-image analysis the format class is currently constructed twice, which doubles the resulting overhead (see https://github.com/dials/dials/pull/1539).