ercius / openNCEM

A collection of packages and tools for electron microscopy data analysis supported by the National Center for Electron Microscopy facility of the Molecular Foundry
GNU General Public License v3.0

4D-STEM data reads incorrectly #17

Closed: ercius closed this issue 2 years ago

ercius commented 5 years ago

Newer versions of DM with the STEMX capability for 4D-STEM now write 4-dimensional DM4 files with a different shape than before. This is actually an improvement in the layout of the data on disk, but ncempy.io.dm does not currently read this data correctly.

There is a workaround. You can use the low-level API in ncempy.io.dm.fileDM to create a memmap on disk with the correct ordering:

import numpy as np
import ncempy.io as nio

d1 = nio.dm.fileDM('filename.dm4')
# Memory-map the raw data, reversing the stored shape so it matches C ordering
data = np.memmap(d1.fid, dtype=d1._DM2NPDataType(d1.dataType[1]), mode='r',
                 offset=d1.dataOffset[1],
                 shape=(d1.xSize[1], d1.ySize[1], d1.zSize[1], d1.zSize2[1])[::-1],
                 order='C')
image0 = data[:, :, 0, 0]  # the first diffraction pattern

Note how the shape keyword has [::-1] to reverse the order of the data shape. The shape is now [ky, kx, scanY, scanX] making the diffraction patterns the fastest data to access according to C-ordering.

I will attempt to add automatic identification of this change in the file format, but it will take some time to figure out.
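As a small self-contained check of the axis reversal (a sketch on a toy array, assuming the on-disk order [ky, kx, scanY, scanX] described above; none of this is ncempy API):

```python
import numpy as np

# Toy stand-in for the 4D-STEM stack, stored as [ky, kx, scanY, scanX]
data = np.arange(4 * 5 * 2 * 3).reshape((4, 5, 2, 3))

# Reversing a shape tuple with [::-1], as in the memmap call above
shape = (3, 2, 5, 4)
assert shape[::-1] == (4, 5, 2, 3)

# If the stack fits in memory, transposing to [scanY, scanX, ky, kx] and
# copying makes each diffraction pattern one contiguous run of memory
scan_major = np.ascontiguousarray(data.transpose(2, 3, 0, 1))
pattern = scan_major[0, 0]  # diffraction pattern at scan position (0, 0)
assert np.array_equal(pattern, data[:, :, 0, 0])
```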

sk1p commented 4 years ago

I also stumbled on this.

> The shape is now [ky, kx, scanY, scanX] making the diffraction patterns the fastest data to access according to C-ordering.

Actually, it is the other way around: in numpy, the rightmost axis is the fastest to access. You can easily check this using the strides attribute on the array, or even by benchmarking:

>>> image0.strides  # image0 from your code snippet above, applied to some sample data
(143654400, 74820)

This means that between consecutive pixels of image0 we need to jump 74820 bytes in the input data. For fast access, this number should equal the itemsize of the dtype, for example 4 for float32 data.
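The itemsize relation is easy to check on a small C-contiguous array (a self-contained sketch, not taken from the thread):

```python
import numpy as np

a = np.zeros((128, 128), dtype=np.float32)

# For a C-contiguous array the last-axis stride equals the itemsize,
# so neighbouring pixels along the last axis are adjacent in memory.
print(a.strides)  # (512, 4): 128 * 4 bytes per row, 4 bytes per pixel
assert a.strides[-1] == a.itemsize
```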

>>> b = np.random.randn(128, 128, 128, 128)
>>> b.strides
(16777216, 131072, 1024, 8)

%timeit np.copy(b[..., 0, 0])  # on my system: 534 µs ± 25.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> b[..., 0, 0].strides
(16777216, 131072)

%timeit np.copy(b[0, 0, ...])  # on my system: 7.02 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> b[0, 0, ...].strides
(1024, 8)

Or, from your example above, compare the timing of np.copy(data[:, :, 0, 0]) with np.copy(data[0, 0, ...]). The np.copy is needed to actually measure the cost: without it, numpy just creates a view into the memory map without accessing the underlying data.
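The view-vs-copy distinction can be verified directly with np.shares_memory (a small illustrative check):

```python
import numpy as np

b = np.random.randn(8, 8, 8, 8)
view = b[..., 0, 0]     # a view: no data is read or copied yet
copied = np.copy(view)  # forces the actual reads into a new buffer

assert np.shares_memory(view, b)        # the view aliases b's buffer
assert not np.shares_memory(copied, b)  # the copy owns its own memory
```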

So sadly, in this new format, while it is fast to access all scan positions for a single pixel of a diffraction pattern, accessing a whole diffraction pattern is very expensive.