cctbx / dxtbx

Diffraction Experiment Toolbox
BSD 3-Clause "New" or "Revised" License
2 stars 18 forks source link

get_raw_data should return data in native data type if possible #222

Open graeme-winter opened 3 years ago

graeme-winter commented 3 years ago

Proposal: get_raw_data() function should return data in as close to the native format as possible.

https://github.com/cctbx/cctbx_project/pull/533 enables this by defining common integer data types e.g. unsigned 16 bit integer etc. which are used for storing images - at the moment we unpack the data to large integers however if the raw data are 16 bit, then get_raw_data should ideally return this.

Motivation: 16 or 8 bit data are substantially smaller than 32 bit and will therefore require less memory to store, less time to move around memory. If there is no need to convert the data then saving conversion from e.g. uint16 to int32 will improve performance. In addition the true underlying data type will by definition also encode the bit depth of the image which will help towards resolving https://github.com/cctbx/dxtbx/issues/197

Before now this was impossible. With a full gamut of fixed width integer types it should be possible to represent any image type in a native form thus reducing byte manipulation. Clearly in some cases there will be no saving if the underlying data are 32 bit, but for standard TIFF images from CCD detectors and most Eiger data (which are often 16 bit, or possibly smaller) this could represent a considerable saving.

See also https://github.com/cctbx/dxtbx/issues/219 as a possible beneficiary of this change.

graeme-winter commented 3 years ago

Alternative thought on this topic - add a new method get_native_data which can default to pointing at get_raw_data or whatever however can be overloaded to e.g. return uin16_t data if that is the native format. If the caller wants to ignore data types, use get_raw_data; if the caller is prepared to accept other data types get_ntive_data for the win.

dagewa commented 3 years ago

I'm less keen on that version, it complicates the API. The caller could probably call e.g. as_double on the result of get_raw_data if they don't care about the storage type?

graeme-winter commented 3 years ago

I don't think you are wrong @dagewa however this does offer a nondestructive path to actually make this happen. The alternative route means we have to go check every piece of software which could be using this API to verify that they are OK with this.

Alternatively, we make the change & duck 🙂

I would however like to have an abstraction breaking API which actually gives me the bytes as they are represented in the data file, with the minimum of blitter operations.