ePSIC-DLS / epsic_tools

Code for conversion and analysis of Merlin-Medipix data
GNU General Public License v3.0

large data conversion fix #25

Closed: M0hsend closed this 4 years ago

M0hsend commented 4 years ago

The changes here partly address issue #24, work towards fixing issue #17, and clean up the conversion code. New functions defined in mib_dask_import.py:

- `mib_to_daskarr(hdr_info, fp, mmap_mode='r')`: reads the binary .mib file into a NumPy memmap object and returns it as a dask array.
- `get_hdr_bits(hdr_info)`: gets the number of character bits in each frame's header, given the data type.
- `mib_to_h5stack(fp, hdr_info, save_path, mmap_mode='r')`: reads a .mib file using memory mapping, so the array stays on disk instead of being loaded into memory but can still be treated like a dask array. It writes the data in chunks to an h5 file using `_stack_h5dump`.
- `_stack_h5dump(data, hdr_info, saving_path, raw_binary = False)`: incrementally reads a large stack dask array and saves it to an h5 file using `_h5_chunk_write`.
- `_h5_chunk_write(data, saving_path)`: incrementally saves the data into an h5 file. If the h5 file does not exist, it is created; if it does, the data are appended to the existing dataset. h5 dataset key: `'data_stack'`.
- `_untangle_raw(data, hdr_info, stack_size)`: corrects for the tangled raw .mib format. Only the quad-chip case is considered here.
- `h5stack_to_hs(h5_path, hdr_info)`: reads the saved stack h5 file into a reshaped 4D-STEM HyperSpy lazy object, with chunks defined as (1000, det_x, det_y). This function assumes that the .mib file corresponding to this h5 file is at the path provided in `hdr_info`. TODO: make the .mib path an input argument in case it differs.
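The memory-mapping idea behind `mib_to_daskarr` and `mib_to_h5stack` can be sketched with NumPy alone. This is a minimal sketch, not the code in this PR: it assumes every frame is stored as a fixed-size header of `hdr_bytes` bytes followed by big-endian unsigned 16-bit pixels, and `mib_memmap_frames` and `iter_blocks` are hypothetical names. A structured dtype lets the header bytes be skipped without copying, and plain block-wise slicing stands in for the dask-based incremental reading done in `_stack_h5dump`.

```python
import os

import numpy as np


def mib_memmap_frames(fp, hdr_bytes, det_shape, pixel_dtype=">u2"):
    """Memory-map a file of [header | frame] records and return only the frames.

    Assumption (hypothetical layout): each frame is preceded by a fixed-size
    header of `hdr_bytes` bytes and its pixels are stored as `pixel_dtype`.
    """
    record = np.dtype([("header", f"V{hdr_bytes}"),
                       ("frame", pixel_dtype, det_shape)])
    # Whole records only; a trailing partial record is ignored.
    n_frames = os.path.getsize(fp) // record.itemsize
    mm = np.memmap(fp, dtype=record, mode="r", shape=(n_frames,))
    # Selecting the "frame" field gives a strided view of the file on disk:
    # header bytes are skipped and pages are only read when accessed.
    return mm["frame"]


def iter_blocks(frames, block_size=1000):
    """Yield (start_index, block) pairs, copying one block into RAM at a time."""
    for start in range(0, frames.shape[0], block_size):
        yield start, np.array(frames[start:start + block_size])
```

Each yielded block could then be appended to the `'data_stack'` h5 dataset in the way `_h5_chunk_write` is described above, so peak memory stays at roughly one block of frames rather than the full scan.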

Change to `mib_dask_reader(mib_filename, h5_stack_path = None)`: if a path to an h5 stack is provided, that stack is loaded and reshaped; otherwise the function takes the .mib file path, attempts to load it as a dask array, and reshapes it.
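Both branches of `mib_dask_reader` end by reshaping a flat stack of frames into a 4D-STEM array. A minimal NumPy sketch of that step, assuming the frames are stored in raster (row-major) scan order with exactly `scan_y * scan_x` frames; `reshape_to_4d` and `scan_shape` are hypothetical names, and the real reader may need to handle extra or missing frames differently:

```python
import numpy as np


def reshape_to_4d(stack, scan_shape):
    """Reshape (n_frames, det_y, det_x) to (scan_y, scan_x, det_y, det_x).

    Assumes raster (row-major) frame order and no extra frames.
    """
    scan_y, scan_x = scan_shape
    n_frames = stack.shape[0]
    if n_frames != scan_y * scan_x:
        raise ValueError(
            f"{n_frames} frames cannot fill a {scan_y} x {scan_x} scan")
    # Frame k lands at scan position (k // scan_x, k % scan_x).
    return stack.reshape(scan_y, scan_x, *stack.shape[1:])
```

Dask arrays also provide `.reshape`, so the same approach should carry over to the lazy branch without loading the data.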

Change to mib2hdf_watch_convert.py, `convert(beamline, year, visit, mib_to_convert, folder)`: if the scan array is large (> 300*300), it saves the stack into an h5 file and then reads it into a lazy HyperSpy object. If the scan array is smaller, it loads the data directly as a dask array and saves the outputs.
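The size check in `convert` can be illustrated with a rough memory estimate. This sketch assumes a 256 x 256 single-chip frame at 2 bytes per pixel (the actual detector size and counter depth come from the .mib header), and `estimate_bytes` and `conversion_route` are hypothetical helpers; only the 300*300 threshold is taken from the description above.

```python
LARGE_SCAN_THRESHOLD = 300 * 300  # scan-array size quoted in the PR description


def estimate_bytes(scan_shape, det_shape, bytes_per_pixel):
    """Rough size of the full 4D dataset in memory, ignoring frame headers."""
    n_frames = scan_shape[0] * scan_shape[1]
    return n_frames * det_shape[0] * det_shape[1] * bytes_per_pixel


def conversion_route(scan_shape, threshold=LARGE_SCAN_THRESHOLD):
    """Pick the path described for convert(): stage large scans through an h5 stack."""
    n_frames = scan_shape[0] * scan_shape[1]
    return "h5_stack" if n_frames > threshold else "direct_dask"


# A 300 x 300 scan of 256 x 256 frames at 2 bytes/pixel is already ~11.8 GB.
print(estimate_bytes((300, 300), (256, 256), 2))  # 11796480000
print(conversion_route((300, 300)))  # direct_dask (the check is strictly ">")
print(conversion_route((512, 512)))  # h5_stack
```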

M0hsend commented 4 years ago

Merging this since it has been tested on data from multiple sessions. Closing issues #17 and #24, which are now fixed by this PR.