Add replace argument to extract_bin_files

This is for internal use only; if you'd like to open an issue or request a new feature, please open a bug or enhancement issue

Section 1: Design details

Relevant background

Addresses #29.

Currently, when intensity images are requested for only some channels, blank intensity tifs are still saved for the remaining channels.

Design overview

We would like to add a replace argument (default True) to extract_bin_files() that collapses the separate pulse and intensity entries of the type dimension into a single entry, which is then used to write images. In addition, the area (intensity x width) image data will be removed from both extract_bin_files() and _write_out().

Cases:

1. intensities are extracted and replace the pulse data for specific channels
2. intensities are extracted and have their own dimension
3. intensities are not extracted

For cases 1 and 3, img_data will have only one entry along the type dimension. For case 2, the implementation does not change (and so the blank tif issue still needs to be addressed).
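
The three cases can be sketched with a toy array. This is only an illustration, not the proposed implementation: the helper name collapse_types, the channel names CD45 and CD8, and the assumption that img_data has shape (2, x, y, n_channels) with index 0 holding pulse counts and index 1 holding intensities are all made up for the example.

```python
import numpy as np


def collapse_types(img_data, targets, intensities, replace=True):
    """Hypothetical helper: return (data, type_labels) for the three cases.

    Assumes img_data has shape (2, x, y, n_channels), where index 0 holds
    pulse counts and index 1 holds intensities.
    """
    targets = list(targets)
    # Assumption: a bare True means "every channel".
    if intensities is True:
        wanted = targets
    elif intensities:
        wanted = list(intensities)
    else:
        wanted = []

    if wanted and replace:
        # Case 1: overwrite pulse data with intensity data, channel by channel.
        out = img_data[[0]].copy()
        for target in wanted:
            j = targets.index(target)  # position in the full channel list
            out[0, :, :, j] = img_data[1, :, :, j]
        return out, ['pulse']
    if wanted:
        # Case 2: both types are kept, each with its own entry.
        return img_data, ['pulse', 'intensities']
    # Case 3: intensities were not requested, so only pulse data remains.
    return img_data[[0]], ['pulse']


# Toy data: 2 types, 3x3 pixels, 2 channels (hypothetical channel names).
demo = np.zeros((2, 3, 3, 2), dtype=np.uint32)
demo[0] = 1  # pulse counts
demo[1] = 7  # intensities
replaced, labels = collapse_types(demo, ['CD45', 'CD8'], ['CD8'], replace=True)
print(replaced.shape, labels)
```

In this sketch the type axis is kept with length 1 in cases 1 and 3 rather than being dropped, so code that indexes img_data[i, :, :, j] behaves the same way in all three cases.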

Design list/flowchart

def extract_bin_files():
    for i, (fov, bf) in enumerate(bin_files):
        img_data = _extract_bin.c_extract_bin(
            bytes(bf, 'utf-8'), fov['lower_tof_range'],
            fov['upper_tof_range'], np.array(fov['calc_intensity'], dtype=np.uint8)
        )

### new implementation

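        if type_utils.any_true(intensities) and replace:
            # a bare True means every channel in this fov
            if not isinstance(intensities, list):
                intensities = list(fov['targets'])
            # look up each channel's index in the full channel list, not in `intensities`
            targets = list(fov['targets'])
            for target in intensities:
                j = targets.index(target)
                img_data[0, :, :, j] = img_data[1, :, :, j]
            # keep a length-1 type axis so the 5-D coords and _write_out() indexing still apply
            img_data = img_data[[0], :, :, :]
        elif not intensities:
            img_data = img_data[[0], :, :, :]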

###

        if out_dir is not None:
            _write_out(img_data, out_dir, fov['bin'][:-4], fov['targets'], intensities)
        else:

### adjustment for return data
            if replace or not intensities:
                type_list = ['pulse']
            else:
                type_list = ['pulse', 'intensities']

            image_data.append(
                xr.DataArray(
                    data=img_data[np.newaxis, :],
                    coords=[
                        [fov['bin'].split('.')[0]],
                        type_list,
                        np.arange(img_data.shape[1]),
                        np.arange(img_data.shape[2]),
                        list(fov['targets']),
                    ],
                    dims=['fov', 'type', 'x', 'y', 'channel'],
                )
            )

The _write_out() function will need to be adjusted as well. Rather than passing any_true(intensities), we'll pass the intensities argument itself, as either a bool or a list as defined in extract_bin_files(), so that it specifies which intensity tifs to save if needed.

def _write_out():
    out_dirs = [
        os.path.join(out_dir, fov_name),
        os.path.join(out_dir, fov_name, 'intensities'),
    ]
    suffixes = [
        '',
        '_intensity',
    ]
    save_dtypes = [
        np.uint32,
        np.uint32,
    ]

    for i, (out_dir_i, suffix, save_dtype) in enumerate(zip(out_dirs, suffixes, save_dtypes)):

### rather than relying on the `any_true(intensities)` bool as before, use the shape of the (collapsed or non-collapsed) img_data to decide whether to continue the loop
        if i >= img_data.shape[0]:
            break
        if not os.path.exists(out_dir_i):
            os.makedirs(out_dir_i)
        for j, target in enumerate(targets):

### should solve blank intensity tif issue by only saving images for channels specified
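            # `intensities` may be a bare True (all channels) or a list of channel names
            if i == 0 or intensities is True or target in intensities: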
                io.imsave(
                    os.path.join(out_dir_i, f'{target}{suffix}.tiff'),
                    img_data[i, :, :, j].astype(save_dtype),
                    plugin='tifffile',
                    check_contrast=False
                )

Currently, separate subdirectories are created to hold the pulse, intensity, and area images. With replace=True, all extracted images are kept in a single directory, so _write_out() will only create an intensities subdirectory if replace is False.
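
To make the resulting directory layout concrete, here is a rough sketch of which files would be written for one fov. The helper planned_outputs, the fov name fov-1, and the channel names are hypothetical; the sketch only mirrors the naming scheme in the _write_out() outline above (no suffix in the fov directory, an _intensity suffix in the intensities subdirectory) and writes nothing to disk.

```python
import posixpath


def planned_outputs(fov_name, targets, intensities, replace=True):
    """Hypothetical helper: list the tif paths (relative to out_dir) to be written."""
    targets = list(targets)
    # Assumption: a bare True means "every channel".
    if intensities is True:
        wanted = targets
    elif intensities:
        wanted = [t for t in targets if t in intensities]
    else:
        wanted = []

    # One image per channel always goes in the fov directory.
    paths = [posixpath.join(fov_name, f'{t}.tiff') for t in targets]
    if wanted and not replace:
        # Only the requested channels get a file in the intensities subdirectory.
        paths += [
            posixpath.join(fov_name, 'intensities', f'{t}_intensity.tiff')
            for t in wanted
        ]
    return paths


print(planned_outputs('fov-1', ['CD45', 'CD8'], ['CD8'], replace=False))
print(planned_outputs('fov-1', ['CD45', 'CD8'], ['CD8'], replace=True))
```

With replace=False only the requested channel gets an intensity tif, which is the behavior intended to fix the blank tif issue. With replace=True no subdirectory appears at all, and the replaced channel's image is saved under its plain channel name.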

Section 2: Implementation details

Once you have completed section 1, please tag the relevant parties and iterate on the initial design details until everyone is satisfied. Then, proceed to section 2

Control flow

Provide additional, more granular details (if necessary) about how the proposed coding logic will be put together

Milestones and timeline

List each of the major components of the project, and provide an estimated completion date for each one.

angelolab / mibi-bin-tools

Add replace argument to extract_bin_files #37