angelolab / mibi-bin-tools

Tools for extracting TIFF images from mibiscope bin files
MIT License

Add replace argument to extract_bin_files #37

Closed camisowers closed 2 years ago

camisowers commented 2 years ago

This is for internal use only; if you'd like to open an issue or request a new feature, please open a bug or enhancement issue

Section 1: Design details

Relevant background

Addresses #29.

Currently, when intensity images are requested for only some channels, blank intensity TIFFs are still saved for the remaining channels.

Design overview

We would like to add a replace argument, defaulting to True, to extract_bin_files(). When it is set, the intensity data for the requested channels overwrites the pulse data for those channels, so the separate pulse and intensity types combine into a single one to write images for. In addition, the area (intensity x width) image data will be removed from both extract_bin_files() and _write_out().

Cases:

  1. intensities are extracted and replace the pulse data for specific channels
  2. intensities are extracted and have their own dimension
  3. intensities are not extracted

For cases 1 and 3, img_data will hold only one image type. For case 2, the implementation does not change, so the blank TIFF issue still needs to be addressed there.
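
The three cases can be tried out on a toy array. This is a minimal sketch, not the library's implementation: it assumes img_data has shape (type, x, y, channel) with pulse counts at type index 0 and intensities at type index 1, and it assumes a length-1 type axis is kept after collapsing so the array stays four-dimensional. The helper name collapse_img_data and the target names are hypothetical.

```python
import numpy as np


def collapse_img_data(img_data, targets, intensities, replace=True):
    """Hypothetical helper: apply cases 1-3 to a (type, x, y, channel) array."""
    # `intensities` is assumed to be a bool or a list of target names.
    if isinstance(intensities, bool):
        selected = list(targets) if intensities else []
    else:
        selected = list(intensities)

    if selected and replace:
        # Case 1: intensities overwrite the pulse plane for the selected channels.
        out = img_data[:1].copy()
        for target in selected:
            j = list(targets).index(target)  # channel index within `targets`
            out[0, :, :, j] = img_data[1, :, :, j]
        return out
    if selected:
        # Case 2: pulse and intensity keep their own planes.
        return img_data
    # Case 3: no intensities requested, keep only the pulse plane.
    return img_data[:1]


targets = ['CD45', 'CD8']  # illustrative names only
img_data = np.zeros((2, 2, 2, 2), dtype=np.uint32)
img_data[0] = 1  # pulse counts
img_data[1] = 7  # intensities

case1 = collapse_img_data(img_data, targets, ['CD8'], replace=True)
case2 = collapse_img_data(img_data, targets, ['CD8'], replace=False)
case3 = collapse_img_data(img_data, targets, False)
print(case1.shape, case2.shape, case3.shape)
```

In case 1 only the 'CD8' channel carries intensity values; the 'CD45' channel still holds pulse counts.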

Design list/flowchart

def extract_bin_files():
    for i, (fov, bf) in enumerate(bin_files):
        img_data = _extract_bin.c_extract_bin(
            bytes(bf, 'utf-8'), fov['lower_tof_range'],
            fov['upper_tof_range'], np.array(fov['calc_intensity'], dtype=np.uint8)
        )

### new implementation

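        if type_utils.any_true(intensities) and replace:
            # intensities=True means every target; a list names specific targets
            if not isinstance(intensities, list):
                intensities = fov['targets']
            targets = list(fov['targets'])
            for target in intensities:
                # look up the channel index in the full target list, not in `intensities`
                j = targets.index(target)
                img_data[0, :, :, j] = img_data[1, :, :, j]
            # keep a length-1 type axis so the 'type' dim and _write_out indexing still work
            img_data = img_data[:1]
        elif not type_utils.any_true(intensities):
            img_data = img_data[:1]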

###

        if out_dir is not None:
            _write_out(img_data, out_dir, fov['bin'][:-4], fov['targets'], intensities)
        else:

### adjustment for return data
            if replace or not intensities: 
                type_list = ['pulse']
            else: 
                type_list = ['pulse', 'intensities']

            image_data.append(
                xr.DataArray(
                    data=img_data[np.newaxis, :],
                    coords=[
                        [fov['bin'].split('.')[0]],
                        type_list,
                        np.arange(img_data.shape[1]),
                        np.arange(img_data.shape[2]),
                        list(fov['targets']),
                    ],
                    dims=['fov', 'type', 'x', 'y', 'channel'],
                )
            )

The _write_out() function will need to be adjusted as well. We'll pass it the intensities argument as either a bool or a list, exactly as given to extract_bin_files() (rather than passing any_true(intensities)), so that it specifies which intensity TIFFs to save if needed.

def _write_out():
    out_dirs = [
        os.path.join(out_dir, fov_name),
        os.path.join(out_dir, fov_name, 'intensities'),
    ]
    suffixes = [
        '',
        '_intensity',
    ]
    save_dtypes = [
        np.uint32,
        np.uint32,
    ]

    for i, (out_dir_i, suffix, save_dtype) in enumerate(zip(out_dirs, suffixes, save_dtypes)):

### rather than relying on the `any_true(intensities)` bool as before, depend on the shape of the (collapsed or non-collapsed) img_data to know whether to continue the loop
        if i >= img_data.shape[0]:
            break
        if not os.path.exists(out_dir_i):
            os.makedirs(out_dir_i)        
        for j, target in enumerate(targets):

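### should solve blank intensity tif issue by only saving images for channels specified
### (`intensities` may be True, meaning every target, or a list of target names)
            if i == 0 or intensities is True or target in intensities: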
                io.imsave(
                    os.path.join(out_dir_i, f'{target}{suffix}.tiff'),
                    img_data[i, :, :, j].astype(save_dtype),
                    plugin='tifffile',
                    check_contrast=False
                )           
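
Because intensities can arrive as either a bool or a list, it may help to resolve it to an explicit list of target names before the save loop. The following is a sketch under that assumption; intensity_targets_to_save is a hypothetical helper, and silently ignoring names that are not in targets is a design choice made here for illustration only.

```python
def intensity_targets_to_save(targets, intensities):
    """Hypothetical helper: resolve `intensities` (bool or list) to target names."""
    if isinstance(intensities, bool):
        # True selects every target, False selects none.
        return list(targets) if intensities else []
    requested = set(intensities)
    # Preserve the order of `targets`; names not present in `targets` are dropped.
    return [t for t in targets if t in requested]


all_targets = ['A', 'B', 'C']  # illustrative names only
print(intensity_targets_to_save(all_targets, True))
print(intensity_targets_to_save(all_targets, ['C', 'A']))
```

With a resolved list, the check inside the loop reduces to a plain membership test regardless of how the caller specified intensities.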

Currently, separate subdirectories are created to hold the pulse, intensity, and area images, but with replace=True we will keep all extracted images in a single directory. _write_out() will therefore create an intensities subdirectory only when replace is False.
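
The resulting directory layout can be sketched by listing the paths that would be written. This is an illustration under the assumptions of the design above (files named {target}{suffix}.tiff, an intensities subdirectory, intensities given as a bool or list); planned_outputs is a hypothetical helper and writes nothing to disk.

```python
import os


def planned_outputs(out_dir, fov_name, targets, intensities, replace=True):
    """Hypothetical helper: list the TIFF paths the proposed design would write."""
    if isinstance(intensities, bool):
        selected = list(targets) if intensities else []
    else:
        requested = set(intensities)
        selected = [t for t in targets if t in requested]

    fov_dir = os.path.join(out_dir, fov_name)
    # One image per target always goes directly in the FOV directory.
    # With replace=True, selected targets hold intensity data under the same name.
    paths = [os.path.join(fov_dir, f'{t}.tiff') for t in targets]
    if selected and not replace:
        # Only the non-replacing case needs the extra subdirectory.
        sub_dir = os.path.join(fov_dir, 'intensities')
        paths += [os.path.join(sub_dir, f'{t}_intensity.tiff') for t in selected]
    return paths


for path in planned_outputs('out', 'fov-1', ['A', 'B'], ['B'], replace=False):
    print(path)
```

With replace=True the same call returns only the two files in the FOV directory, which matches the single-directory layout described above.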

Section 2: Implementation details

Once you have completed section 1, please tag the relevant parties and iterate on the initial design details until everyone is satisfied. Then, proceed to section 2

Control flow

Provide additional, more granular details (if necessary) about how the proposed coding logic will be put together

Milestones and timeline

List each of the major components of the project, and provide an estimated completion date for each one.

camisowers commented 2 years ago

@ngreenwald @ackagel

ngreenwald commented 2 years ago

Looks good!