angelolab / Nimbus


Simple version of SegmentationTFRecords that expects image & label instead of conversion matrix #41

Closed: JLrumberger closed this issue 1 year ago

JLrumberger commented 1 year ago

Instructions

Implement a new class with all basic functionality to prepare .tfrecords datasets without assuming a conversion_matrix as input.

Relevant background

The MSK dataset comes without segmentation masks, so we had to segment the data with Mesmer. To assign positive/negative classes for a marker to the cell segments, we use the MSK cell table to calculate x, the percentage of positive cells for a given marker, and then label the brightest x% of cells in our segmentation as marker positive. This class should therefore work without a conversion_matrix and instead get a cell_table that contains a column $marker+"_gt" in which marker positivity is stored.
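The labeling rule above can be sketched in plain Python. This is a minimal illustration, not the Nimbus code: it assumes the reference labels from the MSK cell table are available as a list of 0/1 values and that our Mesmer segmentation gives a dict mapping cell id to mean marker intensity. The function names `positive_fraction` and `assign_brightest_positive` are hypothetical, and rounding the positive count with Python's `round()` is an assumption.

```python
def positive_fraction(reference_labels):
    """Fraction x of marker-positive cells in the reference (MSK) cell table."""
    if not reference_labels:
        raise ValueError("reference_labels must not be empty")
    return sum(reference_labels) / len(reference_labels)


def assign_brightest_positive(intensities, fraction):
    """Label the brightest `fraction` of segmented cells as positive (1), the rest 0.

    intensities: dict mapping cell id -> mean marker intensity in our segmentation.
    The number of positives uses Python's round(), which rounds halves to even
    (an assumption; the issue does not specify how to round).
    """
    n_positive = round(fraction * len(intensities))
    # Cell ids sorted brightest first; the sort is stable, so ties keep dict order.
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    positive_ids = set(ranked[:n_positive])
    return {cell_id: int(cell_id in positive_ids) for cell_id in intensities}


# Usage: half of the reference cells are positive, so the 2 brightest of 4 cells are.
x = positive_fraction([1, 0, 0, 1])
labels = assign_brightest_positive({1: 0.2, 2: 3.1, 3: 0.9, 4: 1.5}, x)
print(labels)  # {1: 0, 2: 1, 3: 0, 4: 1}
```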

Design overview

This is tricky. We have two options:

  1. The current SegmentationTFRecords class inherits from a new base class and implements the additional functions needed for conversion_matrix-based activity map generation.
  2. We go the other way round and build a child class that inherits from SegmentationTFRecords and only needs to implement __init__, get_marker_activity, and a new method called check_additional_inputs. All input checks that assume a conversion_matrix are then implemented in SegmentationTFRecords.check_additional_inputs, while this method is left empty in the new child class.

The 1st option would be a bit cleaner, but it requires more code changes, especially in the tests. The 2nd option requires only minimal code changes and would be considerably faster to implement. The underlying problem is that in object-oriented design the most abstract class normally sits at the root, and every class that inherits from it is more specialized and less abstract. SegmentationTFRecords would implement check_additional_inputs while our new class leaves it empty, so SegmentationTFRecords is the more specialized of the two and should therefore inherit from our new class. But since we will only use this code for the MSK dataset and won't run it very often, I'd still go with option 2 because it's faster to implement, even though it's not a clean design from an OOD standpoint.
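For comparison with the mockup below, option 1 could look roughly like the following toy sketch, in which the conversion-matrix-free class is the root and SegmentationTFRecords only adds requirements. The constructor signatures are hypothetical and heavily simplified (only the validation hook is modeled); this is not the real Nimbus API.

```python
class SimpleTFRecords:
    """Hypothetical root for option 1: general pipeline, no conversion_matrix."""

    def __init__(self, cell_table):
        self.cell_table = cell_table
        # Calls the hook of the actual class of self (dynamic dispatch).
        self.check_additional_inputs()

    def check_additional_inputs(self):
        pass  # the most general class needs no extra checks


class SegmentationTFRecords(SimpleTFRecords):
    """Option 1: the conversion-matrix class becomes the specialized child."""

    def __init__(self, cell_table, conversion_matrix):
        # Must be set before super().__init__, which calls the overridden hook.
        self.conversion_matrix = conversion_matrix
        super().__init__(cell_table)

    def check_additional_inputs(self):
        if self.conversion_matrix is None:
            raise ValueError("conversion_matrix is required")
```

Here each subclass adds checks rather than removing them, which is the direction of specialization the paragraph above describes; in option 2 the child instead disables a check its parent performs.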

Code mockup

    class SegmentationTFRecords:
        def check_additional_inputs(self):
            # check conversion_matrix
            ...


    class SimpleTFRecords(SegmentationTFRecords):
        def __init__(self, gt_suffix="_gt", **kwargs):
            # init everything except conversion_matrix
            ...

        def get_marker_activity(self, sample_name, marker):
            # get marker activity from activity_df[marker + self.gt_suffix]
            ...

        def check_additional_inputs(self):
            pass
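A runnable toy version of option 2 shows why overriding check_additional_inputs with a no-op is enough to drop the conversion_matrix requirement: the parent's `__init__` calls the hook on `self`, so the child's empty version runs instead. Everything here is a simplified stand-in under stated assumptions: the cell table is a list of dicts, and the column names `fov` (sample) and `label` (cell id) are hypothetical, as is the return type of `get_marker_activity`.

```python
class SegmentationTFRecords:
    """Toy stand-in for the existing class; only the hook logic is modeled."""

    def __init__(self, cell_table, conversion_matrix=None):
        self.cell_table = cell_table  # list of dicts, one per cell (hypothetical)
        self.conversion_matrix = conversion_matrix
        # Dispatched on type(self), so a subclass can replace this check.
        self.check_additional_inputs()

    def check_additional_inputs(self):
        if self.conversion_matrix is None:
            raise ValueError("conversion_matrix is required")


class SimpleTFRecords(SegmentationTFRecords):
    def __init__(self, cell_table, gt_suffix="_gt"):
        self.gt_suffix = gt_suffix
        super().__init__(cell_table, conversion_matrix=None)

    def check_additional_inputs(self):
        pass  # no conversion_matrix to validate

    def get_marker_activity(self, sample_name, marker):
        """Map cell label -> GT activity read from column marker + gt_suffix."""
        column = marker + self.gt_suffix
        return {
            row["label"]: row[column]
            for row in self.cell_table
            if row["fov"] == sample_name
        }


table = [
    {"fov": "fov1", "label": 1, "CD3_gt": 1},
    {"fov": "fov1", "label": 2, "CD3_gt": 0},
    {"fov": "fov2", "label": 1, "CD3_gt": 1},
]
print(SimpleTFRecords(table).get_marker_activity("fov1", "CD3"))  # {1: 1, 2: 0}
```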

Required inputs

All inputs of SegmentationTFRecords except conversion_matrix_path, plus gt_suffix, which specifies the suffix of the column name $marker+gt_suffix from which the ground truth (GT) is loaded.
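Since gt_suffix determines which columns are read, an early check that every requested marker actually has its GT column could save a failed run. The helper below is a hypothetical sketch (it could, for example, be called from the new class's `__init__`); it only assumes the cell table's column names are available as a list of strings.

```python
def gt_columns(markers, gt_suffix="_gt"):
    """Build the GT column name for each marker, e.g. 'CD3' -> 'CD3_gt'."""
    return {marker: marker + gt_suffix for marker in markers}


def check_gt_columns(cell_table_columns, markers, gt_suffix="_gt"):
    """Return marker -> GT column, or raise ValueError listing missing columns."""
    expected = gt_columns(markers, gt_suffix)
    available = set(cell_table_columns)
    missing = sorted(col for col in expected.values() if col not in available)
    if missing:
        raise ValueError(f"cell_table is missing GT columns: {missing}")
    return expected


print(check_gt_columns(["fov", "label", "CD3_gt", "CD8_gt"], ["CD3", "CD8"]))
# {'CD3': 'CD3_gt', 'CD8': 'CD8_gt'}
```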

Output files

.tfrecord dataset

Timeline

Give a rough estimate for how long you think the project will take. In general, it's better to be too conservative rather than too optimistic.

Estimated date when a fully implemented version will be ready for review: 12/16/2022

Estimated date when the finalized project will be merged in: 12/20/2022