Implement a new class with all basic functionality to prepare .tfrecords datasets without assuming a conversion_matrix as input.
Relevant background
The MSK dataset comes without segmentation masks, so we had to segment the data with Mesmer. To assign positive/negative classes for a marker to the cell segments, we use their cell table to calculate x, the percentage of positive cells for a given marker, and then label the brightest x% of cells in our segmentation as marker positive. This class should therefore work without a conversion_matrix and gets a cell_table that contains a column $marker+"_gt" in which marker positivity is stored.
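The thresholding step described above can be sketched as follows. This is a minimal illustration in plain Python, not the project's implementation: the function name `assign_marker_positivity`, its arguments, and the way ties are handled (rank by intensity, take the top `round(x * n)` cells) are all assumptions.

```python
def assign_marker_positivity(intensities, positive_fraction):
    """Label the brightest `positive_fraction` of cells as positive (1), the rest as 0.

    intensities: per-cell marker intensities measured on our own segmentation.
    positive_fraction: x as a fraction in [0, 1], taken from the provided cell table.
    """
    if not 0.0 <= positive_fraction <= 1.0:
        raise ValueError("positive_fraction must be between 0 and 1")
    n_positive = round(positive_fraction * len(intensities))
    # Rank cell indices from brightest to dimmest.
    ranked = sorted(range(len(intensities)), key=lambda i: intensities[i], reverse=True)
    labels = [0] * len(intensities)
    for i in ranked[:n_positive]:
        labels[i] = 1
    return labels


# Hypothetical example: the provided cell table says 40% of cells are positive.
cell_intensities = [0.1, 0.9, 0.4, 0.8, 0.2]
marker_gt = assign_marker_positivity(cell_intensities, 0.4)
```

The resulting list is what would be stored in the $marker+"_gt" column of the cell table.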
Design overview
This is really tricky. We have two choices:
1. The SegmentationTFRecords class we have at the moment inherits from a new class and implements additional functions to handle conversion_matrix-based activity map generation.
2. We go the other way round and build a child class that inherits from SegmentationTFRecords and only needs to implement __init__, get_marker_activity and a new method called check_additional_inputs. All input checks that assume a conversion_matrix are then implemented in SegmentationTFRecords.check_additional_inputs, while this method is empty in the new child class.
The 1st option would be a bit cleaner, but more code needs to be changed, especially in the tests. The 2nd option requires only minimal code changes and would be considerably faster to implement. The general problem is that in object-oriented design the most abstract class is normally the root, and everything that inherits from it is more specialized and less abstract. SegmentationTFRecords would implement check_additional_inputs whereas the method is empty in our new class. This means that the former is more specialized than the latter, and thus SegmentationTFRecords should inherit from our new class. But since we will only use this code for the MSK dataset and won't run it very often, I'd still go with option 2 because it's faster to implement, even though it's not clean design from an OOD standpoint.
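For comparison, the cleaner option 1 could look roughly like the sketch below. The root class name `BaseTFRecords`, the `cell_table_path` argument, and the reduced constructor signatures are hypothetical; the real SegmentationTFRecords takes more inputs than shown here and would also override get_marker_activity.

```python
class BaseTFRecords:
    """Hypothetical abstract root that knows nothing about a conversion_matrix."""

    def __init__(self, cell_table_path):
        self.cell_table_path = cell_table_path
        self.check_additional_inputs()

    def check_additional_inputs(self):
        # The root class has no extra inputs to validate.
        pass


class SegmentationTFRecords(BaseTFRecords):
    """More specialized child: additionally requires a conversion_matrix."""

    def __init__(self, cell_table_path, conversion_matrix_path):
        # Set the extra attribute first, because the parent constructor
        # calls check_additional_inputs().
        self.conversion_matrix_path = conversion_matrix_path
        super().__init__(cell_table_path)

    def check_additional_inputs(self):
        if not self.conversion_matrix_path:
            raise ValueError("conversion_matrix_path is required")
```

Here the specialization runs in the conventional direction, at the cost of touching the existing class and its tests.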
Code mockup
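class SegmentationTFRecords:
    def check_additional_inputs(self):
        # check all inputs that assume a conversion_matrix
        ...

class SimpleTFRecords(SegmentationTFRecords):
    def __init__(self, gt_suffix="_gt", **kwargs):
        # init everything except conversion_matrix
        self.gt_suffix = gt_suffix

    def get_marker_activity(self, sample_name, marker):
        # get marker activity from activity_df[marker + self.gt_suffix]
        ...

    def check_additional_inputs(self):
        # nothing to check without a conversion_matrix
        pass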
Required inputs
All inputs of SegmentationTFRecords except conversion_matrix_path, plus gt_suffix, which specifies the suffix in the column name $marker+gt_suffix used to load the ground truth (GT).
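As a rough illustration of how gt_suffix selects the ground-truth column, the sketch below reads a cell table with the standard csv module. The helper name `load_marker_gt`, the example table, and the marker name "CD4" are hypothetical, and the actual class may load the table differently.

```python
import csv
import io


def load_marker_gt(cell_table_file, marker, gt_suffix="_gt"):
    """Return the ground-truth column `marker + gt_suffix` as a list of ints."""
    reader = csv.DictReader(cell_table_file)
    column = marker + gt_suffix
    # Fail early with a clear message if the expected column is missing.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"cell table has no column {column!r}")
    return [int(row[column]) for row in reader]


# Hypothetical cell table with one ground-truth column for the marker "CD4".
example_table = io.StringIO("labels,CD4,CD4_gt\n1,0.8,1\n2,0.1,0\n")
cd4_gt = load_marker_gt(example_table, "CD4")
```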
Output files
.tfrecord dataset
Timeline
Give a rough estimate for how long you think the project will take. In general, it's better to be too conservative rather than too optimistic.
[x] A couple days
[ ] A week
[ ] Multiple weeks. For large projects, make sure to agree on a plan that isn't just a single monster PR at the end.
Estimated date when a fully implemented version will be ready for review: 12/16/2022
Estimated date when the finalized project will be merged in: 12/20/2022