We want to further reduce our dataset. Thus we calculate the 40 / 60 / x % quantile of positive cells per tile for each marker and filter out all tiles that have less than the x-quantile of positive cells.
Design overview
Calculate the x-quantile of positive cells for each marker
Filter out Examples from the training data that have less than x-quantile positive cells
Code mockup
Add class function quantile_filter to ModelBuilder
def quantile_filter(self):
num_pos_dict = {}
for example in self.training_dataset:
marker = example["marker"]
if marker not in num_pos_dict.keys():
num_pos_dict["marker"] = []
num_pos_dict["marker"].append([np.sum(example["activity_df"].labels==1))
quantile_dict = {}
for marker, pos_list in num_pos_dict.items():
quantile_dict[marker] = np.quantile(pos_list, self.filter_quantile)
predicate = lambda example: np.sum(example["activity_df"].labels==1) > quantile_dict[example["marker"]]
self.train_dataset = self.train_dataset.filter(predicate)
Required inputs
tf.data.TFRecordDataset containing the training data.
Output files
tf.data.TFRecordDataset containing the filtered training data.
Timeline
Give a rough estimate for how long you think the project will take. In general, it's better to be too conservative rather than too optimistic.
[x] A couple days
[ ] A week
[ ] Multiple weeks. For large projects, make sure to agree on a plan that isn't just a single monster PR at the end.
Estimated date when a fully implemented version will be ready for review:
Estimated date when the finalized project will be merged in:
Relevant background
We want to further reduce our dataset. Thus we calculate the 40 / 60 / x % quantile of positive cells per tile for each marker and filter out all tiles that have less than the x-quantile of positive cells.
Design overview
Code mockup
Add class function
quantile_filter
toModelBuilder
Required inputs
tf.data.TFRecordDataset
containing the training data.Output files
tf.data.TFRecordDataset
containing the filtered training data.Timeline Give a rough estimate for how long you think the project will take. In general, it's better to be too conservative rather than too optimistic.
Estimated date when a fully implemented version will be ready for review:
Estimated date when the finalized project will be merged in: