angelolab / Nimbus


Adding more control for validation and test dataset composition #49

Closed JLrumberger closed 1 year ago

JLrumberger commented 1 year ago

Instructions

Add a class method to ModelBuilder that filters sets of FOV names from the tfrecord into the respective test and validation datasets.

Relevant background

Instead of taking the first x tiles as test and validation data, we want to get more control over the composition of the datasets by explicitly using lists of FOV names to construct them.

Design overview

Implement a method ModelBuilder.filter_fovs(self, dataset, fov_list, positive_list) that takes a dataset, a list of FOV names, and a boolean indicating whether fov_list is a positive or a negative list, i.e. whether the listed FOVs should be filtered into or out of the returned dataset. If positive_list=True, the returned dataset contains only samples whose FOVs are in fov_list; if positive_list=False, it contains all samples except those whose FOVs are in fov_list.

Code mockup

def filter_fovs(self, dataset, fov_list, positive_list, fov_key="fov"):
    """Filter a tf.data.Dataset by FOV name.

    If positive_list is True, keep only samples whose FOV is in fov_list;
    otherwise keep only samples whose FOV is not in fov_list.
    """
    fov_set = set(fov_list)  # O(1) membership checks
    if positive_list:
        def predicate(fov):
            # fov arrives as a scalar string tensor; decode the bytes for comparison
            return fov.numpy().decode() in fov_set
    else:
        def predicate(fov):
            return fov.numpy().decode() not in fov_set

    # dataset.filter returns a new dataset, so no copy is needed
    return dataset.filter(
        lambda example: tf.py_function(predicate, [example[fov_key]], tf.bool)
    )

Required inputs

A .tfrecord dataset whose examples contain fov_key as a key, and a fov_list stored as .json that will be loaded in the init.
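To make the positive/negative list semantics concrete, here is a framework-agnostic sketch of the intended split, using plain Python dicts as a stand-in for the tf.data pipeline (the FOV names, the standalone filter_fovs helper, and the split lists are all hypothetical):

```python
# Toy stand-in for tfrecord samples keyed by "fov"
samples = [
    {"fov": "fov1"}, {"fov": "fov2"}, {"fov": "fov3"}, {"fov": "fov4"},
]

def filter_fovs(samples, fov_list, positive_list, fov_key="fov"):
    """Plain-Python analogue of ModelBuilder.filter_fovs."""
    if positive_list:
        return [s for s in samples if s[fov_key] in fov_list]
    return [s for s in samples if s[fov_key] not in fov_list]

val_fovs = ["fov1"]
test_fovs = ["fov2"]

# Positive lists filter the named FOVs *into* the returned dataset
val_samples = filter_fovs(samples, val_fovs, positive_list=True)
test_samples = filter_fovs(samples, test_fovs, positive_list=True)
# A negative list leaves everything else for training
train_samples = filter_fovs(samples, val_fovs + test_fovs, positive_list=False)
# val -> fov1, test -> fov2, train -> fov3 and fov4
```

In the real pipeline the two lists would come from the .json file loaded in the init, and the same dataset would be filtered three times with the appropriate flags.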

Output files

Loaded tfrecord dataset that holds samples from the specified FOVs.

Timeline

Give a rough estimate for how long you think the project will take. In general, it's better to be too conservative rather than too optimistic.

Estimated date when a fully implemented version will be ready for review:

Estimated date when the finalized project will be merged in:

ngreenwald commented 1 year ago

For datasets with a large number of distinct images, this seems perfect. For some of the datasets with larger images but fewer of them, we may want to do balanced selection instead of random, for example making sure that rare markers have good enough representation in the val/test datasets. We can cross that bridge when we get to it; this will definitely work for TONIC & MSK.