Closed: JLrumberger closed this issue 1 year ago
For datasets with a large number of distinct images, this seems perfect. For some of the datasets with larger images, but a smaller number, we may want to do balanced selection, instead of random. For example, making sure that for rare markers, we have good enough representation in the val/test datasets. We can cross that bridge when we get to it, this will definitely work for TONIC & MSK
Instructions

Add a class method to ModelBuilder that filters sets of FOV names from the tfrecord into the respective test and validation datasets.

Relevant background
Instead of taking the first x tiles as test and validation data, we want to get more control over the composition of the datasets by explicitly using lists of FOV names to construct them.
Design overview
Implement a function ModelBuilder.filter_fovs(self, dataset, fov_list, positive_list) that takes a dataset, a list of FOVs, and a boolean indicating whether fov_list is a positive or a negative list, i.e. whether the respective FOVs should be filtered into the returned dataset or out of it. If positive_list=True, the function returns a dataset containing only the samples whose FOVs are in fov_list; if positive_list=False, it returns a dataset containing all samples except those whose FOVs are in fov_list.

Code mockup
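The mockup section is empty in the extracted text; below is a minimal sketch of the filter described above, assuming samples are dicts with a string-valued FOV entry (the key name "fov" and the standalone function form, rather than the class method, are assumptions for illustration):

```python
import tensorflow as tf

def filter_fovs(dataset, fov_list, positive_list, fov_key="fov"):
    """Keep or drop samples by FOV name.

    If positive_list is True, keep only samples whose FOV is in fov_list;
    otherwise keep all samples whose FOV is NOT in fov_list.
    """
    fovs = tf.constant(fov_list)

    def in_list(sample):
        # compare the sample's scalar FOV name against the whole list
        return tf.reduce_any(tf.equal(sample[fov_key], fovs))

    if positive_list:
        return dataset.filter(in_list)
    return dataset.filter(lambda s: tf.logical_not(in_list(s)))
```

With positive_list=True and its complement with positive_list=False, the two calls together split one dataset into disjoint val/test subsets by FOV name.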
Required inputs

A .tfrecord dataset that holds fov_key as a key, and a list fov_list that is stored as a .json file and will be loaded in the init.

Output files
Loaded tfrecord dataset that holds samples from the specified FOVs.
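For the input side, a minimal sketch of loading the FOV list during init, assuming the .json file is a flat array of FOV name strings (the helper name and file layout are assumptions):

```python
import json

def load_fov_list(path):
    """Load the list of FOV names from a .json file (hypothetical helper)."""
    with open(path) as f:
        fov_list = json.load(f)
    # we assume a flat list of FOV name strings
    assert isinstance(fov_list, list)
    return fov_list
```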
Timeline

Give a rough estimate of how long you think the project will take. In general, it's better to be too conservative than too optimistic.
Estimated date when a fully implemented version will be ready for review:
Estimated date when the finalized project will be merged in: