angelolab / Nimbus

Adding more control and train/val/test set composition #50

Closed JLrumberger closed 1 year ago

JLrumberger commented 1 year ago

What is the purpose of this PR?

This PR closes #49 and adds the method ModelBuilder.fov_filter, which filters a dataset based on a list of FOV names. The train/validation/test splits are saved separately for each dataset in a JSON file, e.g. configs/tonic_split.json, and loaded in ModelBuilder.prep_data.

How did you implement your changes?

If we supply params["data_splits"] = ["path_to_/dataset_split.json"] to ModelBuilder or PromixNaive objects, calling the prep_data method loads each .json file into a dict with three keys: train, validation and test. The value for each key is the list of FOVs to use for the corresponding dataset. prep_data then calls ModelBuilder.fov_filter(self, dataset, fov_list, fov_key) once per split to construct the train, validation and test datasets. ModelBuilder.fov_filter takes fov_list, constructs a predicate from it, and applies that predicate to the full dataset, keeping only the samples whose FOV is in fov_list.
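
For illustration, here is a minimal sketch of the split-and-filter idea in plain Python. It is not the code from this PR: the real ModelBuilder.fov_filter operates on the project's dataset objects, whereas this sketch uses a list of dicts. The helper load_split, the sample key "fov", the marker values and the file name example_split.json are all hypothetical; only the three split keys (train, validation, test) come from the description above.

```python
import json
import os
import tempfile


def load_split(path):
    """Load a split file into a dict with 'train', 'validation' and 'test' keys.

    Hypothetical helper, not part of the Nimbus code base.
    """
    with open(path) as f:
        split = json.load(f)
    missing = {"train", "validation", "test"} - split.keys()
    if missing:
        raise KeyError(f"split file {path} lacks keys: {sorted(missing)}")
    return split


def fov_filter(dataset, fov_list, fov_key="fov"):
    """Keep only the samples whose FOV name is in fov_list.

    Simplified stand-in for ModelBuilder.fov_filter: `dataset` is a list of
    dicts here, and `fov_key` names the (assumed) field holding the FOV name.
    """
    allowed = set(fov_list)  # set membership keeps the predicate cheap

    def predicate(sample):
        return sample[fov_key] in allowed

    return list(filter(predicate, dataset))


# Write a toy split file so the example is self-contained.
split_content = {
    "train": ["fov1", "fov2"],
    "validation": ["fov3"],
    "test": ["fov4"],
}
with tempfile.TemporaryDirectory() as tmp:
    split_path = os.path.join(tmp, "example_split.json")
    with open(split_path, "w") as f:
        json.dump(split_content, f)
    split = load_split(split_path)

# Toy dataset: two samples (one per marker) for each of four FOVs.
dataset = [
    {"fov": f"fov{i}", "marker": marker}
    for i in range(1, 5)
    for marker in ("CD4", "CD8")
]

train_ds = fov_filter(dataset, split["train"])
val_ds = fov_filter(dataset, split["validation"])
test_ds = fov_filter(dataset, split["test"])
```

Since the split lists in this toy example are disjoint, each sample lands in exactly one of the three resulting datasets.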

Remaining issues

None