Open rbavery opened 2 years ago
We'll instead select a random sample on a scene basis to ensure the following number of samples for each class are represented:
I think we can then sort these scenes into folders like so:

train/
    class_folder_1/
        N folders for scene samples
    ....
validation/
    class_folder_1/
        10 folders for scene samples
    ...
test/
    class_folder_1/
        3 folders for scene samples
    ...
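A rough sketch of how the per-class scene sampling and folder layout above could be generated. The function names, the `scenes_by_class` input, the `partitions` root, and the assumption that each scene is filed under a single class are all hypothetical; only the counts (10 validation and 3 test scenes per class, the rest in train) come from the layout above.

```python
import random
from pathlib import PurePosixPath


def partition_scenes(scenes_by_class, n_val=10, n_test=3, seed=0):
    """Randomly assign whole scenes to train/validation/test for each class.

    scenes_by_class: dict mapping a class folder name to a list of scene ids
    (hypothetical input; assumes each scene is filed under one class only).
    """
    rng = random.Random(seed)  # fixed seed so the partition is reproducible
    partition = {"train": {}, "validation": {}, "test": {}}
    for class_name, scene_ids in scenes_by_class.items():
        shuffled = sorted(scene_ids)  # sort first so input order does not matter
        rng.shuffle(shuffled)
        if len(shuffled) <= n_val + n_test:
            raise ValueError(f"not enough scenes for class {class_name!r}")
        partition["test"][class_name] = shuffled[:n_test]
        partition["validation"][class_name] = shuffled[n_test:n_test + n_val]
        partition["train"][class_name] = shuffled[n_test + n_val:]
    return partition


def scene_dirs(partition, root="partitions"):
    """List the <root>/<split>/<class>/<scene> folder paths for a partition."""
    return [
        PurePosixPath(root, split, class_name, scene_id)
        for split, classes in partition.items()
        for class_name, scene_ids in classes.items()
        for scene_id in scene_ids
    ]
```

Sampling at the scene level, rather than per tile or per annotation, keeps every sample from a given scene inside one split.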
Completed in PR #80. The partitioned data is visible in gs://ceruleanml/partitions/
Now that the dataset is almost finalized, we can start breaking apart the train, val, and test sets. I think the best way to do this is in the data-loading step, by defining a splitter function that works for both the icevision and fastai2 trainers. This would involve defining lists of scene ids for our train, validation, and test sets. From the Phase 2 Doc, these are the guidelines for how we should split these out:
So I'm thinking the sets of scene ids we need to define as lists are as follows:
1: "Infrastructure", 2: "Natural Seep", 3: "Coincident Vessel", 4: "Recent Vessel", 5: "Old Vessel",
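To make the splitter idea concrete, here is a minimal sketch in plain Python. It does not call icevision or fastai directly; `make_scene_splitter` and `get_scene_id` are hypothetical names, and the assumption is that each trainer can be handed lists of item indices per split. How the scene id is recovered from an item (a file path, a record field) depends on the data loader and is left to the caller.

```python
def make_scene_splitter(train_ids, val_ids, test_ids, get_scene_id):
    """Build a splitter that assigns items to splits by their scene id.

    get_scene_id: callable mapping one item to its scene id (hypothetical;
    depends on how items are represented in the data-loading step).
    """
    train_ids, val_ids, test_ids = set(train_ids), set(val_ids), set(test_ids)
    overlap = (train_ids & val_ids) | (train_ids & test_ids) | (val_ids & test_ids)
    if overlap:
        raise ValueError(f"scene ids appear in more than one split: {sorted(overlap)}")

    def splitter(items):
        train_idx, val_idx, test_idx = [], [], []
        for i, item in enumerate(items):
            scene_id = get_scene_id(item)
            if scene_id in train_ids:
                train_idx.append(i)
            elif scene_id in val_ids:
                val_idx.append(i)
            elif scene_id in test_ids:
                test_idx.append(i)
            # Items from scenes not listed in any split are left out.
        return train_idx, val_idx, test_idx

    return splitter


# Usage with made-up scene ids and tile paths:
items = ["S1/tile_0.png", "S1/tile_1.png", "S2/tile_0.png", "S3/tile_0.png"]
splitter = make_scene_splitter(["S1"], ["S2"], ["S3"], lambda p: p.split("/")[0])
train_idx, val_idx, test_idx = splitter(items)
```

Checking for overlap up front guards against a scene leaking from the train set into the validation or test sets.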
@jonaraphael do you want to select the scenes we will use for evaluation yourself? I recall that we wanted to select scenes that were annotated with particular detail and attention. If not, just let me know and we can do this.