SainsburyWellcomeCentre / crabs-exploration

A toolkit for detecting and tracking crabs in the field.
BSD 3-Clause "New" or "Revised" License

Fix validation and test split not being reproducible #218

Closed · sfmig closed this 4 weeks ago

sfmig commented 4 months ago

Why is this PR needed?

Currently, when we create the test and validation splits, we don't pass a generator; we do pass one when we create the training split.

This means that, given a seed, the splitting of the dataset into train and test-val sets is reproducible, but the subsequent splitting of the test-val set into a test set and a validation set is not.
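For illustration, a minimal sketch of the problem (the dataset, seed, and helper are made-up stand-ins, not the repo's actual datamodule code): the first `random_split` receives a seeded generator, the second does not, so the second split draws from the global RNG and changes between runs.

```python
import torch
from torch.utils.data import random_split

dataset = torch.arange(100)  # stand-in for the full dataset
seed = 42

def make_splits(seed):
    # Reproducible: the train vs. test-val split uses a seeded generator.
    # (Fractional lengths require torch >= 1.13.)
    train, test_val = random_split(
        dataset, [0.8, 0.2], generator=torch.Generator().manual_seed(seed)
    )
    # Not reproducible: with no generator argument, random_split falls
    # back on the global RNG, whose state varies between runs and calls.
    test, val = random_split(test_val, [0.5, 0.5])
    return train, test, val

a = make_splits(seed)
b = make_splits(seed)
print(a[0].indices == b[0].indices)  # True: train split is repeatable
print(a[1].indices == b[1].indices)  # typically False: test split is not
```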

What does this PR do?

This PR passes a generator to each call to `random_split`, so that the splitting of the test-val set into test and validation sets is also reproducible given a seed.

Smaller bits

Notes

I decided to pass a different generator for each call to `random_split` to try to make it a bit "future-proof". That way we guarantee the splits are repeatable even if some randomisation code is added in between the two calls to `random_split`.
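A minimal sketch of that approach (again with stand-in names, not the exact datamodule code), where each call gets its own freshly constructed, seeded generator:

```python
import torch
from torch.utils.data import random_split

dataset = torch.arange(100)  # stand-in for the full dataset
seed = 42

# Split 1: dataset -> train + test_val, with its own generator.
train, test_val = random_split(
    dataset, [0.8, 0.2], generator=torch.Generator().manual_seed(seed)
)

# ... any randomisation code added here cannot shift the next split,
# because the generator below is constructed and seeded at the call site.

# Split 2: test_val -> test + val, with a second generator.
test, val = random_split(
    test_val, [0.5, 0.5], generator=torch.Generator().manual_seed(seed)
)
```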

codecov-commenter commented 4 months ago

Codecov Report

Attention: Patch coverage is 95.23810% with 1 line in your changes missing coverage. Please review.

Project coverage is 47.75%. Comparing base (a21d4f1) to head (c678df6).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| crabs/detector/datamodules.py | 75.00% | 1 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #218      +/-   ##
==========================================
+ Coverage   46.68%   47.75%   +1.07%
==========================================
  Files          24       24
  Lines        1476     1493      +17
==========================================
+ Hits          689      713      +24
+ Misses        787      780       -7
```

:umbrella: View full report in Codecov by Sentry.

sfmig commented 4 months ago

> Other option, can we just use `seed_everything`?

Cool, I didn't know about this!

I think for now I'd prefer to constrain the seeding to the dataset creation, because that is the part I need to be reproducible. But good to have this on the radar.
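For contrast, a sketch of the global alternative mentioned above (assuming a Lightning version that exposes it under `lightning.pytorch`; older versions import it from `pytorch_lightning`):

```python
# seed_everything seeds Python's `random`, NumPy, and PyTorch globally,
# so even an un-seeded random_split becomes deterministic. The trade-off
# is that it pins all randomness in the process, not just the part that
# creates the dataset splits.
from lightning.pytorch import seed_everything

seed_everything(42, workers=True)
```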

sfmig commented 4 weeks ago

thanks for the help Nik! 🌟