Closed: StephenChan closed this issue 6 months ago
Resolved in PR #71. The batch size no longer depends on points per image, and it no longer respects image 'boundaries' at all, so a single image's points can go into multiple batches. A generator function made this decently clean to code (IMO).
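Not the actual PR #71 code, just a minimal sketch of the boundary-free generator idea (function and field names here are hypothetical):

```python
def point_batches(images, batch_size):
    """Yield batches of (image_id, point) pairs of exactly batch_size,
    letting one image's points spill across batch boundaries."""
    batch = []
    for image in images:
        for point in image['points']:
            batch.append((image['id'], point))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        # Final partial batch, if the total isn't a multiple of batch_size.
        yield batch

# Example: a 10-point image followed by 25-point images still yields
# uniform 20-point batches (plus one final remainder batch).
images = [{'id': 0, 'points': list(range(10))}] + [
    {'id': i, 'points': list(range(25))} for i in range(1, 4)]
sizes = [len(b) for b in point_batches(images, batch_size=20)]
# sizes == [20, 20, 20, 20, 5]
```

Because the generator only tracks the running batch list, batch size stays constant in annotation count no matter how unevenly points are distributed across images.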
Note that the preprocess_labels() train/ref/val split is still done at the image level though, not the point level. But I think that's acceptable because: it keeps the sets simpler to reason about, the potential unevenness isn't nearly as bad as the original issue, and it's still possible to make your own train/ref/val split that puts a single image in multiple sets.
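For illustration, an image-level split means all of an image's points land in the same set. This is a generic sketch, not the actual preprocess_labels() code:

```python
import random

def split_image_ids(image_ids, ref_frac=0.1, val_frac=0.1, seed=0):
    """Shuffle image ids and split at the image level, so every point
    of a given image ends up in exactly one of train/ref/val."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_val = int(len(ids) * val_frac)
    n_ref = int(len(ids) * ref_frac)
    return {
        'val': set(ids[:n_val]),
        'ref': set(ids[n_val:n_val + n_ref]),
        'train': set(ids[n_val + n_ref:]),
    }

split = split_image_ids(range(100))
# The three sets are disjoint and together cover all 100 images.
```

The point-level unevenness this can cause (e.g. a few dense images dominating val) is much milder than the original batch-size issue, since set sizes only vary by whole images.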
Mini-batch size calculation:
samples_per_image is taken from the first image in the input.
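The snippet itself isn't reproduced here, but the described calculation would look roughly like this. The 5,000-annotation target is my assumption, picked only because it reproduces the 500-image figure from the logs mentioned below:

```python
def images_per_batch(point_counts, target_annotations=5000):
    # The bug being described: only the first image's point count is
    # consulted, so the result is wrong whenever later images differ.
    samples_per_image = point_counts[0]
    return target_annotations // samples_per_image

# First image has 10 points, the rest have 1000 each:
point_counts = [10] + [1000] * 999
n_images = images_per_batch(point_counts)  # 500 images per mini-batch
# A batch of 500 such images actually holds about 500 * 1000 = 500,000
# annotations, 100x the intended target.
```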
So I tried a training job where the first image has 10 points, followed by a bunch of images with 1000 points each (unheard of for a single CoralNet source, but possible if multiple sources are incorporated). Logs confirmed that it decided on 500 images per mini-batch. Then it got this:
This machine had close to 16 GB RAM, so there may have been multiple similarly large arrays in memory at the time (which might point to some memory optimization to be done in the code). But the point is, the batch size is supposed to be part of the instance-requirements design. Also, I imagine the intent of the training algorithm is to have roughly uniform batch sizes, and having them vary this much may be considered an improper implementation with undefined behavior. So making the batch size consistent (in annotation count) regardless of variable points per image seems worthwhile, as long as it's not a big implementation chore.
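A back-of-envelope estimate of why 500,000 annotations per batch can exhaust ~16 GB of RAM (the 4096-dim float32 feature vector size is my assumption, not from the source):

```python
n_annotations = 500 * 1000      # 500 images at ~1000 points each
feature_dim = 4096              # assumed length of one feature vector
dtype_bytes = 4                 # float32
batch_bytes = n_annotations * feature_dim * dtype_bytes
print(f'{batch_bytes / 1e9:.1f} GB per copy of the batch array')
# ~8.2 GB: just two simultaneous copies would exceed 16 GB of RAM.
```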
I've actually been working on this, issue #59, and issue #60 together as they touch common parts of the code, but I wanted to document this behavior while I was at it.