Oh yeah, one other note: there was a previous version of this code where I made POINTS_STRATIFIED (or equivalent) the default mode. However, I've since changed the default mode to VECTORS.
Not sure where the best place for this commentary is; maybe an issue? I did some digging today into other methods for our class imbalance problem. One other option is 'weighting' the classes, which you can do with other methods like SVM. However, sklearn has an issue, opened in 2017 and among its most highly upvoted, asking for class weighting in MLP. See the discussion: https://github.com/scikit-learn/scikit-learn/issues/9113. Most notably:
> Hi all,
>
> Thank you all for your comments.
>
> The maintainers of scikit-learn have limited time and resources to improve the project and already are focusing on other aspects of the project they find valuable.
>
> MLPs were introduced in scikit-learn but aren't currently a priority to the maintainers (the maintainers of scikit-learn aren't thinking of extending scikit-learn's implementations of MLPs anymore).
>
> Now, this does not stop anyone from extending those implementations but we (or at least I) do not guarantee those contributions will be accepted.
>
> Note that if someone is interested in co-maintaining those implementations, we highly welcome them!
>
> Alternatively, specialized libraries like Keras and PyTorch should provide reference implementations.
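For concreteness, here's a minimal sketch of the gap being discussed (assuming current scikit-learn APIs; this isn't CoralNet code): most sklearn classifiers accept a `class_weight` parameter, but `MLPClassifier` doesn't, and its `fit()` takes no `sample_weight` either.

```python
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Most scikit-learn classifiers let you reweight classes to counter
# imbalance, e.g. SVC:
svc = SVC(class_weight="balanced")

# MLPClassifier has no class_weight parameter, and its fit() does not
# accept sample_weight either -- this is the gap tracked in
# scikit-learn issue #9113.
mlp = MLPClassifier()
```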
Yeah, another issue for it - just created issue #98.
Issue #74 also has notes about sklearn's MLP being a bit rudimentary compared to other libraries' implementations. Super robust deep learning implementations seem to be out of sklearn's scope basically.
How does this PR look otherwise?
@yeelauren Thanks for the review! Tried making some edits accordingly.
Great! Thanks @StephenChan. One statistic we're missing here is per-class accuracy. Overall accuracy can hide some of the nuance between classes; I opened issue #99 that should help with this.
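For reference (not from this PR), per-class accuracy is just the recall of each class, and scikit-learn computes it directly; a minimal sketch with made-up labels:

```python
import numpy as np
from sklearn.metrics import recall_score, classification_report

y_true = np.array(["coral", "coral", "sand", "algae", "sand", "coral"])
y_pred = np.array(["coral", "sand",  "sand", "algae", "sand", "coral"])

# Per-class accuracy is the recall of each class: of the points truly
# in class c, what fraction did the classifier get right?
classes = ["algae", "coral", "sand"]
per_class = recall_score(y_true, y_pred, average=None, labels=classes)
print(dict(zip(classes, per_class)))

# classification_report also shows recall (plus precision/F1) per class:
print(classification_report(y_true, y_pred))
```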
Newer version of PR #84. The difference is that this PR is built on top of the merges of #95 and #96, with any conflicts resolved. This is a new PR because I wanted to leave the old `training-annotation-sampling` branch intact, in case it's still being used for some tests right this moment.

This PR is ready for review, and is the next thing I'm looking to finally merge.
Per the updated CHANGELOG:
And here are said Enum's comments:
The mode that's notably 'missing' is `VECTORS_STRATIFIED`, because it would be more complicated to stratify accurately when splitting at the vector level. As @yeelauren pointed out in the old PR's thread, there should be ways to implement that if desired, such as with the imbalanced-learn library. But it would be more complex to implement than the other modes, so it's deferred until someone really wants it.

There may be other methods/restrictions that one might want for the data split. For example, perhaps you have a hierarchy of CoralNet data where the data can be divided into several sources, each source has a set of feature vectors, and each feature vector has a set of point features; and you want each source to go entirely into train, or ref, or val (not split between the three). However, at this point, I think that potential need is covered by the ability to instantiate your own `TrainingTaskLabels` and thus define your own arbitrary split (see the sketch below).
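As a rough illustration of that grouped case (not something this PR implements), here's how a source-level split could look with scikit-learn's `GroupShuffleSplit`; all data names here are made up:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Made-up data: one group id per feature vector, identifying which
# CoralNet source each vector came from.
rng = np.random.default_rng(0)
vector_ids = np.arange(1000)
source_ids = rng.integers(0, 20, size=1000)

# First cut: 80% of the sources go to train. GroupShuffleSplit splits
# by group, so the proportion applies to sources, not vectors.
outer = GroupShuffleSplit(n_splits=1, train_size=0.8, random_state=0)
train_idx, holdout_idx = next(outer.split(vector_ids, groups=source_ids))

# Second cut: split the held-out sources in half into ref and val.
# Each source still lands entirely in exactly one of train/ref/val.
inner = GroupShuffleSplit(n_splits=1, train_size=0.5, random_state=0)
ref_rel, val_rel = next(inner.split(holdout_idx, groups=source_ids[holdout_idx]))
ref_idx, val_idx = holdout_idx[ref_rel], holdout_idx[val_rel]
```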
Results of experiments using this code:
CSV version for potentially easier viewing: 2024-03 - single source runs with new sampling code.csv
(To be exact, the experiments used the `training-cache-features-3` branch, which places the PR #80 feature-caching commits on top of this PR's branch.)

My takeaways from the experiments:
- The default train/ref/val ratios of 80%/10%/10% (with ref capped at 50,000) are working correctly; see the sketch after this list. Small discrepancies can be explained by 1) restrictions of the VECTORS mode, and 2) filtering out of classes that don't end up in both train and ref.
- Classifiers' measured accuracy hasn't been conclusively affected one way or the other by this PR, compared to the accuracy of pre-existing CoralNet classifiers trained on the same sources. That is, "Accuracy" is comparable to "CN accuracy". There are bigger accuracy fluctuations for smaller sources, as I'd expect.
- POINTS_STRATIFIED consistently results in bigger labelsets (the "Classes" column) than VECTORS, as I'd expect, since with POINTS_STRATIFIED, rare classes have a better guarantee of being included in both train and ref.
- And since POINTS_STRATIFIED includes more classes than VECTORS, it also includes slightly more annotations.
- When there's a discrepancy in accuracy between VECTORS and POINTS_STRATIFIED, the larger discrepancies tend to favor POINTS_STRATIFIED, despite the fact that we might expect bigger labelsets to result in lower accuracy (it's harder to predict correctly when there are more choices). So, for smaller sources in particular, there could indeed be a concern that POINTS_STRATIFIED makes the calibration and validation 'artificially good' by training on the same images it's validating on. For larger sources, this doesn't seem to be as much of a concern. (cc @kriegman)
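To make the first bullet concrete, here's a toy model of the default ratio logic as described above (hypothetical function name, not the PR's actual code; the real logic also depends on the split mode):

```python
def default_split_sizes(n_images: int, ref_cap: int = 50_000):
    """Toy model of the 80/10/10 train/ref/val split, with ref capped.

    Hypothetical helper for illustration only.
    """
    ref = min(round(n_images * 0.10), ref_cap)
    val = round(n_images * 0.10)
    train = n_images - ref - val
    return train, ref, val

# A small source: ref is a plain 10%.
print(default_split_sizes(10_000))     # (8000, 1000, 1000)
# A huge source: ref hits the 50,000 cap and train absorbs the rest.
print(default_split_sizes(1_000_000))  # (850000, 50000, 100000)
```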