Closed: beijbom closed this pull request 3 years ago
NOTE1: an earlier assessment of these changes used synthetic ImageFeatures
data. For that I got a full 3 orders of magnitude storage improvement. With real data, it's "only" 1 order of magnitude.
NOTE2: I had to reduce the precision of some legacy tests when using half precision.
@StephenChan @qiminchen : thoughts on this PR? Preference for using float16 or float32? @qiminchen : did you get a chance to try re-training the classifiers using this setting?
Changes look great. Hmm, I would vote for float32: even though float16 saves memory and we don't see performance dropping, it has less precision, and we don't want some "potential" precision issue in the future. Also, most models use float32 as the default dtype.
I have a few meetings and presentations this week, so I will work on it today or tomorrow.
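A quick generic NumPy sketch (not CoralNet code) of the precision and range limits behind that recommendation:

```python
import numpy as np

# float16 keeps roughly 3 significant decimal digits; float32 roughly 7.
x = 0.1234567
print(np.float16(x))  # rounded to float16's ~3-digit precision
print(np.float32(x))  # much closer to the original value

# float16 also overflows much earlier: its max finite value is 65504.
print(np.finfo(np.float16).max)
print(np.float16(70000.0))  # overflows to inf
```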
> thoughts on this PR? Preference for using float16 or float32?
Changes look good to me, and I got the same size results with the test code in the first post: tmp.feats was 271k with master, 24k with this PR's branch.
I think Qimin's reasoning on using float32 makes sense. And the extra 2x savings on storage doesn't seem like a big deal for CoralNet's traditional usage, at least. If I understand correctly, we're talking 240 KB of savings for a 100-point image, when the image file itself can be 5-10 MB. If we had a dense point cloud, though, that could be another story.
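That ~240 KB figure checks out with a quick back-of-envelope calculation. Note the 1280-dim feature size below is an assumption (the usual EfficientNet-B0 embedding width), not a number from the thread:

```python
# Back-of-envelope for the per-image savings mentioned above.
# Assumes 1280-dim feature vectors (EfficientNet-B0's usual embedding
# width) -- an assumption, not a number from the thread.
points = 100                      # annotated points per image
dims = 1280                       # feature dimensionality (assumed)
saved = points * dims * (4 - 2)   # float32 is 4 bytes/value, float16 is 2
print(saved, "bytes saved, i.e. about", saved // 1024, "KB")
```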
@beijbom @StephenChan here are some comparisons between this branch and master on training the classifier. The per-feature size would be doubled if using float32. The PR looks great; we can save a lot of storage and training time. There won't be any changes in extraction time, though.
NOTE: each cell is shown as master / this PR.

| Source | Accuracy (%) | Per-feature size | Training time (s) |
|---|---|---|---|
| s1498 | 78.4 / 79.0 | ~828 kB / ~71 kB | 33.14 / 5.91 |
| s294 | 86.3 / 85.8 | ~5.5 MB / ~470 kB | 168.61 / 15.79 |
| s1396 | 89.6 / 89.9 | ~277 kB / ~24.5 kB | 24.6 / 8.7 |
Thanks guys. I changed to float32, merged this, and released the 0.3.0 package. Also pushed a requirement bump to https://github.com/beijbom/coralnet/pull/323.
This PR writes custom `.store()` and `.load()` methods for `ImageFeatures`. It reduces storage from 271k to 24k for the example below with 10 feature vectors from `efficientnet_b0_ver1`. Storage time to local changed from 0.01 to 0.003 seconds. Further, this changes to using `np.half` inside the `PointFeatures` class, so training on these features should be faster also. https://github.com/qiminchen/CoralNet/issues/13

Question: I could settle for `np.float32` (32-bit precision) at twice the storage cost. I'm not sure which is better.
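For readers without the diff at hand, here is a minimal sketch of the idea: cast features to `np.half` and persist them in a compact binary format. The class and method names below are illustrative only, not CoralNet's actual API:

```python
import os
import tempfile
import numpy as np

class ImageFeaturesSketch:
    """Illustrative stand-in for an ImageFeatures-like container."""

    def __init__(self, vectors):
        # Keep features as float16 (np.half) to halve the memory footprint.
        self.vectors = np.asarray(vectors, dtype=np.half)

    def store(self, path):
        # .npy is a compact binary format that preserves the float16 dtype.
        np.save(path, self.vectors)

    @classmethod
    def load(cls, path):
        return cls(np.load(path))

# Round-trip demo:
obj = ImageFeaturesSketch([[0.1, 0.2], [0.3, 0.4]])
path = os.path.join(tempfile.gettempdir(), "feats_sketch.npy")
obj.store(path)
loaded = ImageFeaturesSketch.load(path)
print(loaded.vectors.dtype)  # float16
```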
**To test**

Run e.g. the code below on master and on this branch and check the file size on disk.
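The original test snippet is not reproduced in this excerpt. As a stand-in, here is a generic sketch of the kind of check described: save the same 10 feature vectors at both precisions and compare the on-disk sizes. The file names and the 1280-dim feature size are assumptions for illustration:

```python
import os
import tempfile
import numpy as np

# Save the same 10 feature vectors at both precisions and compare sizes.
rng = np.random.default_rng(0)
feats = rng.standard_normal((10, 1280))  # 10 vectors; 1280 dims assumed

sizes = {}
for dtype in (np.float32, np.float16):
    name = np.dtype(dtype).name
    path = os.path.join(tempfile.gettempdir(), f"tmp_feats_{name}.npy")
    np.save(path, feats.astype(dtype))
    sizes[name] = os.path.getsize(path)

print(sizes)  # the float16 file should be roughly half the float32 file
```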