Closed by StephenChan 6 months ago
Source | Images | Points/im | Epochs | Time for epoch 1 | Time for each subsequent epoch |
---|---|---|---|---|---|
1097 | 1686 | 10 | 3 | 96s | 3s |
3354 | 1600 | 50 | 10 | 178s | 8s |
295 | 63263 | 10 | 10 | 66m41s | 2m22s |
The code may be ugly so far, but the results sure aren't... (remember, each subsequent epoch used to be as long as epoch 1)
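To put the table in terms of total training time, here is a rough back-of-the-envelope sketch. It assumes the "before" time is simply the number of epochs times the epoch 1 time (per the remark that each subsequent epoch used to take as long as epoch 1), and it ignores any overhead outside the epochs. The helper names `to_seconds` and `total_times` are made up for this illustration.

```python
def to_seconds(minutes: int = 0, seconds: int = 0) -> int:
    """Convert a minutes/seconds pair from the table into seconds."""
    return 60 * minutes + seconds


def total_times(epochs: int, first: int, subsequent: int) -> tuple[int, int]:
    """Return (before, after) total epoch time in seconds.

    Assumption: before this PR, every epoch cost as much as epoch 1.
    """
    before = epochs * first
    after = first + (epochs - 1) * subsequent
    return before, after


# (source, epochs, epoch 1 time, subsequent epoch time), copied from the table
runs = [
    (1097, 3, 96, 3),
    (3354, 10, 178, 8),
    (295, 10, to_seconds(66, 41), to_seconds(2, 22)),
]

for source, epochs, first, subsequent in runs:
    before, after = total_times(epochs, first, subsequent)
    print(f"Source {source}: {before}s -> {after}s ({before / after:.1f}x faster)")
```

Under that assumption, the three runs come out at roughly 2.8x, 7.1x, and 7.6x faster overall. The gain grows with the epoch count, since only epoch 1 pays the remote-loading cost.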
Ready for review. Recent changes:
Still want to refactor adjacent classes/methods for better OOP design, but not urgent and will leave that for another time.
Cache feature vectors in the local filesystem if they were loaded from remote storage (S3 or URL).
I think this will do what it intends to, but I couldn't figure out a way to do it without committing OOP crimes somewhere (`data_classes.ImageFeatures.load()` in this case). I think the better way to do this would involve passing a `Storage` to `TrainClassifierMsg` instead of a `DataLocation`, and putting the temporary directory attribute onto that `Storage`. That probably involves a much larger refactor overall where the designs of `Storage`, `DataLocation`, and perhaps `DataClass` are reworked. I have ideas for that but I don't think I'm up for that for the remainder of the month.

Also want to make the caching optional (but on by default), so that it can be turned off in case filesystem space is a concern, as it may be for coralnet's largest sources (particularly since older sources have feature vectors about 8x the file size).
Let me know if it'd be useful to merge this PR in the short term though.
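For reviewers who want a picture of the caching idea without reading the diff, here is a minimal sketch of the general pattern. It is not the code in this PR: `LocalFeatureCache`, its `enabled` flag, and the `fetch_remote` callable are hypothetical names, and the real change goes through `data_classes.ImageFeatures.load()` instead. The sketch assumes feature vectors can be handled as raw bytes keyed by their remote location.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


class LocalFeatureCache:
    """Hypothetical helper: keep remotely loaded feature vectors on local disk."""

    def __init__(self, cache_dir: Optional[str] = None, enabled: bool = True):
        # Caching is on by default but can be switched off to save disk space.
        self.enabled = enabled
        # Only remove the directory on cleanup if this object created it.
        self._owns_dir = enabled and cache_dir is None
        if not enabled:
            self.cache_dir = None
        elif cache_dir is None:
            self.cache_dir = Path(tempfile.mkdtemp(prefix="feature_cache_"))
        else:
            self.cache_dir = Path(cache_dir)
            self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, key: str) -> Path:
        # Hash the remote key (S3 key or URL) to get a safe local filename.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.cache_dir / digest

    def load(self, key: str, fetch_remote: Callable[[str], bytes]) -> bytes:
        """Return the bytes for `key`, hitting remote storage at most once."""
        if not self.enabled:
            return fetch_remote(key)
        path = self._path_for(key)
        if path.exists():
            return path.read_bytes()
        data = fetch_remote(key)
        path.write_bytes(data)
        return data

    def cleanup(self) -> None:
        """Delete the temporary directory once training is finished."""
        if self._owns_dir:
            shutil.rmtree(self.cache_dir, ignore_errors=True)


if __name__ == "__main__":
    fetch_count = 0

    def fake_fetch(key: str) -> bytes:
        # Stand-in for an S3 or URL download.
        global fetch_count
        fetch_count += 1
        return b"features for " + key.encode("utf-8")

    cache = LocalFeatureCache()
    for _epoch in range(3):
        cache.load("s3://example-bucket/image1.featurevector", fake_fetch)
    print(f"remote fetches over 3 epochs: {fetch_count}")
    cache.cleanup()
```

Only the first epoch pays the remote-loading cost; later epochs read from the temporary directory. With `enabled=False`, every call goes back to remote storage, which is the trade-off to consider for the largest sources.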