drivendataorg / zamba

A Python package for identifying 42 kinds of animals, training custom models, and estimating distance from camera trap videos
https://zamba.drivendata.org/docs/stable/
MIT License

Cached model does not get used in next run #189

Status: Closed (ejm714 closed 2 years ago)

ejm714 commented 2 years ago

Model caching does not prevent the model from being re-downloaded on every run. For example, after a first run with the cache directory set to /my_cache, the cache directory looks like:

└── drivendata-public-assets
    └── zamba_official_models
        └── time_distributed_7f74686b7b.ckpt

The next time the user runs the model, the weights are downloaded again. This check https://github.com/drivendataorg/zamba/blob/7986c417f33839c0a8d14ac66201472acbfb393a/zamba/models/model_manager.py#L82 looks for the model at /my_cache/time_distributed_7f74686b7b.ckpt, directly under the cache directory, but the downloaded file sits two levels deeper, under drivendata-public-assets/zamba_official_models/. The two paths never match because the checkpoint lookup is based only on the filename:

https://github.com/drivendataorg/zamba/blob/7986c417f33839c0a8d14ac66201472acbfb393a/zamba/models/utils.py#L38-L45

https://github.com/drivendataorg/zamba/blob/7986c417f33839c0a8d14ac66201472acbfb393a/zamba/models/official_models/time_distributed/config.yaml#L48
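To make the mismatch concrete, here is a minimal sketch that recreates the cache layout shown above in a temporary directory and compares a filename-only lookup with a recursive search. It is not zamba's code: `flat_lookup`, `nested_matches`, and `demo` are hypothetical names used only for illustration, and the temporary directory stands in for /my_cache.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

FILENAME = "time_distributed_7f74686b7b.ckpt"


def flat_lookup(cache_dir: Path, filename: str) -> Path:
    """Hypothetical stand-in for a check that joins the cache dir and the bare filename."""
    return cache_dir / filename


def nested_matches(cache_dir: Path, filename: str) -> List[Path]:
    """Search the whole cache tree for the filename (illustration only)."""
    return sorted(cache_dir.rglob(filename))


def demo() -> Tuple[bool, List[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        cache_dir = Path(tmp)  # stands in for /my_cache

        # Recreate the layout from the issue: the checkpoint is nested
        # under drivendata-public-assets/zamba_official_models/.
        nested = cache_dir / "drivendata-public-assets" / "zamba_official_models"
        nested.mkdir(parents=True)
        (nested / FILENAME).touch()

        # The filename-only lookup points directly under the cache dir.
        found_flat = flat_lookup(cache_dir, FILENAME).exists()

        # A recursive search locates the file where it was actually stored.
        found_nested = [
            p.relative_to(cache_dir).as_posix()
            for p in nested_matches(cache_dir, FILENAME)
        ]
        return found_flat, found_nested


if __name__ == "__main__":
    found_flat, found_nested = demo()
    print("flat lookup hit:", found_flat)
    print("nested matches:", found_nested)
```

The flat lookup misses even though the file is present in the cache tree, which is the situation that triggers the repeated download described in this issue.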

Recommended fix: