Closed dchaley closed 3 days ago
Results on local environment are very promising.
Before:
Loaded model in 12.3 s
Ran prediction in 1.25 s; success: True
After:
Loaded model in 0.84 s
Ran prediction in 1.19 s; success: True
This is the time to load the model from disk, not to fetch it from storage.
Once the PR is merged we can test on cloud.
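For anyone reproducing the numbers above, here is a minimal sketch of how a "Loaded model in … s" line can be measured. The `timed` helper and `fake_load_model` are hypothetical names for illustration, not the benchmark code from the PR; the stand-in loader would be replaced by the real load call.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn() once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def fake_load_model() -> str:
    # Hypothetical stand-in for the real load call,
    # e.g. tf.keras.models.load_model(path).
    time.sleep(0.05)
    return "model"


model, load_s = timed(fake_load_model)
print(f"Loaded model in {load_s:.2f} s")
```

Since this only wraps the load call, it covers reading from local disk and leaves out any fetch from storage, which matches what the timings above describe.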
The new model is uploaded:
gs://genomics-data-public-central1/cellular-segmentation/vanvalenlab/deep-cell/vanvalenlab-tf-model-multiplex-downloaded-20230706/MultiplexSegmentation-resaved-20240710.h5
Its md5 hash is 56b0f246081fe6b730ca74eab8a37d60, also uploaded as:
gs://genomics-data-public-central1/cellular-segmentation/vanvalenlab/deep-cell/vanvalenlab-tf-model-multiplex-downloaded-20230706/MultiplexSegmentation-resaved-20240710-md5.txt
It was generated like so:
import tensorflow as tf
# Load the original model, then re-save it under a .h5 name.
model = tf.keras.models.load_model("/Users/davidhaley/.keras/models/MultiplexSegmentation")
model.save("MultiplexSegmentation-resaved-20240710.h5")
It seems to be really that simple… but see the upcoming PR for caveats on loading the .h5 model.
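To check a downloaded copy of the .h5 against the md5 hash given above, something like the following sketch should work. The helper names `md5_of_stream` and `md5_of_file` are made up for illustration, and the local path is whatever you downloaded the file to.

```python
import hashlib
import io
from typing import BinaryIO

# Hash quoted in this thread for MultiplexSegmentation-resaved-20240710.h5.
EXPECTED_MD5 = "56b0f246081fe6b730ca74eab8a37d60"


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so a large model file is never fully in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str) -> str:
    """Return the hex md5 digest of the file at path."""
    with open(path, "rb") as f:
        return md5_of_stream(f)


# Self-check on in-memory bytes: the md5 of empty input is a well-known constant.
assert md5_of_stream(io.BytesIO(b"")) == "d41d8cd98f00b204e9800998ecf8427e"
```

Usage would be along the lines of `md5_of_file("MultiplexSegmentation-resaved-20240710.h5") == EXPECTED_MD5`.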
Cloud results 🎉
Before:
Reading model from /root/.keras/models/MultiplexSegmentation.
Loaded model in 8.99 s
Ran prediction in 2.82 s; success: True
After:
Loading model from: /root/.keras/models/MultiplexSegmentation-resaved-20240710.h5
Loaded model in 2.68 s
Ran prediction in 2.79 s; success: True
Summary of results:
Environment | Before | After | Diff |
---|---|---|---|
MacBook M3 Max Pro | 12.3 s | 0.84 s | -11.46 s (-93%) |
n1-standard-8 w/ 1 T4 GPU | 8.99 s | 2.68 s | -6.31 s (-70%) |
n1-standard-32 w/ 1 T4 GPU | 8.21 s | 2.72 s | -5.49 s (-67%) |
Of note, loading the model into memory used to take ~3x the time of predicting the 512x512 image. Now it's roughly the same.
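The Diff column in the summary table can be re-derived from the Before and After load times. A small sketch (the function name `load_time_diff` is just for illustration):

```python
def load_time_diff(before_s: float, after_s: float) -> tuple:
    """Return (absolute change in seconds, relative change in whole percent)."""
    diff = after_s - before_s
    pct = 100.0 * diff / before_s
    return round(diff, 2), round(pct)


# Model load times (before, after) in seconds, taken from the summary table.
rows = {
    "MacBook": (12.3, 0.84),
    "n1-standard-8 w/ 1 T4 GPU": (8.99, 2.68),
    "n1-standard-32 w/ 1 T4 GPU": (8.21, 2.72),
}

for env, (before_s, after_s) in rows.items():
    diff, pct = load_time_diff(before_s, after_s)
    print(f"{env}: {diff:+.2f} s ({pct:+d}%)")
```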
According to the docs there's a more recent / "efficient" persistence method: .keras.
We're currently using the SavedModel format (from DeepCell / Van Valen Lab). It takes ~8 s to load. 😔
From this post, HDF5 is significantly faster to load.
If .keras doesn't end up being faster, consider HDF5…?
Also consider TensorFlow Lite? https://www.tensorflow.org/lite/guide (Apparently, if we use compatible operations only, it might "just work"?)
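For telling the three Keras persistence formats in this thread apart by file name, a rough sketch follows. `guess_keras_format` is a hypothetical helper, not a Keras API; it only mirrors the usual suffix conventions (`.h5`/`.hdf5` for HDF5, `.keras` for the newer format, no suffix for a SavedModel directory), and as far as I know `model.save` in TF 2.x picks the format from the suffix in a similar way.

```python
from pathlib import PurePosixPath


def guess_keras_format(path: str) -> str:
    """Guess the persistence format from the final path component's suffix.

    Assumption: naming follows the usual conventions; the file is never opened.
    """
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in {".h5", ".hdf5"}:
        return "hdf5"
    if suffix == ".keras":
        return "keras"
    # No recognized suffix: treat it as a SavedModel directory.
    return "savedmodel"


for p in (
    "/root/.keras/models/MultiplexSegmentation",
    "/root/.keras/models/MultiplexSegmentation-resaved-20240710.h5",
):
    print(p, "->", guess_keras_format(p))
```

Only the last path component is inspected, so the `.keras` cache directory in the paths above does not affect the result.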