Helm chart - copy models from NFS storage to attached storage

Proposed changes

The default AWS and GCP configurations have the Engine Pods read models from shared network-attached storage. This can increase disk read latency and, in turn, the latency of any request that requires a model load. Streaming requests are particularly sensitive to this added latency.

There should be an option to copy models from NFS onto the Pod's attached host storage to reduce model read latency. The copy could run once on startup, and the Pod could optionally poll NFS for updated models afterwards.
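As a rough illustration of the copy-then-poll behavior described above, here is a minimal Python sketch. It is not part of the chart: the function names `sync_models` and `run`, the `poll_seconds` parameter, and the use of file size plus modification time to detect changes are all assumptions made for this example. In a real deployment this logic would more likely live in an init container or sidecar.

```python
import shutil
import time
from pathlib import Path


def sync_models(src: Path, dst: Path) -> list[str]:
    """Copy files from src to dst when missing or changed.

    Hypothetical helper: 'changed' is assumed to mean a different
    size or modification time. Returns the relative paths copied.
    """
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for src_file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = src_file.relative_to(src)
        dst_file = dst / rel
        src_stat = src_file.stat()
        if dst_file.exists():
            dst_stat = dst_file.stat()
            # copy2 preserves mtime, so equal size and mtime means unchanged.
            if (dst_stat.st_size == src_stat.st_size
                    and int(dst_stat.st_mtime) == int(src_stat.st_mtime)):
                continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        # Copy to a temporary name, then rename, so a reader of the
        # local directory never sees a partially written model file.
        tmp = dst_file.with_name(dst_file.name + ".tmp")
        shutil.copy2(src_file, tmp)
        tmp.replace(dst_file)
        copied.append(rel.as_posix())
    return copied


def run(src: Path, dst: Path, poll_seconds: float = 0.0) -> None:
    """Sync once on startup; keep polling only if poll_seconds > 0."""
    sync_models(src, dst)
    while poll_seconds > 0:
        time.sleep(poll_seconds)
        sync_models(src, dst)
```

With `poll_seconds` left at `0`, `run` performs only the startup copy. A positive value enables the optional polling mode. The sketch does not remove local files that were deleted from NFS, which the final design would need to decide on.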

deepgram / self-hosted-resources

Helm chart - copy models from NFS storage to attached storage #10