Open dhruvilgala opened 4 years ago
Instead of loading the models on every GET /predict call, keep them loaded in memory on the EC2 instance. The server should wait for the GET call and then run inference on the preloaded models. This should bring latency down to under 1 second.
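A minimal sketch of the idea, assuming a Python server process: the expensive load happens once at startup, and the request handler only runs inference. The `load_model` function, the dummy model, and `handle_predict` are all illustrative stand-ins for the project's actual loading code and web framework.

```python
import time

def load_model():
    # Stand-in for an expensive one-time load
    # (e.g., torch.load, joblib.load, tf.saved_model.load).
    time.sleep(0.1)
    return lambda x: x * 2  # dummy model for illustration

# Loaded exactly once when the server process starts on the EC2 instance,
# NOT inside the request handler.
MODEL = load_model()

def handle_predict(x):
    # Each GET /predict call only pays the inference cost here;
    # the load cost was paid once at startup.
    return MODEL(x)
```

In a real deployment the same pattern applies at module scope (or in a framework startup hook), so every worker process pays the load cost once and serves all subsequent requests from memory.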