Issue:
Single-model inference performance regressed in #42: P99 latency increased to ~12.8 seconds with those changes.
The model directory validation and the validation of the user module against the contract were repeated on every inference request, which contributed to the regression.
Description of changes:
This duplicated functionality has been removed.
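
For reviewers, a minimal sketch of the general pattern: validation runs once when the model is loaded, and the request path only calls the already-validated predict function. All names here (`ModelHandler`, `initialize`, `handle`, and the two validator callables) are hypothetical and are not taken from this repository's code.

```python
class ModelHandler:
    """Hypothetical handler; illustrative only, not this project's API."""

    def __init__(self, validate_model_dir, validate_user_module):
        # The validators are injected so the sketch stays self-contained.
        self._validate_model_dir = validate_model_dir
        self._validate_user_module = validate_user_module
        self._predict = None

    def initialize(self, model_dir, user_module):
        # Run both checks exactly once, at model load time.
        self._validate_model_dir(model_dir)
        self._validate_user_module(user_module)
        self._predict = user_module.predict

    def handle(self, request):
        # Hot path: no validation here, only the cached predict function.
        if self._predict is None:
            raise RuntimeError("initialize() must be called before handle()")
        return self._predict(request)
```

With this shape, the cost of validation is paid per model load rather than per request.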
Testing done:
Ran the single-model inference benchmark with the duplicated code removed and observed that P99 latency dropped back to ~3.4 seconds.
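
The benchmarking tool used for these numbers is not stated here. As a sketch of how a P99 figure can be derived from raw per-request latencies, the hypothetical helper below uses the nearest-rank method; other tools may interpolate and report slightly different values.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latency samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]
```

For example, `percentile(latencies_in_seconds, 99)` returns the latency that 99% of the recorded requests did not exceed.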
Merge Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.