Open aimichal opened 4 years ago
Some thoughts:
There are two pods per model deployed for prod. This is nice for availability (if the workloads must be re-arranged among cluster nodes), for surges in traffic, and for smooth deployments. Having only one pod per model might hide the non-determinism, but it would make the service vulnerable to outages during cluster workload re-arrangements, and non-determinism would then surface right after deployments, making it look like a regression. So I don't advise reducing the replica count from 2 to 1.
The pods backing these servers have each been up for 9 days. Is that unusual? Maybe state accumulated in a bad way.
Maybe the random seed isn't initialized. If it isn't already being called, could `prepare_environment()` help? https://github.com/allenai/allennlp/blob/87c23e4a7e25d81e91ba8234eef38b5395993c51/allennlp/common/util.py#L218-L247
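As a toy illustration of why seeding would make the two pods agree, here is a minimal sketch using only stdlib `random` as a stand-in for the model's randomness source (`prepare_environment()` in the linked code additionally seeds `numpy` and `torch` the same way; `seeded_predict` and the seed value are hypothetical):

```python
import random

def seeded_predict(seed):
    # Hypothetical stand-in for a model forward pass that consumes
    # randomness. Re-seeding before each call makes the draws, and
    # hence the "prediction", fully reproducible.
    random.seed(seed)
    return [random.random() for _ in range(3)]

# Two pods that both seed on startup produce identical outputs:
pod_a = seeded_predict(13370)
pod_b = seeded_predict(13370)
assert pod_a == pod_b

# Pods that never seed (or seed differently) diverge:
assert seeded_predict(1) != seeded_predict(2)
```

If both pods called `prepare_environment()` with the same seeds before serving, any randomness of this kind would stop varying between replicas.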
While non-determinism may be unavoidable in the general case, for the demo we should find a way to make it consistent, so it doesn't look broken.
For unavoidably non-deterministic models, there should be a disclaimer on the demo page inviting people to click "Run" several times to get a sense of the variability.
There are two pods running the NAQANet model, and each consistently returns a different result. For example, pod allennlp-demo-api-naqanet-prod-597f89dc7f-7bwhw returns answer.value = `game, players direct a nasce...`, while pod allennlp-demo-api-naqanet-prod-597f89dc7f-hq96x returns answer.value = `real-life orbital maneuvers`. As a result, querying the load-balanced endpoint https://demo.allennlp.org/api/naqanet/predict returns either of these two results, depending on which pod handles the request.
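One way to confirm the pod-level divergence without kubectl access is to collect repeated responses from the load-balanced endpoint and count the distinct answers. A minimal sketch (the `answer.value` path is taken from the fields quoted above; the sample responses mirror the two pod answers reported, and `answer_distribution` is a hypothetical helper, not part of the demo API):

```python
from collections import Counter

def answer_distribution(responses):
    """Count distinct answer.value strings across repeated /predict calls.

    `responses` is a list of parsed JSON response bodies.
    """
    return Counter(r["answer"]["value"] for r in responses)

# Sample data mirroring the two pod responses reported in this issue:
samples = [
    {"answer": {"value": "real-life orbital maneuvers"}},
    {"answer": {"value": "game, players direct a nasce..."}},
    {"answer": {"value": "real-life orbital maneuvers"}},
]

dist = answer_distribution(samples)
assert len(dist) == 2  # two pods, two stable-but-different answers
```

If the model were deterministic, repeated calls with the same input would collapse this distribution to a single answer regardless of which pod serves the request.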