Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
This example contains a Llama-2-13b chat model deployed on an EKS cluster through Ray Serve. The REST predictor uses the REST endpoint URL to generate responses.
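
For reviewers, a minimal sketch of how a client might call such a REST endpoint follows. It is not the predictor code from this PR: the function names, the `prompt` and `max_new_tokens` payload fields, and the JSON response shape are assumptions made for illustration, and the real Ray Serve deployment may expect a different request format.

```python
import json
import urllib.request


def build_request(endpoint_url, prompt, max_new_tokens=128):
    """Build a JSON POST request for a text-generation endpoint.

    The payload field names are hypothetical; adjust them to match
    whatever the deployed Ray Serve application actually accepts.
    """
    payload = json.dumps(
        {"prompt": prompt, "max_new_tokens": max_new_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(endpoint_url, prompt, opener=urllib.request.urlopen, timeout=60):
    """Send the prompt to the endpoint and return the decoded JSON body.

    `opener` is injectable so the call can be exercised without a live
    cluster; by default it performs a real HTTP request.
    """
    request = build_request(endpoint_url, prompt)
    with opener(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like `generate(endpoint_url, "Hello")`, where `endpoint_url` is the URL exposed by the Ray Serve service on the cluster.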