Closed amitkalay closed 1 month ago
currently the request is going in successfully, but the model is unable to recognize celebrities in the photo
Those skills all seem very similar in structure, with few things changing apart from the prompt template. Is there a way we can factor the common parts?
This PR introduces a chat completion prompt to do some image-processing and return the model's top response to be used for further processing. This code is called when we hit our base_url/api/summarize endpoint and pass in the appropriate header for image captioning. A concrete example is shown in the included .http file. When I called this new endpoint to tell me about the parts required to assemble a Tesla car (passed in as a base64 encoded image), I got the following response: