Closed cryptoapebot closed 2 months ago
Hi @cryptoapebot, here is a working example of vision + streaming with that model, in two versions: one using an external image URL and one using a local image encoded as base64. This code uses the simple-openai library:
```java
package io.github.sashirestela.openai.playground;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.List;

import io.github.sashirestela.openai.OpenAI;
import io.github.sashirestela.openai.SimpleOpenAI;
import io.github.sashirestela.openai.domain.chat.ChatRequest;
import io.github.sashirestela.openai.domain.chat.content.ContentPartImage;
import io.github.sashirestela.openai.domain.chat.content.ContentPartText;
import io.github.sashirestela.openai.domain.chat.content.ImageUrl;
import io.github.sashirestela.openai.domain.chat.message.ChatMsgUser;

public class DemoVision {

    private SimpleOpenAI openai;
    private OpenAI.ChatCompletions chatService;

    public DemoVision() {
        openai = SimpleOpenAI.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .build();
        chatService = openai.chatCompletions();
    }

    // Vision over an external image referenced by URL, with streamed output.
    public void demoCallChatWithVisionExternalImage() {
        var chatRequest = ChatRequest.builder()
                .model("gpt-4-turbo-2024-04-09")
                .messages(List.of(
                        new ChatMsgUser(List.of(
                                new ContentPartText(
                                        "What do you see in the image? Give details in no more than 100 words."),
                                new ContentPartImage(new ImageUrl(
                                        "https://upload.wikimedia.org/wikipedia/commons/e/eb/Machu_Picchu%2C_Peru.jpg"))))))
                .temperature(0.0)
                .maxTokens(500)
                .build();
        var chatResponse = chatService.createStream(chatRequest).join();
        chatResponse.filter(chatResp -> chatResp.firstContent() != null)
                .map(chatResp -> chatResp.firstContent())
                .forEach(System.out::print);
        System.out.println();
    }

    // Vision over a local image, sent inline as a base64 data URL.
    public void demoCallChatWithVisionLocalImage() {
        var chatRequest = ChatRequest.builder()
                .model("gpt-4-turbo-2024-04-09")
                .messages(List.of(
                        new ChatMsgUser(List.of(
                                new ContentPartText(
                                        "What do you see in the image? Give details in no more than 100 words."),
                                new ContentPartImage(loadImageAsBase64("src/main/resources/machupicchu.jpg"))))))
                .temperature(0.0)
                .maxTokens(500)
                .build();
        var chatResponse = chatService.createStream(chatRequest).join();
        chatResponse.filter(chatResp -> chatResp.firstContent() != null)
                .map(chatResp -> chatResp.firstContent())
                .forEach(System.out::print);
        System.out.println();
    }

    // Reads a local file and wraps it as a "data:image/<ext>;base64,..." URL.
    private ImageUrl loadImageAsBase64(String imagePath) {
        try {
            Path path = Paths.get(imagePath);
            byte[] imageBytes = Files.readAllBytes(path);
            String base64String = Base64.getEncoder().encodeToString(imageBytes);
            var extension = imagePath.substring(imagePath.lastIndexOf('.') + 1);
            var prefix = "data:image/" + extension + ";base64,";
            return new ImageUrl(prefix + base64String);
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }

    public static void main(String[] args) {
        var demoVision = new DemoVision();
        demoVision.demoCallChatWithVisionExternalImage();
        demoVision.demoCallChatWithVisionLocalImage();
    }
}
```
@cryptoapebot To extend my answer: to generate images you should use the models dall-e-2 and dall-e-3 only. The vision feature (reading images and describing them) is attached to the chat completion service, and for it you should use one of the GPT models, including gpt-4-turbo-2024-04-09. You can take a look at this OpenAI model endpoint compatibility table:
https://platform.openai.com/docs/models/model-endpoint-compatibility
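To make the distinction concrete, here is a minimal, library-free sketch of calling the `/v1/images/generations` REST endpoint with `java.net.http`. The payload fields (`model`, `prompt`, `n`, `size`) follow OpenAI's images API reference; the class name and prompt text are placeholders, and the request is only sent when `OPENAI_API_KEY` is set:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DemoImageGeneration {

    // Image *generation* uses dall-e-2 / dall-e-3; gpt-4-turbo is not
    // accepted by the /images endpoint.
    static String buildPayload() {
        return """
                {
                  "model": "dall-e-3",
                  "prompt": "A watercolor painting of Machu Picchu at dawn",
                  "n": 1,
                  "size": "1024x1024"
                }""";
    }

    public static void main(String[] args) throws Exception {
        String payload = buildPayload();
        System.out.println(payload);

        String apiKey = System.getenv("OPENAI_API_KEY");
        if (apiKey == null) {
            return; // no key configured; skip the real call
        }
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.openai.com/v1/images/generations"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
        // The JSON response has a "data" array; each element carries either
        // a hosted "url" or a "b64_json" string for the generated image.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

The same pattern works for dall-e-2 by changing the `model` value; only the accepted `size` options differ between the two models.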
[openai4j](https://github.com/Lambdua/openai4j) is a fork of this library that already supports GPT-4 vision:
```java
// 'service' is an already-built OpenAiService instance from openai4j.
final List<ChatMessage> messages = new ArrayList<>();
final ChatMessage systemMessage = new SystemMessage("You are a helpful assistant.");
// Here, the imageMessage is intended for image recognition
final ChatMessage imageMessage = UserMessage.buildImageMessage("What's in this image?",
        "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg");
messages.add(systemMessage);
messages.add(imageMessage);

ChatCompletionRequest chatCompletionRequest = ChatCompletionRequest.builder()
        .model("gpt-4-turbo")
        .messages(messages)
        .n(1)
        .maxTokens(200)
        .build();
ChatCompletionChoice choice = service.createChatCompletion(chatCompletionRequest).getChoices().get(0);
System.out.println(choice.getMessage().getContent());
```
Thank you!
This isn't an issue so much as a question: can I use gpt-4-turbo-2024-04-09 as the model in the /images endpoint?
OpenAI's docs describe the new GPT-4 Turbo + Vision models here: https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4
Does anyone have an example of doing that, and of handling the return?