abi / screenshot-to-code

Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
https://screenshottocode.com
MIT License

Gemini 1.5 support #349

Open naman1608 opened 6 months ago

naman1608 commented 6 months ago

@abi Even though it is working, the code the model generates does not match the image being uploaded. I'm not sure whether the prompt being sent is wrong or it's a model issue. How can I check that?

abi commented 6 months ago

I have a helper function, pprint_prompt, that lets you print the prompt nicely. My guess is that the image isn't being sent correctly and that's why it's producing a random web page. I'll take a look at your code to debug.
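For reference, something along these lines (a rough sketch, not exactly what's in the repo) will show whether the image part is actually present in the prompt before conversion:

# Debugging sketch (hypothetical helper, not the repo's pprint utility):
# print each message, truncating base64 image data so the output stays readable.
def debug_print_prompt(messages):
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            print(f'{message["role"]}: {content[:200]}')
            continue
        for part in content:
            if part["type"] == "text":
                print(f'{message["role"]} (text): {part["text"][:200]}')
            elif part["type"] == "image_url":
                url = part["image_url"]["url"]
                print(f'{message["role"]} (image): {url[:60]}... ({len(url)} chars)')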

naman1608 commented 6 months ago

I used that, but I was only able to check the prompt in the OpenAI format, since the pprint function takes input in that format. I think the issue is in the conversion to the format Gemini expects.
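To illustrate what I mean, the two formats look roughly like this (just the shapes, not the exact prompts the app builds):

# OpenAI-style: content is a list of typed parts, with the image as a base64 data URL.
openai_style = [{"role": "user", "content": [
    {"type": "text", "text": "Generate code for this screenshot."},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0..."}},
]}]

# Gemini-style: a flat list mixing plain strings and actual image objects
# (e.g. PIL images), which is what generate_content accepts.
gemini_style = ["Generate code for this screenshot.", pil_image]  # pil_image: a PIL.Image placeholder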

msamylea commented 5 months ago

I rewrote the Gemini portion of this and it's now working for me. The issue was that Gemini expects the image itself to be passed in (decoded from the data URL and referenced directly), not an OpenAI-style image_url entry.

This is working for me:

import base64
import io
from typing import Awaitable, Callable, List

import google.generativeai as genai
from PIL import Image
from openai.types.chat import ChatCompletionMessageParam


async def stream_gemini_response(
    messages: List[ChatCompletionMessageParam],
    api_key: str,
    callback: Callable[[str], Awaitable[None]],
) -> str:
    # Configure the client before creating the model.
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-flash-latest")

    # Convert OpenAI-style messages into the flat list Gemini accepts:
    # plain strings for text parts and PIL images for image parts.
    gemini_messages = []
    for message in messages:
        if isinstance(message["content"], str):
            gemini_messages.append(message["content"])
        elif isinstance(message["content"], list):
            for content in message["content"]:
                if content["type"] == "text":
                    gemini_messages.append(content["text"])
                elif content["type"] == "image_url":
                    # Decode the base64 data URL and hand Gemini a PIL image.
                    image_url = content["image_url"]["url"]
                    image_data = base64.b64decode(image_url.split(",")[1])
                    image = Image.open(io.BytesIO(image_data))
                    gemini_messages.append(image)

    try:
        response = model.generate_content(gemini_messages, stream=True)
        chunks = []
        for chunk in response:
            if chunk.text:
                chunks.append(chunk.text)
                await callback(chunk.text)

        # Join the chunks collected above instead of iterating the stream a second time.
        return "".join(chunks)
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        return ""