google-gemini / generative-ai-python

The official Python library for the Google Gemini API
https://pypi.org/project/google-generativeai/
Apache License 2.0

response_schema parameter is not followed. #343

Closed · Bikatr7 closed this 1 month ago

Bikatr7 commented 1 month ago

Description of the bug:

For the gemini-1.5-pro family of models, the response_schema parameter is not followed unless system_instruction also spells out the schema.

This could be intended behavior, but for more complicated schemas it seems like a massive waste of tokens.

Actual vs expected behavior:

I'd expect the response_schema to be respected and followed regardless of whether system_instruction details it. If this is an incorrect assumption, please let me know.

Any other information you'd like to share?

Code to reproduce:

## built-in libraries
import typing

## third party libraries
from google.generativeai import GenerationConfig
import google.generativeai as genai

## Dummy values from production code
_default_translation_instructions: str = "Translate this to German. Format the response as JSON parseable string."
_default_model: str = "gemini-1.5-pro-latest"

_system_message = _default_translation_instructions

_model: str = _default_model
_temperature: float = 0.5
_top_p: float = 0.9
_top_k: int = 40
_candidate_count: int = 1
_stream: bool = False
_stop_sequences: typing.List[str] | None = None
_max_output_tokens: int | None = None

_client: genai.GenerativeModel
_generation_config: GenerationConfig

_safety_settings = [
    {
        "category": "HARM_CATEGORY_DANGEROUS",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_HARASSMENT",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_HATE_SPEECH",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
        "threshold": "BLOCK_NONE",
    },
]

## with open("gemini.txt", "r", encoding="utf-8") as f:
##      api_key = f.read().strip()
api_key = "YOUR_API_KEY"
genai.configure(api_key=api_key)

## Instructing the model to translate the input to German as JSON, without detailed schema
non_specific_client = genai.GenerativeModel(
    model_name=_model,
    safety_settings=_safety_settings,
    system_instruction="Translate this to German. Format the response as JSON parseable string."
)

## Instructing the model to translate the input to German as JSON, with detailed schema
_client = genai.GenerativeModel(
    model_name=_model,
    safety_settings=_safety_settings,
    system_instruction="Translate this to German. Format the response as JSON parseable string. It must have 2 keys, one for input titled input, and one called output, which is the translation."
)

_generation_config = GenerationConfig(
    candidate_count=_candidate_count,
    stop_sequences=_stop_sequences,
    max_output_tokens=_max_output_tokens,
    temperature=_temperature,
    top_p=_top_p,
    top_k=_top_k,
    response_mime_type="application/json",
    response_schema={
        "type": "object",
        "properties": {
            "input": {
                "type": "string",
                "description": "The original text that was translated."
            },
            "output": {
                "type": "string",
                "description": "The translated text."
            }
        },
        "required": ["input", "output"],
    }
)

## Inconsistent results, schema is not being followed
try:
    response = non_specific_client.generate_content(
        "Hello, world!", generation_config=_generation_config
    )
    print(response.text)
except Exception as e:
    print(f"Error with non-specific client: {e}")

## Consistent results, schema is being followed
try:
    response = _client.generate_content(
        "Hello, world!", generation_config=_generation_config
    )
    print(response.text)
except Exception as e:
    print(f"Error with specific client: {e}")

## Clarification question
## Is it intended behavior that the system instruction has to detail the schema? If so, what's the point of the response_schema parameter in the GenerationConfig class? It seems like a waste of tokens.
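To make "the schema is not being followed" concrete, the replies from the two clients can be checked locally. The following is a stdlib-only sketch; the `follows_schema` helper is hypothetical (not part of the google-generativeai SDK) and only verifies that a JSON reply contains the `required` keys of the object schema used above:

```python
import json

## Hypothetical helper, not part of google-generativeai: checks whether a
## reply string parses as a JSON object containing every "required" key
## from the schema passed to GenerationConfig above.
def follows_schema(reply_text: str, schema: dict) -> bool:
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return all(key in data for key in schema.get("required", []))

schema = {
    "type": "object",
    "properties": {
        "input": {"type": "string"},
        "output": {"type": "string"},
    },
    "required": ["input", "output"],
}

## A conforming reply, as produced by the schema-aware client:
print(follows_schema('{"input": "Hello, world!", "output": "Hallo, Welt!"}', schema))  # True
## A free-form reply, as sometimes produced by the non-specific client:
print(follows_schema('{"translation": "Hallo, Welt!"}', schema))  # False
```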
MarkDaoust commented 1 month ago

Hi, you're early!

We added SDK support for this feature thinking it was going to fully launch at I/O, but it's still a few days out, so we just withheld the documentation for it. Nice job finding it and figuring it out at all. What you have there should work, without the system instructions, soon. Maybe Monday?
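For reference, once the server-side support lands, the intent described above is that a config like the following should suffice on its own, with no schema text in system_instruction. This is a sketch shown as a plain dict (which generate_content accepts for generation_config); whether these key names exactly match the released SDK fields is an assumption here:

```python
## Sketch: the generation config carries the schema, so the system
## instruction can stay schema-free. Mirrors the GenerationConfig
## fields used in the reproduction above.
generation_config = {
    "response_mime_type": "application/json",
    "response_schema": {
        "type": "object",
        "properties": {
            "input": {"type": "string"},
            "output": {"type": "string"},
        },
        "required": ["input", "output"],
    },
}

## The system instruction no longer needs to restate the schema:
system_instruction = "Translate this to German."
```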

Bikatr7 commented 1 month ago

> Hi, you're early!
>
> We added SDK support for this feature thinking it was going to fully launch at I/O, but it's still a few days out, so we just withheld the documentation for it. Nice job finding it and figuring it out at all. What you have there should work, without the system instructions, soon. Maybe Monday?

Alright, thanks for letting me know. Guess I was a bit too excited to utilize it.