GoogleCloudPlatform / vertex-ai-samples

Notebooks, code samples, sample apps, and other resources that demonstrate how to use, develop and manage machine learning and generative AI workflows using Google Cloud Vertex AI.
https://cloud.google.com/vertex-ai
Apache License 2.0

How to add header to AnthropicVertex Python request #3330

Open SomebodySysop opened 1 month ago

SomebodySysop commented 1 month ago

Expected Behavior

The AnthropicVertex SDK (Python) should allow up to 8192 output tokens when calling Claude 3.5 Sonnet.

Actual Behavior

Only 4096 output tokens are allowed.

Steps to Reproduce the Problem

From: https://docs.anthropic.com/en/docs/about-claude/models

Max output | 8192 tokens [1]

[1] 8192 output tokens is in beta and requires the header anthropic-beta: max-tokens-3-5-sonnet-2024-07-15. If the header is not specified, the limit is 4096 tokens.
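To make the documented rule concrete, here is a minimal sketch in plain Python with no SDK calls. The helper name `documented_max_output_tokens` is hypothetical, and treating the header value as a comma-separated list of flags is an assumption. It only encodes the limits quoted above and says nothing about how Vertex AI enforces them.

```python
from typing import Mapping

BETA_HEADER = "anthropic-beta"
BETA_FLAG = "max-tokens-3-5-sonnet-2024-07-15"


def documented_max_output_tokens(headers: Mapping[str, str]) -> int:
    """Return the output-token limit the docs describe for the given headers.

    Hypothetical helper: it mirrors the quoted documentation only.
    """
    # HTTP header names are case-insensitive, so normalize the keys.
    normalized = {name.lower(): value for name, value in headers.items()}
    # Assumption: several beta flags may be sent as a comma-separated list.
    flags = [flag.strip() for flag in normalized.get(BETA_HEADER, "").split(",")]
    return 8192 if BETA_FLAG in flags else 4096
```

By this reading, a request that carries the header should be allowed 8192 output tokens, and a request without it should be capped at 4096.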

Question: How do I add the anthropic-beta: max-tokens-3-5-sonnet-2024-07-15 header to requests when using the AnthropicVertex SDK?

Specifications

SomebodySysop commented 1 month ago

After extensive online searching, I found two suggested ways to set the header and raise the maximum output tokens:

https://github.com/anthropics/anthropic-sdk-python?tab=readme-ov-file#default-headers

    from anthropic import AnthropicVertex

    # LOCATION and project_id must be defined beforehand.
    client = AnthropicVertex(
        region=LOCATION,
        project_id=project_id,
        # Set the beta header on every request made by this client.
        default_headers={"anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15"},
    )

And also:

https://x.com/alexalbert__/status/1812921642143900036

    message = client.messages.create(
        max_tokens=max_tokens,
        messages=[
            {
                "role": "user",
                "content": content,
            }
        ],
        model="claude-3-5-sonnet@20240620",
        # Send the beta header with this request only.
        extra_headers={"anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15"},
    )

But neither approach appears to work with the AnthropicVertex SDK:

    Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'max_tokens: 8192 > 4096, which is the maximum allowed number of output tokens for claude-3-5-sonnet-20240620'}}
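Until the header takes effect, one possible stopgap is to read the enforced limit out of the error and retry with that value. This is a minimal sketch, assuming the message keeps the format shown above. `parse_token_limit` is a hypothetical helper, not part of the SDK, and the retry itself is left out.

```python
import re
from typing import Optional

# Matches the "max_tokens: <requested> > <allowed>" fragment of the 400 message.
_LIMIT_RE = re.compile(r"max_tokens: (\d+) > (\d+)")


def parse_token_limit(message: str) -> Optional[int]:
    """Return the allowed output-token limit named in the error, if present."""
    match = _LIMIT_RE.search(message)
    return int(match.group(2)) if match else None


# The message text is copied from the error above.
error_message = (
    "max_tokens: 8192 > 4096, which is the maximum allowed number of "
    "output tokens for claude-3-5-sonnet-20240620"
)
limit = parse_token_limit(error_message)
```

With the message above, `limit` is the second number in the comparison, so the request could be retried with `max_tokens` set to that value. This works around the error but does not unlock the 8192-token limit.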