Closed TanGentleman closed 6 days ago
Here's a handy snippet from https://danielvanstrien.xyz/posts/2024/11/local-vision-language-model-lm-studio.html:
from typing import Annotated, List, Optional
from pydantic import BaseModel, Field
from pydantic.types import StringConstraints
class ScreenshotCategory(BaseModel):
category: Literal["meme", "documentation image", "other"] = Field(
..., description="The category of the screenshot"
)
class ScreenshotInformation(BaseModel):
description: Annotated[str, StringConstraints(min_length=50, max_length=1000)] = (
Field(
..., description="A short description of the screenshot (50-200 characters)"
)
)
category: ScreenshotCategory
tags: Optional[List[Annotated[str, StringConstraints(min_length=3)]]] = Field(
None, description="A list of tags that describe the screenshot", max_items=3
)
prompt = f"""Analyze the given screenshot and provide the following information in JSON format:
1. description: A short description of the screenshot (50-200 characters)
2. category: Categorize the screenshot as one of the following:
- meme
- documentation image
- other
3. tags: (Optional) Up to 3 tags that describe the screenshot
Ensure your response follows this schema:
{ScreenshotInformation.model_json_schema()}
Do not include any explanations or additional text outside of the JSON structure."""
Fully implemented
For ollama and openai compatible endpoints, use the syntax to force JSON output.
For LMStudio served models specifically, I can force a stronger JSON schema with defined validation logic.