getomni-ai / zerox

PDF to Markdown with vision models
https://getomni.ai/ocr-demo
MIT License
5.85k stars · 309 forks

Add Azure OpenAI support #13

Closed kobotschick closed 1 month ago

kobotschick commented 2 months ago

It would be great if the package supported Azure OpenAI models.

pradhyumna85 commented 2 months ago

@wizenheimer, @tylermaran, this looks like a very useful utility. Any plans on this? I am happy to contribute to the Python SDK.

I think we should start by modifying the code to use the openai-python SDK, so that instead of passing an OpenAI key to the zerox constructor, we can pass the relevant client (OpenAI or AzureOpenAI) from the openai Python SDK. That would replace the existing manual API calls.

Let me know what you think.
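To make the proposal concrete, here is a minimal, stdlib-only sketch of the client-injection idea: zerox would accept a pre-configured SDK client object (e.g. `openai.OpenAI` or `openai.AzureOpenAI`) rather than a raw API key, so provider selection happens outside the library. `ZeroxClient` and `StubClient` are illustrative names, not the real zerox API; the stub stands in for a real SDK client to show the call shape.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any

@dataclass
class ZeroxClient:
    # Any object exposing chat.completions.create(...), e.g. OpenAI or AzureOpenAI.
    client: Any

    def process_page(self, prompt: str) -> str:
        # Delegate to whichever SDK client was injected.
        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

# A stub standing in for an OpenAI/AzureOpenAI client (same attribute shape).
class _Completions:
    def create(self, model, messages):
        msg = SimpleNamespace(content="echo: " + messages[0]["content"])
        return SimpleNamespace(choices=[SimpleNamespace(message=msg)])

class StubClient:
    chat = SimpleNamespace(completions=_Completions())

print(ZeroxClient(StubClient()).process_page("hello"))  # echo: hello
```

With this shape, switching to Azure is purely a matter of constructing a different client before handing it to zerox.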

wizenheimer commented 2 months ago

Hey @pradhyumna85

I think we should start with modifying the code to use openai-python sdk so that instead of passing openai key to the zerox constructor

Having an AsyncOpenAI based client and supporting Batch API would be a great addition, imo.

import os
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
)

async def main() -> None:
    chat_completion = await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Say this is a test",
            }
        ],
        model="gpt-3.5-turbo",
    )
    print(chat_completion.choices[0].message.content)

asyncio.run(main())

we can pass the relevant client (OpenAI or AzureOpenAI) from openai python sdk which would replace the existing manual api calls.

Great. We might need to introduce a provider component to keep the low-level design (LLD) simpler. There could be implications for the package's external API interface.

tylermaran commented 2 months ago

Agreed, I'd like to add Azure model support here.

Do you think the best approach is just adding the OpenAI SDK for both packages? I think the GPT models are clearly the right choice right now, but I wouldn't be surprised if Anthropic or Gemini models ended up giving similar performance over the next few months.

Wondering if we might want to abstract the models a bit more. In general, I think adding the OpenAI SDK is a good starting point.

wizenheimer commented 2 months ago

Agreed. Here's a v0 draft of how we could shape it structurally. It will need a couple of iterations to get the method signatures right.

Approach 1: Reference Code

classDiagram
    class LLMInterface {
        <<abstract>>
        +run(prompt: Dict, temperature: float, max_tokens: int, image: str) str
    }

    class LLMFactory {
        -model: str
        -client: Any
        +__init__(model: str)
        +run(prompt: Dict, temperature: float, max_tokens: int, image: str) str
        -_llm_response(prompt: Dict, temperature: float, max_tokens: int, image: str) str
    }

    LLMInterface <|-- LLMFactory

    class OpenAI
    class Anthropic
    class GoogleGenAI
    class CohereClient
    class AzureChatOpenAI
    class Bedrock

    LLMFactory --> OpenAI : uses
    LLMFactory --> Anthropic : uses
    LLMFactory --> GoogleGenAI : uses
    LLMFactory --> CohereClient : uses
    LLMFactory --> AzureChatOpenAI : uses
    LLMFactory --> Bedrock : uses

    note for LLMFactory "Supports multiple LLM providers:\n- OpenAI (GPT models)\n- Anthropic (Claude models)\n- Google (Gemini models)\n- Cohere\n- Azure OpenAI\n- AWS Bedrock\n- VLLM endpoints\n\nNow with optional image input\nfor multimodal models"
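The diagram above can be sketched in Python as a factory that resolves a provider from the model name. This is a hypothetical, stdlib-only illustration of Approach 1: in a real implementation each entry in `_PROVIDERS` would wrap that provider's SDK client, while here stub callables stand in for them.

```python
from typing import Callable, Dict, Optional

class LLMFactory:
    # Maps a model-name prefix to a provider callable. In practice each entry
    # would be a thin wrapper around that provider's SDK (OpenAI, Anthropic, ...).
    _PROVIDERS: Dict[str, Callable[..., str]] = {
        "gpt": lambda prompt, **kw: f"[openai] {prompt}",
        "claude": lambda prompt, **kw: f"[anthropic] {prompt}",
        "gemini": lambda prompt, **kw: f"[google] {prompt}",
        "azure": lambda prompt, **kw: f"[azure-openai] {prompt}",
    }

    def __init__(self, model: str):
        self.model = model
        self.client = self._resolve(model)

    def _resolve(self, model: str) -> Callable[..., str]:
        for prefix, provider in self._PROVIDERS.items():
            if model.startswith(prefix):
                return provider
        raise ValueError(f"Unsupported model: {model}")

    def run(self, prompt: str, temperature: float = 0.0,
            max_tokens: int = 1024, image: Optional[str] = None) -> str:
        # Every provider is invoked through the same uniform signature.
        return self.client(prompt, temperature=temperature,
                           max_tokens=max_tokens, image=image)

print(LLMFactory("gpt-4o").run("hi"))  # [openai] hi
```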

Approach 2: Reference Code

classDiagram
    class ABC
    <<interface>> ABC

    class AbstractLlmService {
        <<abstract>>
        +embeddings(text: str) list
        +chat_completion(messages, model, **kwargs) str
        +chat_completion_json(messages, model, **kwargs) str
        +json_completion(messages, model, **kwargs)
        +image_analysis(image: str, prompt: str, model, **kwargs) str
        +multimodal_completion(images: List[str], prompt: str, model, **kwargs) str
    }

    ABC <|-- AbstractLlmService

    note for AbstractLlmService "Abstract base class for\nLLM service providers\nwith image processing capabilities"
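Approach 2 could look like the following in Python: an `abc.ABC` interface that each provider implements. This is a hedged sketch with only two of the diagram's methods shown; `EchoService` is an illustrative stand-in for a real provider implementation.

```python
from abc import ABC, abstractmethod
from typing import List

class AbstractLlmService(ABC):
    """Abstract base class each LLM provider would implement."""

    @abstractmethod
    def chat_completion(self, messages: List[dict], model: str, **kwargs) -> str:
        ...

    @abstractmethod
    def image_analysis(self, image: str, prompt: str, model: str, **kwargs) -> str:
        ...

class EchoService(AbstractLlmService):
    # Toy implementation: echoes inputs instead of calling a real provider.
    def chat_completion(self, messages, model, **kwargs):
        return f"{model}: {messages[-1]['content']}"

    def image_analysis(self, image, prompt, model, **kwargs):
        return f"{model} saw {image}: {prompt}"

svc = EchoService()
print(svc.chat_completion([{"role": "user", "content": "hi"}], model="gpt-4o"))
```

An Azure OpenAI backend would then be just another subclass, leaving the zerox core untouched.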

pradhyumna85 commented 2 months ago

@wizenheimer, @tylermaran, instead of building our own classes for different providers, I would say it would be better to use LiteLLM (https://github.com/BerriAI/litellm), as it supports almost all popular providers through a single homogeneous API.

What do you think?
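For context on what LiteLLM's homogeneous API buys here: it routes a single `completion(model=..., messages=...)` call to the right provider based on a provider-prefixed model string, e.g. `"azure/<deployment-name>"` or `"gemini/gemini-pro"`, with bare names defaulting to OpenAI. The stdlib-only sketch below illustrates just that routing convention (it is not LiteLLM code, and the default-to-OpenAI behavior is an assumption about the convention, not a guarantee of the library).

```python
from typing import Tuple

def split_provider(model: str, default: str = "openai") -> Tuple[str, str]:
    # Split a LiteLLM-style model string into (provider, model name).
    provider, sep, name = model.partition("/")
    return (provider, name) if sep else (default, model)

print(split_provider("azure/my-gpt4-deployment"))  # ('azure', 'my-gpt4-deployment')
print(split_provider("gpt-4o"))                    # ('openai', 'gpt-4o')
```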

wizenheimer commented 2 months ago

That's an interesting take, sounds good 🚀

pradhyumna85 commented 2 months ago

@kobotschick I have raised PR https://github.com/getomni-ai/zerox/pull/21. It is not merged yet, but you can go ahead and test it; it works now. Install the Python package:

pip install git+https://github.com/pradhyumna85/zerox.git@multi-provider-support-pysdk

Then follow this readme example: here