getomni-ai / zerox

PDF to Markdown with vision models
https://getomni.ai/ocr-demo
MIT License

How can I make it run? #79

Open Mamlesh18 opened 3 weeks ago

Mamlesh18 commented 3 weeks ago

I am using Microsoft Azure OpenAI to implement it.

I am running the code after changing the Azure keys, but I am getting an error.

Thanks in advance for helping me out.

Code:

```python
from pyzerox import zerox
import os
import json
import asyncio

custom_system_prompt = None

###################### Example for Azure OpenAI ######################
model = "gpt-35-turbo"  ## "azure/<your_deployment_name>" -> format: <provider>/<model>
os.environ["AZURE_API_KEY"] = ""  # "your-azure-api-key"
os.environ["AZURE_API_BASE"] = ""  # "https://example-endpoint.openai.azure.com"
os.environ["AZURE_API_VERSION"] = ""  # "2023-05-15"

# Placeholder for additional model kwargs (none needed here for OpenAI)
kwargs = {}

async def main():
    file_path = "https://omni-demo-data.s3.amazonaws.com/test/cs101.pdf"  # Local filepath and file URL supported

    # Process all pages (or specify select_pages as a list of page numbers, e.g., select_pages = [1, 2])
    select_pages = None

    output_dir = "./output_test"  # Directory to save the consolidated markdown file
    result = await zerox(
        file_path=file_path,
        model=model,
        output_dir=output_dir,
        custom_system_prompt=custom_system_prompt,
        select_pages=select_pages,
        **kwargs
    )
    return result

# Run the main function
result = asyncio.run(main())

# Print markdown result
print(result)
```

Error:

```
    raise MissingEnvironmentVariables(extra_info=env_config)
pyzerox.errors.exceptions.MissingEnvironmentVariables: Required environment variable (keys) from the model are Missing. Please set the required environment variables for the model provider. Refer: https://docs.litellm.ai/docs/providers (Extra Info: {'keys_in_environment': False, 'missing_keys': []})
```

pradhyumna85 commented 2 weeks ago

@Mamlesh18, the `model` variable should be your model deployment name in Azure OpenAI with the prefix `azure/`. For example, for a deployment named `gpt-4o-mini`, the `model` variable should be `azure/gpt-4o-mini`. Note that only the GPT-4o and GPT-4o mini models are supported with Azure OpenAI.

Also make sure the three environment variables are set correctly as per your Azure OpenAI configuration:

```python
model = "azure/gpt-4o-mini"  ## format: "azure/<your_deployment_name>"
os.environ["AZURE_API_KEY"] = ""  # "your-azure-api-key"
os.environ["AZURE_API_BASE"] = ""  # "https://example-endpoint.openai.azure.com"
os.environ["AZURE_API_VERSION"] = ""  # "2023-05-15"
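```

As a quick sanity check, LiteLLM (which pyzerox uses under the hood) can report which environment variables it expects for a given model string; that is where the `{'keys_in_environment': False, 'missing_keys': []}` hint in your traceback comes from. A minimal sketch, assuming `litellm` is importable in your environment:

```python
# Sketch: ask LiteLLM which environment variables a model string requires.
# The exact contents of the returned dict may differ between litellm versions.
import litellm

# A bare deployment name is not mapped to the Azure provider...
print(litellm.validate_environment(model="gpt-35-turbo"))

# ...while the "azure/" prefix makes LiteLLM check the Azure-specific variables.
print(litellm.validate_environment(model="azure/gpt-4o-mini"))
# Each call returns a dict like {'keys_in_environment': ..., 'missing_keys': [...]}
```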

This assumes you have static API keys rather than service principal access. For service principal access, you will need to use a fresh bearer token instead of an API key in the API key environment variable above.
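For the service principal case, here is a minimal sketch of fetching such a bearer token with the `azure-identity` package; the tenant/client values are placeholders, and this is not something zerox does for you:

```python
# Sketch: obtain a short-lived Azure AD bearer token via a service principal
# (assumes the azure-identity package; the IDs/secret below are placeholders).
import os
from azure.identity import ClientSecretCredential

credential = ClientSecretCredential(
    tenant_id="<your-tenant-id>",
    client_id="<your-client-id>",
    client_secret="<your-client-secret>",
)

# Request a token for the Azure Cognitive Services scope used by Azure OpenAI.
token = credential.get_token("https://cognitiveservices.azure.com/.default")

# Per the comment above, put the bearer token where the API key would normally go.
# Tokens expire (see token.expires_on), so refresh them for long-running jobs.
os.environ["AZURE_API_KEY"] = token.token
```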