getomni-ai / zerox

PDF to Markdown with vision models
https://getomni.ai/ocr-demo
MIT License
6.58k stars, 358 forks

Upload image file #58

Open valinagacevschi opened 1 month ago

valinagacevschi commented 1 month ago

I'm wondering whether there is a way to pass the uploaded_file or a stream of bytes instead of passing the file_path as a URL. This question is for the Python version.

tylermaran commented 1 month ago

Hey @valinagacevschi. You should be able to pass in a file path from your backend as well. Would that work, or do you not have the file saved anywhere?

import asyncio

from pyzerox import zerox

model = "gpt-4o-mini"  # assumed example; substitute the vision model you use

async def main():
    file_path = "./path/to/image.png"  # Local path to image

    # Call the zerox function with the local file path
    result = await zerox(file_path=file_path, model=model)
    return result

result = asyncio.run(main())

pradhyumna85 commented 1 month ago

Duplicate of #49 and #67. @tylermaran, the Python SDK only supports PDF files at the moment. We'll need to add support for image inputs as well.
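
For the original question about uploaded bytes, one possible workaround while zerox expects a file path is to write the bytes to a temporary file and pass that path. This is only a sketch: `save_upload_to_temp_file` and `remove_temp_file` are hypothetical helper names, not part of zerox, and the `.pdf` suffix assumes the upload is a PDF, since that is the only input type reported as supported here.

```python
import os
import tempfile


def save_upload_to_temp_file(data: bytes, suffix: str = ".pdf") -> str:
    """Write uploaded bytes to a named temporary file and return its path.

    Hypothetical helper; the caller is responsible for deleting the file.
    """
    # delete=False keeps the file on disk after the handle is closed,
    # so other code can reopen it by path.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        return tmp.name


def remove_temp_file(path: str) -> None:
    """Delete the temporary file, ignoring the case where it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


# Hypothetical usage inside an async handler, where uploaded_file is whatever
# file-like object your web framework provides:
#
#     path = save_upload_to_temp_file(uploaded_file.read(), suffix=".pdf")
#     try:
#         result = await zerox(file_path=path, model=model)
#     finally:
#         remove_temp_file(path)
```

Keeping the cleanup in a `finally` block means the temporary file is removed even if the zerox call raises.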

iBala commented 3 days ago

+1. Image support would be appreciated. It works with the following changes.


```python
            # It's a PDF file
            raw_file_name = os.path.splitext(os.path.basename(local_path))[0]
            file_name = "".join(c.lower() if c.isalnum() else "_" for c in raw_file_name)
            # Truncate file name to 255 characters to prevent ENAMETOOLONG errors
            file_name = file_name[:255]

            # Create a subset PDF in the temp dir with only the requested pages if select_pages is provided
            if select_pages is not None:
                subset_pdf_create_kwargs = {
                    "original_pdf_path": local_path,
                    "select_pages": select_pages,
                    "save_directory": temp_directory,
                    "suffix": "_selected_pages",
                }
                local_path = await asyncio.to_thread(
                    create_selected_pages_pdf, **subset_pdf_create_kwargs
                )

            # Convert the file to a series of images
            images = await convert_pdf_to_images(local_path=local_path, temp_dir=temp_directory)
        elif file_extension in [".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".gif"]:
            # It's an image file
            raw_file_name = os.path.splitext(os.path.basename(local_path))[0]
            file_name = "".join(c.lower() if c.isalnum() else "_" for c in raw_file_name)
            # Truncate file name to 255 characters to prevent ENAMETOOLONG errors
            file_name = file_name[:255]

            # Warn if select_pages is provided
            if select_pages is not None:
                warnings.warn("select_pages parameter is only applicable for PDF files and will be ignored for image files.")

            # Add the image to the images list
            images = [local_path]
        else:
            raise FileUnavailable("Unsupported file type. Only PDF and image files are supported.")
```
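
The branching in the snippet above can be tried in isolation with a small standalone sketch. This is not zerox code. `sanitize_file_name` and `classify_input` are hypothetical names. The `.pdf` check is an assumption, because the opening `if` line is not shown in the snippet. `ValueError` stands in for the library's `FileUnavailable`, and lower-casing the extension is an addition so that `.PNG` also matches.

```python
import os
import warnings

# Same extension list as in the snippet above.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".gif"}


def sanitize_file_name(path: str) -> str:
    """Lower-case the base name, replace non-alphanumerics with "_", cap at 255 chars."""
    raw_file_name = os.path.splitext(os.path.basename(path))[0]
    file_name = "".join(c.lower() if c.isalnum() else "_" for c in raw_file_name)
    return file_name[:255]


def classify_input(path: str, select_pages=None) -> str:
    """Return "pdf" or "image" based on the file extension."""
    # Assumption: lower-case the extension so upper-case suffixes also match.
    file_extension = os.path.splitext(path)[1].lower()
    if file_extension == ".pdf":  # assumed condition, not shown in the snippet
        return "pdf"
    if file_extension in IMAGE_EXTENSIONS:
        if select_pages is not None:
            warnings.warn(
                "select_pages only applies to PDF files and is ignored for image files."
            )
        return "image"
    # ValueError stands in for the library's FileUnavailable exception.
    raise ValueError("Unsupported file type. Only PDF and image files are supported.")
```

Since the file name handling is identical in both branches of the snippet, pulling it into one helper as done here would also remove the duplication.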