isayahc commented 7 months ago

59 extract images

Description

Please include a brief description of the changes introduced by this PR.

Related Issue(s)

solves #59

Changes Made

Added an image_processing module inside utils
Created a function for extracting images from PDF files
Removed repeated code in the test files
Added a caption_image function in innovation_pathfinder_ai/utils/image_processing/image_processing.py

Task in PRs

[x] Extract images from PDFs
[x] Use a multimodal LLM to extract information from images
[ ] Take the extracted information and add it to the metadata of the vector database

Checklist

[x] I have tested my changes locally and ensured that they work as expected.
[ ] I have updated the documentation (if applicable).
[x] My code follows the project's coding conventions and style guidelines.
[x] I have added appropriate test cases (if applicable).
[x] I have reviewed my own code to ensure its quality.

Additional Notes

Need to find a URL to a PDF without images to use as a test case
I am still looking for a good model, so the output quality is currently low
The vector store is getting more complex, so it should be moved into a class
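Moving the vector store into a class could look roughly like the following minimal, in-memory sketch. It is an illustration under assumptions, not the project's store: the class name `VectorStoreManager`, its `add`/`search` methods, and the pluggable `embed` function are all hypothetical. In practice the methods would delegate to whatever vector database the project already uses, while callers only see one small interface.

```python
import math
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical embedding function type: maps a text to a vector.
EmbedFn = Callable[[str], List[float]]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VectorStoreManager:
    """Hypothetical wrapper that keeps texts, vectors and metadata together."""

    def __init__(self, embed: EmbedFn) -> None:
        self._embed = embed
        self._texts: List[str] = []
        self._vectors: List[List[float]] = []
        self._metadata: List[Dict[str, Any]] = []

    def add(self, text: str, metadata: Optional[Dict[str, Any]] = None) -> int:
        """Embed and store one text with its metadata; returns its index."""
        self._texts.append(text)
        self._vectors.append(self._embed(text))
        self._metadata.append(dict(metadata or {}))
        return len(self._texts) - 1

    def search(self, query: str, k: int = 3) -> List[Tuple[str, Dict[str, Any], float]]:
        """Return up to k (text, metadata, score) tuples, most similar first."""
        q = self._embed(query)
        scored = [
            (text, meta, _cosine(q, vec))
            for text, meta, vec in zip(self._texts, self._metadata, self._vectors)
        ]
        scored.sort(key=lambda item: item[2], reverse=True)
        return scored[:k]
```

Keeping embedding and metadata handling behind `add`/`search` means image metadata can later be attached in one place instead of in every caller.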

Reviewer(s)

@almutareb
@vonderwoman

isayahc commented 7 months ago

This PR is not finished. I need to convert it back to a draft.

isayahc commented 7 months ago

I still need to integrate the code into the existing codebase. @almutareb, should I make a directory for the images and store the image file locations as metadata? What other information should I include, for example in terms of the summary?

almutareb commented 7 months ago

@isayahc Yes, for now let's just use a folder; we can move to an object store (e.g. S3-compatible storage via MinIO) later. For the metadata, we need to be able to construct a context for the generation. Let's try the following: a summary generated from the text referencing the image, the summary and number of the page it came from, the section header, and a description. This will help provide context and align it with the generation.
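One way to hold the fields listed in that comment is a small record per extracted image that can be flattened into vector-store metadata and assembled into a context string. This is a sketch, assuming images are written to a local folder as agreed for now; the names `save_image`, `ImageMetadata`, `to_metadata`, and `to_context`, and the file-naming scheme, are hypothetical and not part of the PR.

```python
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any, Dict


def save_image(image_bytes: bytes, out_dir: Path, stem: str, ext: str) -> Path:
    """Write raw image bytes to out_dir/<stem>.<ext> and return the path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{stem}.{ext}"
    path.write_bytes(image_bytes)
    return path


@dataclass
class ImageMetadata:
    image_path: str         # location in the local image folder (object store later)
    page_number: int        # page the image was extracted from
    page_summary: str       # summary of that page
    section_header: str     # header of the section containing the image
    reference_summary: str  # summary of the text that references the image
    description: str        # description produced by the multimodal model

    def to_metadata(self) -> Dict[str, Any]:
        """Flat dict of all fields, for a vector-store metadata entry."""
        return asdict(self)

    def to_context(self) -> str:
        """Assemble the fields into a context string for generation."""
        return (
            f"Section: {self.section_header}\n"
            f"Page {self.page_number}: {self.page_summary}\n"
            f"Referenced as: {self.reference_summary}\n"
            f"Image description: {self.description}"
        )
```

Storing only the file path (not the bytes) keeps the metadata small, and swapping the folder for an object store later would only change what `image_path` points to.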

isayahc commented 7 months ago

@almutareb @vonderwoman I have noticed that unrelated images are sometimes extracted (e.g. logos, random design assets). We should use an image model to determine whether each image is related to the text, and to help generate the image summary.
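Before calling an image model on every extracted image, a cheap pre-filter could discard obvious decorative assets. The sketch below combines two simple heuristics (this is a swapped-in technique, not something proposed in the thread): drop images below a minimum size, and drop images whose exact bytes repeat on many pages, which is typical of logos. It then applies an optional relevance scorer. The `relevance` callable stands in for the image model and is hypothetical, as are `ExtractedImage`, `filter_images`, and the default thresholds.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ExtractedImage:
    page_number: int
    width: int
    height: int
    data: bytes


def filter_images(
    images: List[ExtractedImage],
    min_side: int = 64,
    max_repeats: int = 2,
    relevance: Optional[Callable[[ExtractedImage], float]] = None,
    threshold: float = 0.5,
) -> List[ExtractedImage]:
    """Drop likely decorative images, then optionally apply a relevance scorer."""
    # Hash each image once; identical bytes on many pages suggest a logo.
    digests = [hashlib.sha256(img.data).hexdigest() for img in images]
    counts = Counter(digests)
    kept = []
    for img, digest in zip(images, digests):
        if min(img.width, img.height) < min_side:
            continue  # too small to carry content (icons, bullets)
        if counts[digest] > max_repeats:
            continue  # repeated across pages, likely a logo or design asset
        if relevance is not None and relevance(img) < threshold:
            continue  # hypothetical image-model score says unrelated
        kept.append(img)
    return kept
```

The heuristics only remove clear-cut cases; deciding whether a remaining image actually relates to the surrounding text would still be the image model's job.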
