almutareb / InnovationPathfinderAI

GenAI Research Assistant for Innovation Labs

59 extract images #71

Closed isayahc closed 6 months ago

isayahc commented 7 months ago

59 extract images

isayahc commented 7 months ago

This PR is not finished. I need to make it a draft again.

isayahc commented 7 months ago

I still need to integrate the code into the existing codebase. @almutareb do I make a directory for the images and store the image file locations as metadata? What other information should I include, for example in terms of the summary?

almutareb commented 7 months ago

@isayahc yes, for now let's just use a folder; we can use an object store, e.g. S3 over MinIO, later. For the metadata, we need to be able to construct a context for the generation. Let's try the following: a summary generated from the text referencing the image, the summary and number of the page it came from, the section header, and a description. This will help provide context and align it with the generation.
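
A minimal sketch of how the folder storage and metadata described above might look, using only the Python standard library. The names `ImageMetadata`, `save_image_with_metadata`, and `build_context`, as well as all field names and the JSON sidecar file, are assumptions made for illustration and are not taken from the existing codebase.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class ImageMetadata:
    # Hypothetical field names; they mirror the list in the comment above.
    image_path: str      # location of the image file inside the folder
    text_summary: str    # summary generated from the text referencing the image
    page_summary: str    # summary of the page the image came from
    page_number: int     # number of the page the image came from
    section_header: str  # header of the section containing the image
    description: str     # description of the image itself


def save_image_with_metadata(image_bytes: bytes, filename: str,
                             out_dir: str, **fields) -> ImageMetadata:
    """Write the image into out_dir and a JSON sidecar file next to it."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    image_path = folder / filename
    image_path.write_bytes(image_bytes)
    meta = ImageMetadata(image_path=str(image_path), **fields)
    # The sidecar shares the image's name, e.g. fig1.png -> fig1.json.
    image_path.with_suffix(".json").write_text(json.dumps(asdict(meta), indent=2))
    return meta


def build_context(meta: ImageMetadata) -> str:
    """Flatten the metadata into a text block for the generation step."""
    return (
        f"Image: {meta.image_path}\n"
        f"Section: {meta.section_header}\n"
        f"Page {meta.page_number}: {meta.page_summary}\n"
        f"Referencing text: {meta.text_summary}\n"
        f"Description: {meta.description}"
    )
```

Keeping the file location as a plain string in the metadata means that a later move to an object store would only change how `image_path` is produced, not the shape of the record.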

isayahc commented 7 months ago

@almutareb @vonderwoman I have noticed there are instances where unrelated images are extracted (e.g. logos, random design assets). We should use an image model to determine whether the image is related to the text, and to help generate the image summary.
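
A rough sketch of how such a relevance filter could be wired in. The thread does not name a specific image model, so the model is represented here by a pluggable `scorer` callable; `ExtractedImage`, `filter_relevant`, the score range, and the `0.5` threshold are all hypothetical, and `fake_scorer` is only a stand-in so the example runs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ExtractedImage:
    path: str         # where the extracted image was saved
    nearby_text: str  # text surrounding the image in the document


# Stand-in type for the image model: takes (image path, text) and returns a
# relevance score in [0, 1]. The real model and its API are not decided in
# this thread, so this signature is an assumption.
RelevanceScorer = Callable[[str, str], float]


def filter_relevant(images: Iterable[ExtractedImage],
                    scorer: RelevanceScorer,
                    threshold: float = 0.5) -> List[ExtractedImage]:
    """Keep only images whose score against their nearby text meets the threshold."""
    return [img for img in images if scorer(img.path, img.nearby_text) >= threshold]


def fake_scorer(path: str, text: str) -> float:
    """Placeholder scorer for demonstration only; a real image model goes here."""
    return 0.1 if "logo" in path else 0.9


candidates = [
    ExtractedImage("images/logo.png", "Company footer"),
    ExtractedImage("images/fig2.png", "Figure 2 shows the pipeline"),
]
kept = filter_relevant(candidates, fake_scorer)
```

Passing the scorer in as a callable keeps the filtering step independent of whichever model is eventually chosen, and the same model call could later be reused to help generate the image summary.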