Open nekopep opened 1 month ago
Well, after reviewing the PDF, perhaps it was not the most suitable one for this comparison: the vision LLM has a clear advantage because it can use the position of the data on the page to better understand the PDF.
Still, that makes this feature request even more interesting ;)
What would you like to see?
Whenever I upload a document in AnythingLLM, it is automatically embedded into the workspace. In my experience, working with direct embedding gives low-quality results. Thanks to your last commit adding support for Mistral AI vision, I experimented with the same PDF and did this instead:
pdftoppm -png -r 100 airmontenegro.pdf > airmontenegro_100.png
optipng -fix airmontenegro_100.png
Then I uploaded the image directly to the model and worked with it. I found the results more accurate.
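For PDFs with more than one page, a variant that writes one PNG per page may be more practical. The following is only a sketch of that idea, assuming `pdftoppm` (from poppler-utils) and `optipng` are installed; the function names `png_prefix` and `pdf_to_png` and the `<name>_<dpi>` naming are illustrative choices, not anything AnythingLLM provides.

```shell
#!/usr/bin/env bash
# Sketch: render each page of a PDF to a PNG for upload to a vision model.
# Assumes pdftoppm (poppler-utils) and optipng are installed.
# Function names and the "<name>_<dpi>" prefix are hypothetical.

# Print the output prefix for a given PDF and resolution,
# e.g. "airmontenegro.pdf" at 100 DPI gives "airmontenegro_100".
png_prefix() {
  local pdf="$1" dpi="$2"
  local base="${pdf%.pdf}"
  printf '%s_%s\n' "$base" "$dpi"
}

# Render every page of the PDF, then losslessly optimize each PNG.
pdf_to_png() {
  local pdf="$1" dpi="${2:-100}"
  local prefix png
  prefix="$(png_prefix "$pdf" "$dpi")"
  # Given an output prefix, pdftoppm writes one file per page
  # (<prefix>-1.png, <prefix>-2.png, ...).
  pdftoppm -png -r "$dpi" "$pdf" "$prefix" || return 1
  for png in "$prefix"-*.png; do
    [ -e "$png" ] || continue   # skip if the glob matched nothing
    optipng -fix -quiet "$png"
  done
}
```

Calling `pdf_to_png airmontenegro.pdf 100` would then leave one optimized PNG per page next to the PDF, each of which can be uploaded to the model.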
Would it be possible to send a PDF directly as images to a vision LLM, or to add something in the UI to allow this?
This is related to a more general issue I have when working with AnythingLLM and PDF documents. I think (and I am probably completely wrong) that AnythingLLM is designed as a chat interface that can ingest a ton of documents and work on that mass of data. My use case is more basic: usually I want to work on only one PDF, and in that case I find the workspace UI difficult to use to get the result I want. Only today did I get the result I wanted, with this "PDF to image" trick.
This is a more general "last missing feature" in AnythingLLM for users who come from ChatGPT and are used to having it ingest a PDF. AnythingLLM will happily absorb the PDF and add it to the workspace, BUT alongside all the other PDFs already in the workspace (generally not what basic users want). Even /reset keeps the documents, so if you iterate over different PDFs, the workspace gets cluttered with every PDF uploaded (because /reset does not reset the uploaded files).
My feature request would let users experiment with PDF-to-image conversion for vision LLMs and let basic users work as they do in ChatGPT. Perhaps we could discuss in another ticket how to fix the more general usability issue I have when working with one-shot PDFs?
Example with PDF embedding: Result:
Example with image conversion and direct upload: (Much) better result:
Thank you for any feedback