B-urb / doclytics

A document analyzer for paperless-ngx using ollama
MIT License

Logs issue #105

Open Zelnes opened 1 month ago

Zelnes commented 1 month ago

Hi there,

I started trying doclytics, but I don't think I'm getting as many logs as I should.

Actually, the binary embedded in the bjoern5urban/doclytics:v1.3.0 image does not match the announced version:

root@12d8e2cd59b2:/usr/doc# strings /usr/local/cargo/bin/doclytics | grep "Generate Response with LLM"
src/paperless.rsError at column Creating field: Retrieve Documents from paperless at: , with query: Error while fetching documents from paperless: Retrieve next page Fetching custom fields from paperless at Fields: Error occured parsing custom fields: : Error retrieving custom fields: Generate Response with LLM modelError parsing llm response json 'tagged' field not found in the provided fields.Error:  creating custom field: , skipping...payload is empty, not updating fieldsUpdating document with ID: Document with ID:  successfully updatedError while updating document fields: Error parsing response from new field: Error creating custom field: Error while interacting with paperless: Error generating llm response: Application started, version: 1.1.4-rc.9
root@12d8e2cd59b2:/usr/doc# strings /usr/local/cargo/bin/doclytics | grep "with Prompt"
root@12d8e2cd59b2:/usr/doc# 

Maybe something went wrong when the container image was built.
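A small helper can make that check repeatable. This is a sketch, not part of doclytics: it assumes the startup message keeps the "Application started, version: " prefix seen in the strings output above and that the binary lives at /usr/local/cargo/bin/doclytics. The names extract_logged_version, BIN and EXPECTED are hypothetical, and EXPECTED defaults to the 1.3.0 implied by the image tag.

```shell
#!/bin/sh
# Sketch: compare the version string compiled into the doclytics binary with
# the version implied by the image tag.
set -eu

BIN="${BIN:-/usr/local/cargo/bin/doclytics}"
EXPECTED="${EXPECTED:-1.3.0}"   # hypothetical default taken from the v1.3.0 tag

# Read text on stdin and print the first version that follows the startup prefix.
extract_logged_version() {
  grep -o 'Application started, version: [0-9A-Za-z.+-]*' \
    | head -n 1 \
    | sed 's/^Application started, version: //'
}

# Only inspect the binary when it exists, e.g. when run inside the container.
if [ -r "$BIN" ]; then
  logged="$(strings "$BIN" | extract_logged_version)"
  if [ "$logged" = "$EXPECTED" ]; then
    echo "OK: binary reports $logged"
  else
    echo "MISMATCH: binary reports '${logged:-none}', expected $EXPECTED" >&2
  fi
fi
```

As the maintainer's reply below explains, the logged version is currently a static string, so a mismatch here shows what the binary prints, not necessarily that the wrong binary was shipped.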

Proposition

  1. Would it be possible to make the custom_field used to retrieve/update documents configurable?
  2. What about a BASE_PROMPT_FILE variable that points to a file containing the prompt? That way we could edit a plain text file instead of a container environment variable, and we wouldn't need to restart the container to update the prompt.
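Proposal 2 could look roughly like this. It is a hypothetical sketch, not existing doclytics behaviour: BASE_PROMPT_FILE is the proposed variable, BASE_PROMPT is assumed to be the name of the current inline prompt variable, and resolve_prompt is an illustrative name. Because the file is read on every call, an edit takes effect on the next run without restarting the container.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed BASE_PROMPT_FILE behaviour.
set -eu

# Print the prompt from $1 if it names a readable file, otherwise print $2.
resolve_prompt() {
  prompt_file="$1"
  fallback="$2"
  if [ -n "$prompt_file" ] && [ -r "$prompt_file" ]; then
    cat -- "$prompt_file"
  else
    printf '%s\n' "$fallback"
  fi
}

# Re-resolved on each run, so file edits are picked up without a restart.
# BASE_PROMPT is an assumed name for the existing inline prompt variable.
PROMPT="$(resolve_prompt "${BASE_PROMPT_FILE:-}" "${BASE_PROMPT:-}")"
```

If the prompt file is bind-mounted, mounting its directory rather than the single file is safer: many editors save by writing a new file and renaming it over the old one, and a single-file bind mount keeps pointing at the old inode.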

Misunderstanding

I don't understand how to have doclytics run automatically when a file is uploaded or when the custom field is edited. From what I can see, the tool runs once over all documents that should be processed and then exits.

Edit

I also forgot to mention that the tag bjoern5urban/doclytics:latest does not exist on hub.docker.com.

B-urb commented 1 month ago

Hi there, the version is currently just a hardcoded string in the log message, but I will fix that soon. It just hasn't been a priority so far.

Regarding your propositions:

  1. You can already change the filter query used to retrieve the documents. I will soon make the fields used to track already scanned documents customizable, probably by moving the whole configuration into a config file that can be mapped into the container and used to customize each query as well. This should take care of proposition two too.

Automatic triggering is another planned feature. I will soon provide a REST endpoint that can update specific documents. Then one can use the post-consumption script feature of paperless to call that webhook for each new document.
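Once such an endpoint exists, the paperless-ngx side could be a short post-consumption script. This is a sketch under assumptions: the /documents/<id>/analyze path, the POST method, the DOCLYTICS_URL variable and the doclytics:8080 host are all hypothetical, since the endpoint has not been released. The part that comes from paperless-ngx is that it runs the script named in PAPERLESS_POST_CONSUME_SCRIPT after consuming a document and exports the new document's ID as DOCUMENT_ID.

```shell
#!/bin/sh
# Hypothetical paperless-ngx post-consumption hook that asks doclytics to
# analyze the document that was just consumed.
set -eu

# Assumption: base URL of the doclytics service (e.g. a Compose service name).
DOCLYTICS_URL="${DOCLYTICS_URL:-http://doclytics:8080}"

# Build the per-document URL; the path is a placeholder for the planned endpoint.
# A single trailing slash on the base URL is dropped to avoid "//".
doclytics_endpoint() {
  printf '%s/documents/%s/analyze\n' "${1%/}" "$2"
}

# paperless-ngx sets DOCUMENT_ID for post-consumption scripts; skip otherwise.
if [ -n "${DOCUMENT_ID:-}" ]; then
  curl --fail --silent --show-error -X POST \
    "$(doclytics_endpoint "$DOCLYTICS_URL" "$DOCUMENT_ID")"
fi
```

The script has to be executable and reachable from inside the paperless-ngx container, and the doclytics host must be resolvable from there, for example by putting both services on the same Compose network.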

I intentionally did not create a latest tag. The newest version is always available under the :development tag or as the latest release candidate. This might change in the future, however.