Clemens-E / paperless-ngx-tools

Some of my tools for paperless-ngx, for example title generation

Parsing Title #1

Open Tintin0 opened 1 month ago

Tintin0 commented 1 month ago

Hi Clemens,

thank you for your great work. I'm struggling to fix the following issue. After I run "yarn create-titles", I receive the following error message:

"Fetching documents... Fetched 1 documents Failed to parse title This appears to be a multi-party agreement, likely for the formation of one or more companies".

After "Failed to parse title", the script prints a lengthy summary of the document (in this case a contract) that I am trying to generate a title for.

Clemens-E commented 1 month ago

Hi, what model are you using?

Tintin0 commented 1 month ago

I've tested both Phi:3 and Mistral.

Clemens-E commented 1 month ago

So,

"the script writes out a lengthy summary of the (in this case) contract"

this indicates that the model tried to summarize the document instead of simply creating a title. I'm going to assume it didn't understand the task it was given.

That is odd, because I used Mistral and it handled this task fine about 90% of the time.

I recently switched to qwen2:1.5b, and it did even better than Mistral.

Can you adjust this prompt? https://github.com/Clemens-E/paperless-ngx-tools/blob/main/src/create-titles.ts#L25 Replace the placeholder "this is where the title goes" with a real example of how you want your titles to look.
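As a rough illustration of that idea, here is a minimal sketch of a prompt that carries a concrete example title. The function name `buildTitlePrompt`, the prompt wording, and the sample strings are all hypothetical and are not copied from create-titles.ts.

```typescript
// Hypothetical prompt builder: the wording below is illustrative only
// and is NOT the prompt used in create-titles.ts.
function buildTitlePrompt(documentText: string, exampleTitle: string): string {
  return [
    "Create a short title for the document below.",
    "Respond with the title only, no summary and no explanation.",
    // A concrete example shows the model the expected shape of the answer.
    `Example of the expected format: ${exampleTitle}`,
    "",
    "Document:",
    documentText,
  ].join("\n");
}

// Made-up sample input, for demonstration only.
const titlePrompt = buildTitlePrompt(
  "Shareholders' agreement between ...",
  "Shareholder Agreement 2023",
);
console.log(titlePrompt);
```

The point of the example line is that smaller models tend to imitate a concrete sample more reliably than they follow an abstract instruction.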

If this doesn't help, it would be nice to know how qwen2:1.5b works for you.

I also pushed a little update to make parsing the title more robust.
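For anyone curious what "more robust" parsing can mean here, the sketch below shows one possible approach. It is an assumption for illustration, not the actual change that was pushed: take the first non-empty line of the model's reply, drop a leading "Title:" label, and strip surrounding quotes or markdown emphasis. The name `extractTitle` is hypothetical.

```typescript
// Hypothetical helper, NOT the code from the repository update.
// Returns a cleaned title, or null if the reply contains nothing usable.
function extractTitle(reply: string): string | null {
  // Keep only non-empty lines; models often add a summary after the title.
  const nonEmptyLines = reply
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (nonEmptyLines.length === 0) {
    return null;
  }
  const cleaned = nonEmptyLines[0]
    .replace(/^title\s*:\s*/i, "") // drop a leading "Title:" label
    .replace(/^["'`*]+|["'`*]+$/g, "") // strip wrapping quotes or asterisks
    .trim();
  return cleaned.length > 0 ? cleaned : null;
}

// Made-up model reply, for demonstration only.
const parsedTitle = extractTitle(
  'Title: "Shareholder Agreement"\n\nThis appears to be a multi-party agreement.',
);
console.log(parsedTitle);
```

A parser like this still cannot help if the model returns only a summary, which is why adjusting the prompt remains the first thing to try.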

Tintin0 commented 1 month ago

Hi Clemens, thanks. I fetched your updated version and this more or less solved the problem. I then got an error message that the title is too long. So I (i) adjusted the prompt slightly (asking for a maximum title length of 20 characters) and (ii) raised the 20-character threshold to 60 characters. The updated prompt made the title shorter, but unfortunately still not within the limit. Raising the threshold solved it entirely.
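To make the threshold change concrete, here is a minimal sketch of such a length check. The names `MAX_TITLE_LENGTH`, `isTitleWithinLimit`, and `truncateTitle` are hypothetical and not taken from the script, and the word-boundary truncation is an extra alternative to rejecting the title, not something the script is known to do.

```typescript
// Hypothetical constant name; 60 is the raised threshold described above.
const MAX_TITLE_LENGTH = 60;

// True if the trimmed title fits within the limit.
function isTitleWithinLimit(title: string, maxLength: number = MAX_TITLE_LENGTH): boolean {
  return title.trim().length <= maxLength;
}

// Alternative to rejecting long titles: cut at the last space before the limit.
function truncateTitle(title: string, maxLength: number = MAX_TITLE_LENGTH): string {
  const trimmed = title.trim();
  if (trimmed.length <= maxLength) {
    return trimmed;
  }
  const cut = trimmed.slice(0, maxLength);
  const lastSpace = cut.lastIndexOf(" ");
  // Fall back to a hard cut if there is no space to break on.
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut).trim();
}

console.log(isTitleWithinLimit("Shareholder Agreement", 20));
console.log(isTitleWithinLimit("Shareholder Agreement"));
console.log(truncateTitle("Multi-party agreement for the formation of companies", 20));
```

A 21-character title such as "Shareholder Agreement" already fails a 20-character limit, which shows how tight the original threshold is for ordinary titles.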

I will also give qwen a try. I've already downloaded it but have not tested it yet.

Best, Lorenz