dragotin / kraft

Kraft helps to handle your daily quotes and invoices in your small business.
http://volle-kraft-voraus.de
GNU General Public License v2.0

Feature Request: PDF metadata, User-definable patterns for generated PDF files (and maybe archived XML files) #146

Open noseshimself opened 2 years ago

noseshimself commented 2 years ago

Many companies have internal document naming schemes so that files can still be identified even if the sorting mechanism or filing structure breaks down. Most of the required parameters are available at the time of document generation; accessing them later is difficult and may require programming against the Kraft database.

{local example: "yyyymmdd document-ID correspondent uuid optional-remark", e.g. "20220403 RE-22-1234 Maier AG eccbf114-a229-474c-bfd6-34acaa7963bc.pdf"}

As the transformation rules will rarely be modified, they could even be specified in .config/kraftrc.
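
A minimal sketch of how such a naming pattern could be expanded. The template syntax (Python's `str.format`), the `PATTERN` value and the field names `date`, `doc_id`, `correspondent`, `uuid` and `remark` are assumptions made for illustration; none of this is existing Kraft behaviour or an existing kraftrc key.

```python
import re
from datetime import date

# Hypothetical pattern, as it might be read from a config file.
PATTERN = "{date:%Y%m%d} {doc_id} {correspondent} {uuid} {remark}"

# Characters that are unsafe in file names on common file systems.
_UNSAFE = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def build_filename(pattern, fields, extension=".pdf"):
    """Expand the pattern with the given fields and return a safe file name."""
    cleaned = {
        key: _UNSAFE.sub("_", value) if isinstance(value, str) else value
        for key, value in fields.items()
    }
    name = pattern.format(**cleaned)
    # Collapse whitespace left behind by empty optional fields.
    name = " ".join(name.split())
    return name + extension


example = build_filename(
    PATTERN,
    {
        "date": date(2022, 4, 3),
        "doc_id": "RE-22-1234",
        "correspondent": "Maier AG",
        "uuid": "eccbf114-a229-474c-bfd6-34acaa7963bc",
        "remark": "",
    },
)
print(example)
```

With the values from the local example above and an empty remark, this should reproduce the file name shown there. Keeping the pattern as a single string is what would make it easy to store in a config file.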

Another nice-to-have feature would be adding as much of the interesting metadata as possible to the PDF header so that current DMS/AMS can extract it for rule-based storing of documents. If the built-in mechanisms (Grantlee and Weasyprint) do not offer an easy interface for that, pdftk (preferred) or exif-tool could be used as part of the PDF generation chain.
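
A rough sketch of the pdftk variant, assuming the line-based `InfoBegin` / `InfoKey` / `InfoValue` records that pdftk writes with `dump_data` and reads with `update_info`. The helper `pdftk_info_records`, the custom key names and the file names are hypothetical, and the values are kept ASCII because non-ASCII text needs extra care with pdftk.

```python
def pdftk_info_records(metadata):
    """Render a dict as text in the record format used by pdftk's update_info."""
    lines = []
    for key, value in metadata.items():
        # Line breaks inside a value would corrupt the line-based format.
        flat = " ".join(str(value).split())
        lines += ["InfoBegin", f"InfoKey: {key}", f"InfoValue: {flat}"]
    return "\n".join(lines) + "\n"


info = pdftk_info_records(
    {
        "Title": "Invoice RE-22-1234",
        # Hypothetical custom keys, not defined by Kraft or by any standard.
        "KraftDocumentUUID": "eccbf114-a229-474c-bfd6-34acaa7963bc",
        "KraftCorrespondent": "Maier AG",
    }
)
print(info)

# After saving the text as info.txt, it could be applied with:
#   pdftk invoice.pdf update_info info.txt output invoice-tagged.pdf
```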

dragotin commented 2 years ago

Weasyprint transfers the metadata set in the header of the input HTML to the PDF metadata. Commit 89685c09d50c70088ec8a6f2068699f9a7b38524 will use that in a more universal way.
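
As an illustration of the mechanism described here, a small sketch that builds an HTML head carrying metadata. It assumes that WeasyPrint picks up `<title>` and the `<meta>` names `author`, `description`, `keywords` and `dcterms.created`; the helper `html_head` and all values are hypothetical, and this is not how the commit above implements it in Kraft's templates.

```python
from html import escape


def html_head(title, meta):
    """Build an HTML <head> whose <title> and <meta> tags carry document metadata."""
    tags = [f"<title>{escape(title)}</title>"]
    for name, content in meta.items():
        tags.append(f'<meta name="{escape(name)}" content="{escape(content)}">')
    return "<head>\n  " + "\n  ".join(tags) + "\n</head>"


head = html_head(
    "Invoice RE-22-1234",
    {
        "author": "Example GmbH",
        "description": "Invoice for Maier AG",
        "keywords": "invoice, Maier AG",
        "dcterms.created": "2022-04-03",
    },
)
document = (
    "<!DOCTYPE html>\n<html>\n" + head +
    "\n<body><h1>Invoice RE-22-1234</h1></body>\n</html>"
)
print(document)

# With WeasyPrint installed, the PDF could then be rendered like this:
#   from weasyprint import HTML
#   HTML(string=document).write_pdf("invoice.pdf")
```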

Which other metadata would you be interested to see in the PDF?

noseshimself commented 2 years ago

> Which other metadata would you be interested to see in the PDF?

Most important would be the UUID of the document this was printed from (as soon as objects get a unique UUID). This can be used in DMS(-like) applications as a pointer to all metadata available for a document.

To avoid extracting things from the document's text content, it would be great to have the document's "validity date" (which is not necessarily identical to its creation date...), creation date, type and correspondent (== to whom it was addressed). If there is a customer's reference such as their PO number (great; I wanted to explicitly ask for this to be added), it should be in the metadata, too.

In my own case documents may have a subject (e.g. a one-line description of a project), but that's rather unusual.

Think DMS: What would you need to correctly categorize, tag and store documents in a DMS? Most grown-up document management systems use (virtual) hierarchical folders to put documents into (most of the systems don't restrict them to one place in the tree) and tags for grouping documents that would have to show up in many places (e.g. marking all tax-related documents with a unique tag across the entire tree). You either have to do that by hand, by scraping text from PDF documents (and EXIF data if there is any), or from "ritualized" metadata in documents.
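
To make the point concrete, a sketch of what a rule in such a DMS could look like once the fields listed above are available as metadata. The class `DocumentMeta`, the function `classify`, the folder layout and the tag names are all hypothetical and not taken from any particular DMS.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DocumentMeta:
    """Metadata fields discussed above; the attribute names are assumptions."""
    uuid: str
    doc_type: str
    correspondent: str
    validity_date: date
    customer_reference: str = ""


def classify(meta):
    """Return (folder, tags) for a document, derived from metadata only."""
    folder = f"{meta.correspondent}/{meta.validity_date.year}/{meta.doc_type}"
    tags = set()
    if meta.doc_type in {"invoice", "credit note"}:
        # Example of a tag that spans the whole folder tree.
        tags.add("tax")
    if meta.customer_reference:
        tags.add(f"po:{meta.customer_reference}")
    return folder, tags


folder, tags = classify(
    DocumentMeta(
        uuid="eccbf114-a229-474c-bfd6-34acaa7963bc",
        doc_type="invoice",
        correspondent="Maier AG",
        validity_date=date(2022, 4, 3),
        customer_reference="PO-4711",
    )
)
print(folder, sorted(tags))
```

No text scraping is involved: the folder and the tags follow from the metadata alone, which is the case the request is arguing for.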

dragotin commented 2 years ago

Do you know of any (naming) standard for this kind of data in the PDF? PDF can contain arbitrary metadata as far as I can tell, but maybe there is a best practice?

noseshimself commented 2 years ago

I haven't found any; that's why most DMS have such flexible metadata extraction and categorization rule engines... Someone really forgot something important there.