Open ferdiga opened 1 month ago
Additional remark: I would go for "restore the original modification date after adding the OCR layer." because
Hi @ferdiga, thanks for the comprehensive explanation of your use case. As you already described, changing a file (here, adding the OCR layer) automatically changes its last-modified date, which is the expected behaviour when new content is written to a file on a system. The app itself just uses the NC API to create a new file version here. The file_put_contents call it uses just writes the file to disk and creates a new file version in Nextcloud, without any option to change file metadata (see here).
A possible way to implement this would be to call touch with a second argument (the old timestamp) after the new file version has been written. In the UI we'd need an additional parameter like "Maintain original modification date". If it is set to true, we'd store the original modification date before creating the new file version and write it back once the version has been created.
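The store-then-restore idea above can be sketched outside of Nextcloud in a few lines of Python. This is only an illustration of the technique on a plain local file, not the app's implementation: the function name `rewrite_preserving_mtime` and the `keep_mtime` flag (standing in for the proposed "Maintain original modification date" setting) are hypothetical, and `os.utime` plays the role that touch with a timestamp argument would play in PHP.

```python
import os
from pathlib import Path


def rewrite_preserving_mtime(path: Path, new_content: bytes, keep_mtime: bool = True) -> None:
    """Overwrite `path` and, if requested, restore its previous timestamps.

    `keep_mtime` is a hypothetical stand-in for a "Maintain original
    modification date" option.
    """
    # 1. Remember the timestamps before the file is touched.
    original = os.stat(path)

    # 2. Write the new content (e.g. the PDF with the OCR layer).
    #    This updates the modification time to "now".
    path.write_bytes(new_content)

    # 3. Write the old access and modification times back.
    if keep_mtime:
        os.utime(path, ns=(original.st_atime_ns, original.st_mtime_ns))
```

Calling `rewrite_preserving_mtime(Path("scan.pdf"), ocr_bytes)` would leave `scan.pdf` with its new content but its old modification time. Whether Nextcloud's own file cache picks up such a change is a separate question that this sketch does not address.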
Possible Workaround
For the time being one could "chain" the Workflow OCR with the Workflow Script:
Hi, thanks for looking into this. Regarding Option 2: once the file has the new tag, it also has the new timestamp. IMHO that is not the way to go.
What I probably will do
Another script is necessary for digitally signed files: print, not copy, to destroy the signature, because the original must be preserved (ocrmypdf will not touch it). Nevertheless, we want to have a searchable version.
Describe the bug
We plan to load historical PDF files into the database and want to make them searchable using the OCR workflow. The workflow changes the modification date of each file, so the important historical context carried by the modification date is "lost", which limits the usability of this great feature.
The ocrmypdf maintainer confirms that ocrmypdf must change the modification date to comply with the standard.
For the OCR workflow I see 2 options:
I have created a little Python script that prepends the original modification date to every PDF file if no date is found at the beginning of the file, as a way to work around this situation, but I want to clarify the situation before I proceed.
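The script itself is not included in this report. A rough sketch of what such a script might look like follows, under several assumptions that are not stated above: that "the beginning of the file" means the file name, that the date format is `YYYY-MM-DD`, and that an underscore separates the date from the original name. The names `prefixed_name` and `prepend_mtime_dates` are hypothetical.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumption: a date prefix looks like "2001-09-09" at the start of the file name.
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}")


def prefixed_name(name: str, mtime: float) -> str:
    """Return `name` with the modification date prepended, unless a date is already there."""
    if DATE_PREFIX.match(name):
        return name
    stamp = datetime.fromtimestamp(mtime).strftime("%Y-%m-%d")  # local time
    return f"{stamp}_{name}"


def prepend_mtime_dates(folder: Path) -> list[Path]:
    """Rename every *.pdf directly inside `folder` that lacks a date prefix.

    Returns the new paths of the files that were renamed.
    """
    renamed = []
    for pdf in sorted(folder.glob("*.pdf")):
        new_name = prefixed_name(pdf.name, pdf.stat().st_mtime)
        if new_name != pdf.name:
            renamed.append(pdf.rename(pdf.with_name(new_name)))
    return renamed
```

Running `prepend_mtime_dates(Path("archive"))` before the OCR workflow would keep the original date visible in the file name even after the modification time has been overwritten.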
System
How to reproduce
Steps to reproduce the behavior: trigger the OCR Workflow