R0Wi-DEV / workflow_ocr

This is a Nextcloud Workflow App which enables you to process files via OCR on serverside.
GNU Affero General Public License v3.0

OCR fails on file drop (shared upload folder) #273

Open XueSheng-GIT opened 2 days ago

XueSheng-GIT commented 2 days ago

Describe the bug

If OCR runs on file creation and someone uploads a file to a file drop folder (create permission only), OCR fails with an error because the uploader lacks the right to modify the file.

System

How to reproduce

Steps to reproduce the behavior:

  1. Create a workflow for OCR with the trigger condition set to file creation and MIME type equal to PDF.
  2. User A shares a folder with user B using custom permissions (create only).
  3. User B uploads a PDF file to the shared folder.
  4. OCR is triggered.

Expected behavior: OCR should finish without error.

Actual behavior: OCR fails with error because of missing rights for editing the file (which is expected for a file drop folder).
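A minimal sketch of why the write is rejected, assuming the permission bit values defined in Nextcloud's `OCP\Constants`. The helper `can_update` is a hypothetical name used only for illustration and is not part of the app:

```python
# Permission bits as defined in Nextcloud's OCP\Constants.
PERMISSION_READ = 1
PERMISSION_UPDATE = 2
PERMISSION_CREATE = 4
PERMISSION_DELETE = 8
PERMISSION_SHARE = 16


def can_update(permissions: int) -> bool:
    """Return True if the bitmask includes the update permission."""
    return bool(permissions & PERMISSION_UPDATE)


# A file drop share grants create only, so replacing the uploaded file
# with its OCR result is not permitted for the uploader.
file_drop = PERMISSION_CREATE
owner = (PERMISSION_READ | PERMISSION_UPDATE | PERMISSION_CREATE
         | PERMISSION_DELETE | PERMISSION_SHARE)

print(can_update(file_drop))  # False
print(can_update(owner))      # True
```

Writing the OCR result back counts as an update of an existing file, which is why a create-only share is enough to upload but not enough to finish the workflow.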

Screenshots

Workflow trigger condition: [screenshot from 2024-10-11 15-42-39]

Server log

Please paste relevant content of your nextcloud.log file here. It might make sense to first decrease the Loglevel. Also, since the OCR process runs asynchronously, run your cron.php before copying the logs here.

{"reqId":"7uws9FKnHpXJ0CIElFt1","level":3,"time":"2024-10-11T15:33:41+02:00","remoteAddr":"192.168.5.204","user":"admin","app":"no app in context","method":"PUT","url":"/remote.php/dav/files/admin/Upload/2024-06-21%20Materialliste%202024+25_TSF.pdf","message":"Exception thrown: OCA\\DAV\\Connector\\Sabre\\Exception\\Forbidden","userAgent":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:131.0) Gecko/20100101 Firefox/131.0","version":"29.0.7.1","exception":{"Exception":"OCA\\DAV\\Connector\\Sabre\\Exception\\Forbidden","Message":"You cannot update the version's metadata because you do not have update permissions on the source file.","Code":0,"Trace":[{"file":"/var/www/nextcloud/apps/files_versions/lib/Versions/VersionManager.php","line":160,"function":"setMetadataValue","class":"OCA\\Files_Versions\\Versions\\LegacyVersionsBackend","type":"->"},{"file":"/var/www/nextcloud/apps/files_versions/lib/Listener/VersionAuthorListener.php","line":65,"function":"setMetadataValue","class":"OCA\\Files_Versions\\Versions\\VersionManager","type":"->"},{"file":"/var/www/nextcloud/apps/files_versions/lib/Listener/VersionAuthorListener.php","line":48,"function":"post_write_hook","class":"OCA\\Files_Versions\\Listener\\VersionAuthorListener","type":"->"},{"file":"/var/www/nextcloud/lib/private/EventDispatcher/ServiceEventListener.php","line":86,"function":"handle","class":"OCA\\Files_Versions\\Listener\\VersionAuthorListener","type":"->"},{"file":"/var/www/nextcloud/3rdparty/symfony/event-dispatcher/EventDispatcher.php","line":230,"function":"__invoke","class":"OC\\EventDispatcher\\ServiceEventListener","type":"->"},{"file":"/var/www/nextcloud/3rdparty/symfony/event-dispatcher/EventDispatcher.php","line":59,"function":"callListeners","class":"Symfony\\Component\\EventDispatcher\\EventDispatcher","type":"->"},{"file":"/var/www/nextcloud/lib/private/EventDispatcher/EventDispatcher.php","line":86,"function":"dispatch","class":"Symfony\\Component\\EventDispatcher\\EventDispatcher","type":"->"},{"file":"/var/www/nextcloud/lib/private/EventDispatcher/EventDispatcher.php","line":98,"function":"dispatch","class":"OC\\EventDispatcher\\EventDispatcher","type":"->"},{"file":"/var/www/nextcloud/lib/private/Files/Node/HookConnector.php","line":102,"function":"dispatchTyped","class":"OC\\EventDispatcher\\EventDispatcher","type":"->"},{"file":"/var/www/nextcloud/lib/private/legacy/OC_Hook.php","line":105,"function":"postWrite","class":"OC\\Files\\Node\\HookConnector","type":"->"},{"file":"/var/www/nextcloud/apps/dav/lib/Connector/Sabre/File.php","line":480,"function":"emit","class":"OC_Hook","type":"::"},{"file":"/var/www/nextcloud/apps/dav/lib/Connector/Sabre/File.php","line":404,"function":"emitPostHooks","class":"OCA\\DAV\\Connector\\Sabre\\File","type":"->"},{"file":"/var/www/nextcloud/apps/dav/lib/Connector/Sabre/Directory.php","line":148,"function":"put","class":"OCA\\DAV\\Connector\\Sabre\\File","type":"->"},{"file":"/var/www/nextcloud/3rdparty/sabre/dav/lib/DAV/Server.php","line":1098,"function":"createFile","class":"OCA\\DAV\\Connector\\Sabre\\Directory","type":"->","args":["*** sensitive parameters replaced ***"]},{"file":"/var/www/nextcloud/3rdparty/sabre/dav/lib/DAV/CorePlugin.php","line":504,"function":"createFile","class":"Sabre\\DAV\\Server","type":"->","args":["*** sensitive parameters replaced ***"]},{"file":"/var/www/nextcloud/3rdparty/sabre/event/lib/WildcardEmitterTrait.php","line":89,"function":"httpPut","class":"Sabre\\DAV\\CorePlugin","type":"->"},{"file":"/var/www/nextcloud/3rdparty/sabre/dav/lib/DAV/Server.php","line":472,"function":"emit","class":"Sabre\\DAV\\Server","type":"->"},{"file":"/var/www/nextcloud/3rdparty/sabre/dav/lib/DAV/Server.php","line":253,"function":"invokeMethod","class":"Sabre\\DAV\\Server","type":"->"},{"file":"/var/www/nextcloud/3rdparty/sabre/dav/lib/DAV/Server.php","line":321,"function":"start","class":"Sabre\\DAV\\Server","type":"->"},{"file":"/var/www/nextcloud/apps/dav/lib/Server.php","line":383,"function":"exec","class":"Sabre\\DAV\\Server","type":"->"},{"file":"/var/www/nextcloud/apps/dav/appinfo/v2/remote.php","line":35,"function":"exec","class":"OCA\\DAV\\Server","type":"->"},{"file":"/var/www/nextcloud/remote.php","line":172,"args":["/var/www/nextcloud/apps/dav/appinfo/v2/remote.php"],"function":"require_once"}],"File":"/var/www/nextcloud/apps/files_versions/lib/Versions/LegacyVersionsBackend.php","Line":302,"CustomMessage":"Exception thrown: OCA\\DAV\\Connector\\Sabre\\Exception\\Forbidden"},"id":"6709296b4fd50"}
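For readers who want the gist of a long entry like the one above, here is a small sketch that pulls the exception class, message and first stack frames out of one `nextcloud.log` line. The function name `summarize_log_entry` and the shortened sample are illustrative assumptions. Only the field names (`exception`, `Message`, `Trace`, and so on) are taken from the log entry itself:

```python
import json


def summarize_log_entry(line: str, max_frames: int = 3) -> dict:
    """Extract the exception class, message and first stack frames from one log line."""
    entry = json.loads(line)
    exc = entry.get("exception") or {}
    frames = [
        f'{frame.get("class", "")}{frame.get("type", "")}{frame.get("function", "")}'
        for frame in exc.get("Trace", [])[:max_frames]
    ]
    return {
        "user": entry.get("user"),
        "url": entry.get("url"),
        "exception": exc.get("Exception"),
        "message": exc.get("Message"),
        "frames": frames,
    }


# Shortened sample with the same structure as the entry above (illustrative only).
sample = json.dumps({
    "user": "admin",
    "url": "/remote.php/dav/files/admin/Upload/example.pdf",
    "exception": {
        "Exception": "OCA\\DAV\\Connector\\Sabre\\Exception\\Forbidden",
        "Message": "You cannot update the version's metadata because you do "
                   "not have update permissions on the source file.",
        "Trace": [
            {"function": "setMetadataValue",
             "class": "OCA\\Files_Versions\\Versions\\LegacyVersionsBackend",
             "type": "->"},
        ],
    },
})

summary = summarize_log_entry(sample)
print(summary["exception"], "-", summary["message"])
print(summary["frames"])
```

Applied to the real entry, the first frames point at the files_versions app (`LegacyVersionsBackend->setMetadataValue`), which is where the Forbidden exception is raised.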

Browser log

If you're observing Browser errors, please paste your developer tools logs here.

Help for Chrome: https://developer.chrome.com/docs/devtools/console/#view Help for Firefox: https://firefox-source-docs.mozilla.org/devtools-user/browser_console/index.html

n/a

Additional context

Especially for shared folders, there's no guarantee that the user has permission to modify files. Thus, the owner should be used as a fallback to modify the file.

R0Wi commented 1 day ago

Thanks for this comprehensive report. I remember that in #110 we changed the behaviour of the app so that it determines the user for running the OCR process by checking the file path. So when a file is shared internally in NC with a certain user, AFAIK the system will mount the file under the user who receives the share.

In your case user B should see the file under /userB/files/.... You should be able to verify this by turning the server loglevel down to DEBUG (0). You should then see something in the logs like "Adding file to jobqueue: ..." (https://github.com/R0Wi-DEV/workflow_ocr/blob/1f98f2be4d1345ca8987223aa38ceb0b0c6e79b9/lib/Operation.php#L114).

If that's the case, I'm not sure we can do anything here because, as you said, userB doesn't have permission to manipulate the existing file. Using userA's permissions for the OCR processing in this case is also questionable, because a process triggered by userB would then run as userA, which is also not clean. What do you think?

XueSheng-GIT commented 1 day ago

@R0Wi Thanks for clarifying the current approach. Please find below the requested output which confirms your assumption (User A is admin (sharer) and User B is Max.Mustermann (receiver of share)).

{"reqId":"kBtCi2eXC9ek6LkMTR0o","level":0,"time":"2024-10-13T10:45:07+02:00","remoteAddr":"192.168.5.204","user":"Max.Mustermann","app":"workflow_ocr","method":"PUT","url":"/remote.php/dav/files/Max.Mustermann/Upload/Scan.pdf","message":"Adding file to jobqueue: {\"uid\":\"Max.Mustermann\",\"fileId\":2938284,\"settings\":\"{\\\"languages\\\":[\\\"deu\\\",\\\"eng\\\"],\\\"tagsToAddAfterOcr\\\":[8],\\\"tagsToRemoveAfterOcr\\\":[7]}\"}","userAgent":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:131.0) Gecko/20100101 Firefox/131.0","version":"30.0.1.1","data":{"app":"workflow_ocr"},"id":"670b8933b7d4f"}
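As a reading aid for the entry above, here is a minimal sketch that decodes the job arguments from the `message` field. It assumes the message keeps the "Adding file to jobqueue: " prefix followed by JSON, as seen in this log. The function name `parse_jobqueue_message` is hypothetical:

```python
import json

PREFIX = "Adding file to jobqueue: "


def parse_jobqueue_message(message: str) -> dict:
    """Decode the job arguments from an 'Adding file to jobqueue' message."""
    if not message.startswith(PREFIX):
        raise ValueError("not a jobqueue message")
    job = json.loads(message[len(PREFIX):])
    # 'settings' is itself a JSON-encoded string, so it needs a second decode.
    job["settings"] = json.loads(job["settings"])
    return job


# The "message" field from the log entry above, after the outer JSON decode.
message = (
    'Adding file to jobqueue: {"uid":"Max.Mustermann","fileId":2938284,'
    '"settings":"{\\"languages\\":[\\"deu\\",\\"eng\\"],'
    '\\"tagsToAddAfterOcr\\":[8],\\"tagsToRemoveAfterOcr\\":[7]}"}'
)

job = parse_jobqueue_message(message)
print(job["uid"], job["fileId"], job["settings"]["languages"])
```

The decoded `uid` is Max.Mustermann, i.e. the job is queued for the receiver of the share rather than for the owner.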

Access rights to file:

occ info:file 2938284
<! Unimportant info deleted !>

The following users have access to the file

admin:
  /admin/files/Upload/Scan.pdf: full permissions
    home storage
Max.Mustermann:
  /Max.Mustermann/files/Upload/Scan.pdf: read
    shared by admin owned by admin

Generally, I like the idea that the workflow runs in the name of the acting user (nice for the activity and version history of the file). But at the same time, you set up a workflow to OCR files, and the goal is probably that this workflow completes its task. Additionally, the owner of the file will never be informed that the workflow failed for uploaded files. If I don't want the workflow to run on a specific folder, I could handle that in the workflow conditions (for example with an additional check for tags).

Approach to overcome this issue: All relevant info (users, access) could be fetched based on the fileId. For example: https://github.com/nextcloud/server/blob/12c516775ef899d775ff6c1a9a7627dd9f112ad6/core/Command/Info/File.php#L102-L113

Of course this could be made more elaborate by adding a per-workflow setting to control this behavior (a checkbox for whether to allow this fallback). But as written above, there are other ways to control in which cases the workflow runs.
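To make the proposal concrete, here is a rough sketch of the selection logic in Python rather than the app's PHP. Everything in it is an assumption for illustration: the `Access` structure, the `pick_ocr_user` function and the `allow_owner_fallback` flag are hypothetical names, and only the update bit (2) comes from Nextcloud's `OCP\Constants`:

```python
from dataclasses import dataclass
from typing import Optional

PERMISSION_UPDATE = 2  # update bit from Nextcloud's OCP\Constants


@dataclass
class Access:
    """One user's access to a file (hypothetical structure for this sketch)."""
    uid: str
    permissions: int
    is_owner: bool = False


def pick_ocr_user(acting_uid: str, accesses: list[Access],
                  allow_owner_fallback: bool = True) -> Optional[str]:
    """Prefer the acting user; fall back to the owner if they cannot update the file."""
    by_uid = {a.uid: a for a in accesses}
    acting = by_uid.get(acting_uid)
    if acting and acting.permissions & PERMISSION_UPDATE:
        return acting_uid
    if allow_owner_fallback:
        for access in accesses:
            if access.is_owner and access.permissions & PERMISSION_UPDATE:
                return access.uid
    return None  # nobody suitable: skip OCR instead of failing later


# Mirrors the occ info:file output above: admin owns the file, Max.Mustermann may only read.
accesses = [
    Access("admin", permissions=31, is_owner=True),
    Access("Max.Mustermann", permissions=1),
]
print(pick_ocr_user("Max.Mustermann", accesses))                              # admin
print(pick_ocr_user("Max.Mustermann", accesses, allow_owner_fallback=False))  # None
```

With the fallback disabled, the function returns nothing, which would correspond to the current behaviour where the job cannot complete for a file drop upload.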