e-alfred / nextcloud-scanner

Scanner app for Nextcloud using the SANE framework
GNU Affero General Public License v3.0

Schedule (large resolution) scans as a BackgroundJob? #7

Open Biont opened 5 years ago

Biont commented 5 years ago

I noticed that I am unable to perform 600dpi scans because they result in a 504 Gateway Timeout. I tried set_time_limit(0), to no avail.

There must be some way to force our code to run until the end. Maybe enqueuing a QueuedJob will do the trick for installations that have a proper system cron set up.
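For what it's worth, a minimal sketch of the QueuedJob idea, assuming Nextcloud's public background-job API (\OCP\BackgroundJob\QueuedJob, which only fires reliably with a real system cron). The class name, argument keys, and scanimage invocation are hypothetical:

```php
<?php
// Hypothetical sketch: a one-shot background job that performs the scan
// outside the web request, so no gateway timeout can interrupt it.

namespace OCA\Scanner\BackgroundJob;

use OCP\BackgroundJob\QueuedJob;

class ScanJob extends QueuedJob {
    // Runs once, the next time cron.php picks the job up.
    protected function run($argument) {
        $dpi    = (int)($argument['resolution'] ?? 300);
        $target = $argument['target'];
        // Shell out to scanimage; on the CLI there is no request timeout,
        // so a large 600dpi scan can run to completion here.
        exec(sprintf(
            'scanimage --resolution %d --format=tiff > %s',
            $dpi,
            escapeshellarg($target)
        ));
    }
}

// Enqueued from a controller via the injected job list:
// $this->jobList->add(ScanJob::class, ['resolution' => 600, 'target' => '/tmp/scan.tiff']);
```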

I need to do more research on this.

Biont commented 5 years ago

I've been thinking about this for some time. The problem with a background job is that we have no idea when it will get processed, and once it does, we hardly have any way to monitor the progress from the frontend. Also, it will only work reliably if a proper system cron is set up; AJAX/Webcron will still fail on large scans.

An alternative might be to create an occ command that more or less directly maps to scanimage and does the scanning for us, and then exec(->fork) it from a REST request. This would ensure that the scan starts immediately and allow us to properly add a file within NC after the scan has finished (which we cannot do if we just fork our call to scanimage as it is now). And for what it's worth, we also get a CLI command for our app in the process. At the same time, handing the power of occ to the REST API sounds wrong, even if you take the greatest care to secure it.
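The occ-command idea could look roughly like this, assuming the usual Nextcloud pattern of a Symfony Console command registered in appinfo/info.xml. The class name, command name, and option names are all hypothetical:

```php
<?php
// Hypothetical sketch of an occ command wrapping scanimage.
// Registered in appinfo/info.xml under <commands>.

namespace OCA\Scanner\Command;

use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Input\InputOption;
use Symfony\Component\Console\Output\OutputInterface;

class Scan extends Command {
    protected function configure() {
        $this->setName('scanner:scan')
            ->setDescription('Run a scan via scanimage and store the result')
            ->addOption('resolution', null, InputOption::VALUE_REQUIRED,
                'Scan resolution in dpi', '300');
    }

    protected function execute(InputInterface $input, OutputInterface $output) {
        $dpi = (int)$input->getOption('resolution');
        // No PHP request time limit applies on the CLI, so large scans
        // can run to completion here.
        passthru(sprintf('scanimage --resolution %d --format=tiff', $dpi), $exit);
        // After scanimage finishes we could register the resulting file
        // with Nextcloud via the Files API before returning.
        return $exit;
    }
}
```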

A third option would be something similar to the procedure described here: https://farazdagi.com/2014/rest-and-long-running-jobs/ I can imagine doing something like this: when a scan is started, we generate a tempfile and have scanimage write its progress into it (while the actual scan is written to a similarly named file that can be reconstructed from the name of the tempfile: rr9843rz4 and rr9843rz4_img, for example). The scan endpoint immediately returns 202 and a reference to a new status/{id} endpoint, so in this case status/rr9843rz4.

Checking this endpoint will return either the last line written by scanimage or a success code with a reference to the image file. The image can then be fetched and written to a new file (either via some generic NC endpoint or a new one created by us).
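The status/{id} lookup itself is plain file handling. A sketch, assuming the {id} / {id}_img naming convention proposed above (the function name and directory layout are made up for illustration):

```php
<?php
// Hypothetical helper backing the status/{id} endpoint: report either the
// last progress line scanimage wrote, or completion plus a reference to
// the finished image file.

function scanStatus(string $dir, string $id): array {
    $progressFile = $dir . '/' . $id;
    $imageFile    = $dir . '/' . $id . '_img';

    // Once the image file exists, the scan is done; hand back its name
    // so the frontend can fetch it and turn it into a proper NC file.
    if (is_file($imageFile)) {
        return ['state' => 'done', 'image' => $id . '_img'];
    }

    // Otherwise return the last line scanimage wrote to the progress file
    // (or null if it has not written anything yet).
    $lines = is_file($progressFile)
        ? file($progressFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
        : [];
    return [
        'state'    => 'running',
        'progress' => $lines ? end($lines) : null,
    ];
}
```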

This one almost has it all: no latencies introduced, the ability to monitor progress, and the ability to tell when the scan has finished. The only obvious downside is that the frontend must be the one to actually finish the scan (i.e. somehow trigger the creation of the new file).

So neither of these is really perfect.

The only thing that would really solve ALL problems would be a combination of 2 and 3: invoke our occ command so that its output is piped to a text file, which we then read to monitor progress. This would guarantee that the new file is created even if the frontend disconnects.
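Assembling that shell invocation could look like this. The occ command name (scanner:scan) and its options are assumptions carried over from the sketch above; nohup plus backgrounding detaches the process from the web request so it survives a frontend disconnect:

```php
<?php
// Hypothetical: build the shell command that runs the occ scan command
// with all output redirected to the progress file, detached from the
// current request.

function buildScanCommand(string $occPath, string $device, int $dpi, string $progressFile): string {
    return sprintf(
        'nohup php %s scanner:scan --device=%s --resolution=%d > %s 2>&1 &',
        escapeshellarg($occPath),
        escapeshellarg($device),
        $dpi,
        escapeshellarg($progressFile)
    );
}

// From the REST controller, something like:
// exec(buildScanCommand('/var/www/nextcloud/occ', $deviceName, 600, '/tmp/rr9843rz4'));
// ...then return 202 with a reference to status/rr9843rz4 as in option 3.
```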

Thoughts?