SimulPiscator / AirSane

Publish SANE scanners to MacOS, Android, and Windows via Apple AirScan.
GNU General Public License v3.0

Robustness improvement: the AirSane service hangs after the backend reports an error, and every further operation produces a "session in use" error in the log until the AirSane service is restarted manually with systemctl. Automatic recovery, or some other proper handling of this session problem, would be better. #21

Closed: linxucc closed this issue 5 years ago

linxucc commented 5 years ago

Hi SimulPiscator,

Your software is awesome and really does the job as expected. After several successful scans, I noticed that in some scenarios the AirSane service hangs easily, so I am filing this report in the hope of helping to improve the software.

The problem is that when something goes wrong in the backend, the AirSane service does not handle it properly: it hangs and stops responding to any further operations until the AirSane service is restarted manually through systemctl.

When the service is hung, I can still use the "scanimage" command to scan directly on the server, so it's not a backend/scanner problem. Any further operation on the hung AirSane service produces a "session in use" error in the log.

So the problem may lie in AirSane's session management, which could be improved.

Details of the problem and how to reproduce the hang:

  1. Do something that makes the backend report an error. In my case, it's the "Auto Selection" feature with multiple selections, or manually selecting multiple areas, especially overlapping or out-of-range selections.

    [Screenshot 2019-07-09 at 3:15:02 PM]
  2. The first scan succeeds, and then the service hangs: further attempts get no response. Clicking "Overview", or "Cancel" and then "Scan", always produces a "session in use" error in the server-side log.

Now it just hangs there forever:

[Screenshot 2019-07-09 at 3:15:31 PM]

The "session in use" error:

[Screenshot 2019-07-09 at 3:43:02 PM]

  3. The "scanimage" command executes correctly on the server side; it is the AirSane service that hangs.

    [Screenshot 2019-07-09 at 3:44:55 PM]
  4. Restarting the AirSane service with systemctl brings everything back.

After some experimenting, I think there may be two problems to solve:

1. A session management/check mechanism could be added, so that when an error is detected in a session, the service recovers automatically instead of requiring a manual restart from the server console.

So if something goes wrong with the backend during a session, that session should be destroyed, closed, or reset so that the service recovers by itself instead of hanging. It would also be better to report the error to the app, so the app knows the scan failed and can react.

This would solve most of the problem, because ordinary users don't know how to log on to the server and restart the service; to them, the wireless scanner is simply broken and there is nothing they can do about it.

In my experience, such a backend error can be triggered easily even without doing anything unusual: just open the Scanner app, keep everything at the defaults, click "Scan", and with certain document layouts it breaks.

Fixing every possible error is one thing; recovering from errors is another. An auto-recovery mechanism could greatly improve the robustness of this service: even without fixing every possible problem, it could become a real workhorse.
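To make the auto-recovery idea concrete, here is a minimal Python sketch, assuming a scanner object that owns one backend session. Everything in it (Scanner, Session, BackendError, Job, and the job state names) is hypothetical and is neither AirSane's actual code nor the SANE API: a backend error is translated into an "Aborted" job status that the client can see, and the session is closed and reopened so that the next request does not run into a stale "session in use" state.

```python
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    # Assumed job states, loosely modeled on what an AirScan client expects.
    PROCESSING = "Processing"
    COMPLETED = "Completed"
    ABORTED = "Aborted"


class BackendError(Exception):
    """Hypothetical error raised by a backend session (e.g. a failed read)."""


class Session:
    """Hypothetical stand-in for an open device handle."""

    def __init__(self, device, fail_on_scan=False):
        self.device = device
        self.fail_on_scan = fail_on_scan
        self.open = True

    def scan(self):
        if self.fail_on_scan:
            raise BackendError("backend reported an error during the scan")
        return b"image-data"

    def close(self):
        self.open = False


@dataclass
class Job:
    state: JobState = JobState.PROCESSING
    error: str = ""
    data: bytes = b""


class Scanner:
    def __init__(self, device, session_factory=Session):
        self.device = device
        self.session_factory = session_factory
        self.session = session_factory(device)

    def reset_session(self):
        # Drop the possibly wedged handle and open a fresh one, so the next
        # job does not inherit the broken state.
        self.session.close()
        self.session = self.session_factory(self.device)

    def run_job(self):
        job = Job()
        try:
            job.data = self.session.scan()
            job.state = JobState.COMPLETED
        except BackendError as exc:
            # Report the failure as a job status instead of hanging,
            # then recover so the service stays usable.
            job.state = JobState.ABORTED
            job.error = str(exc)
            self.reset_session()
        return job
```

In a real server, the close/reopen step would correspond to closing and reopening the device handle; whether that is enough to clear a misbehaving backend's state would depend on the backend.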

2. Add an operation/session queue, so that multiple scan-area selections and/or rapid user actions do not easily cause a backend error and hang AirSane.

In my experiments, multiple small selections randomly cause an error, and so do rapid user actions. For example: pressing "Cancel" and then "Overview" while the Scanner app displays "Scanner is warming up"; pressing "Cancel" and then "Overview" while a scan is in progress; or simply pressing "Overview" several times. Any of these easily hangs the service, and the console log says "session in use".

My guess is that a command is delivered while the scanner itself has not completely finished the previous job, so the new job is discarded on the scanner side, which leaves the session waiting forever and blocks further actions. (This is only my presumption; I hope it provides a clue.)

This may be another cause of the hang described above.
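An operation queue along these lines could be sketched as a single worker thread that executes requests strictly one at a time, so a new "Overview" or "Scan" is only sent after the previous operation has returned. This is a hypothetical Python sketch: SerializedScanner and backend_call are illustrative names, and it assumes that each backend call returns only once the device has really finished the job.

```python
import queue
import threading


class SerializedScanner:
    """Run scanner operations strictly one at a time (hypothetical sketch)."""

    def __init__(self, backend_call):
        # backend_call(op) performs one complete operation and returns its result.
        self._backend_call = backend_call
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            op, reply = self._requests.get()
            if op is None:  # shutdown sentinel
                break
            try:
                reply.put(("ok", self._backend_call(op)))
            except Exception as exc:
                # Hand the error back to the caller; keep the worker alive.
                reply.put(("error", exc))

    def submit(self, op, timeout=None):
        """Queue an operation and wait until it (and everything before it) is done."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((op, reply))
        return reply.get(timeout=timeout)

    def close(self):
        self._requests.put((None, None))
        self._worker.join()
```

Errors are returned to the caller instead of stopping the worker, so one failed request does not block the ones queued behind it. Note that a request whose submit() times out stays in the queue and is still executed later.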

If these issues are fixed, AirSane's robustness could improve greatly, making it more error-proof so that ordinary users can rely on it without having to deal with the console.

Hope this helps, thanks again!

SimulPiscator commented 5 years ago

Thank you for this detailed bug report. I'm not sure there will be a satisfactory solution to this problem. The "session in use" error message comes from the SANE backend, which seems to communicate with an HP scanner through a SOAP interface. Your testing suggests that the backend has trouble with multiple scan requests that follow shortly after each other. On the AirSane side, I found that SANE errors during a scan are not properly translated into a job status, and may even result in segfaults. I will look further into this issue, but it may take some time.

linxucc commented 5 years ago

Thanks for your reply. So it seems to be a problem with HP's SANE backend; if so, I'm not surprised. The driver for HP's all-in-one machines is horrible and buggy; it took me a very long time just to get it working, but that's another story...

As I understand it, the AirSane service works like a transparent proxy: it talks to both sides, delivering a command when the client app sends one and returning output when SANE returns one. It works passively, so it relies on the SANE backend to do the right thing, including state management, status reporting, and job scheduling. If SANE fails, AirSane fails. (This is my guess; correct me if I'm wrong :) )

So if more SANE backend issues are reported by others in the future, maybe it would be worth implementing an independent session/job management layer? It would isolate AirSane from SANE failures. This layer would supervise the state of the SANE session: if anything unexpected is returned or happens, or if nothing is returned at all within some period of time, the supervisor would know that something is wrong, tell the user's app that the scan failed, and kill or reset the faulty session/job.

Whether that is needed depends on how often this happens in practice: across all the manufacturers and models, the behavior of SANE backends may not be predictable. (Note that it may also turn out in the end that HP is the only one that sucks...) So if there are more reports like this one, if the SANE backend proves unreliable, or if the segfaults are hard to fix, this could become a solution. But it would also take a lot of work, could complicate the software, and could introduce new instabilities, so it's just an idea for now.
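The supervisor idea could look roughly like the following Python sketch: each backend operation runs in a worker with a deadline, and anything other than a normal result (an exception, or no answer before the timeout) produces an "Aborted" status for the client plus a session reset. Supervisor, reset_session, and timeout_s are hypothetical names, and the 30-second default is an arbitrary placeholder.

```python
import concurrent.futures


class Supervisor:
    """Hypothetical watchdog around backend operations."""

    def __init__(self, reset_session, timeout_s=30.0):
        self._reset_session = reset_session  # callback that discards/reopens the session
        self._timeout_s = timeout_s
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

    def run(self, operation):
        """Return ("Completed", result) or ("Aborted", reason)."""
        future = self._pool.submit(operation)
        try:
            return ("Completed", future.result(timeout=self._timeout_s))
        except concurrent.futures.TimeoutError:
            # The worker is stuck; abandon it and start a fresh pool so
            # later operations are not queued behind the hung call.
            future.cancel()
            self._pool.shutdown(wait=False)
            self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
            reason = f"no response from backend within {self._timeout_s} s"
        except Exception as exc:
            reason = f"backend error: {exc}"
        # Something unexpected happened: reset the session and report failure.
        self._reset_session()
        return ("Aborted", reason)
```

Python cannot kill a thread that is stuck inside a call, so the sketch only abandons the stuck worker; a real implementation would also need to cancel the backend operation or restart the process, which is what a full service reset achieves.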

For now, I also have a simple, dirty workaround: add a "reset" path to the HTTP server that runs "systemctl stop" and "systemctl start" (or something smarter), just like a physical reset button. When something goes wrong with a scan, I just open a browser and type "my_pi_ip:8090/reset". Seconds later, my AirScan scanner is up again.

SimulPiscator commented 5 years ago

I liked your idea of a reset button. The server now has a /reset URL that does exactly what you suggested, by internally sending a SIGHUP signal to the server. Let me know how it works for you.
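A /reset URL that works by signalling the server itself can be sketched as follows. This is a hypothetical, POSIX-only Python illustration of the general pattern, not AirSane's implementation: the request handler sends SIGHUP to its own process, the signal handler only sets a flag, and the main loop performs the actual teardown and re-initialization on its next iteration. Server, handle_request, poll, and generation are illustrative names.

```python
import os
import signal


class Server:
    """Hypothetical /reset handling via SIGHUP (POSIX only, main thread)."""

    def __init__(self):
        self.restart_requested = False
        self.generation = 0  # how many times the scanner sessions were rebuilt
        signal.signal(signal.SIGHUP, self._on_sighup)

    def _on_sighup(self, signum, frame):
        # Keep the handler minimal: only record that a restart was requested.
        self.restart_requested = True

    def handle_request(self, path):
        if path == "/reset":
            os.kill(os.getpid(), signal.SIGHUP)
            return 200, "resetting"
        return 404, "not found"

    def poll(self):
        """Called from the main loop: rebuild all sessions if a reset is pending."""
        if self.restart_requested:
            self.restart_requested = False
            self.generation += 1  # stand-in for closing and reopening all devices
```

Keeping the signal handler down to a flag assignment avoids doing complex work (closing devices, freeing buffers) inside an asynchronous handler; the rebuild happens from the main loop, where the server's state is known.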