CORE-POS / IS4C

Cooperative Operational Retail Environment
http://www.core-pos.com
GNU General Public License v2.0
63 stars 44 forks source link

Lane session hangs when printer offline #321

Closed flathat closed 8 years ago

flathat commented 10 years ago

At the Techhub conference call today (@DavidHermann) the issue described but not resolved on the forum Printer error and finishing transaction came up again. Quoting from there:

Re: Printer error and finishing transaction Postby andy_wfc » Fri Jan 25, 2013 11:51 am There are two intertwined issues here. The relevant script is ajax-callbacks/ajax-end.php which prints the receipt, clears out ltt, and increments the transaction number. When the printer fails while PHP is writing to it (or trying to) these are the issues that arise:

  1. ajax-end.php hangs at that line This means nothing beyond that line executes. There may be some operating system differences in behavior but my experience is it will block there forever until Apache restarts or the printer becomes available. Any timeouts or execution time limits in PHP's configuration are ignored. For me, calling is_writable() on the device file also blocks indefinitely if I try to preemptively check whether the printer is available.
  2. The session might lock Because ajax-end.php has not completed, any subsequent actions or pages that use $CORE_LOCAL (i.e., everything) might hang. Apache keeps all the subsequent requests queued up until ajax-end.php completes and unlocks the session file. The pending requests still run once the printer clears so I expect that's why items eventually show up on screen. I've found the Apache configuration is relevant here. If I'm using mod_php then the session will lock up; if I have PHP configured as CGI it won't.

I've tried to move everything in ajax-end.php ahead of writing to the printer so ltt still clears and trans_no still increments but that doesn't address the session locking issue. Writing the receipt to a temporary file and then using fork() or register_shutdown_function() to print it in a separate process might avoid the session lock but I don't know how we'd relay a message back to the cashier to show something went wrong.

I wonder if something like this might make it possible to alert the cashier:

  1. A cronjob to uses PHP is_writable() or similar at a defined interval (1 min, 5 mins) and on success puts a timestamp in a file or touches it.
  2. ajax-end.php checks that file and if the timestamp is stale alerts the cashier before printing.
gohanman commented 10 years ago

The fundamental problem is that is_writable doesn't work. It'll just block forever. The cronjob will too.

A lot of this is resolved upstream. The session gets closed and flushed before trying to print which resolves the locking issue. Rotating the current transaction data into dtransactions & updating trans_no also happens before trying to print. But there's nothing for notifying the cashier.

flathat commented 10 years ago

I'm looking for a way for ajax-end.php to become aware of a problem without checking the printer port directly. So if the last time the cronjob found is_writable(/dev/lp0) == True and wrote the timestamp was more than a minute ago, which would be the case if it were hung, ajax-end would alert the cashier instead of trying to print. I assume the is_writable() process is the only one that blocks.

I assume the cronjob will unblock when the printer is OK again. The last time I dealt with this I think the session resumed that way. I probably haven't had the situation since the upstream changes, maybe it looks, works differently now.

gohanman commented 10 years ago

You're right, the cronjob should resume (or multiple cronjobs would resume since they'd pile up if no one noticed for more than a minute).

There's a legitimate issue here, I'm just not crazy about the solution. It's not a complete fix since there's still at least a one minute window where the error can occur, and it creates a new potential invisible problem. But I must admit I don't have a better idea.

flathat commented 10 years ago

Communicating with devices is not one of my areas of expertise but what I'm looking for is something that behaves like ping does for network devices: it reports failure but doesn't hang or block.

This is the only place I can think of in day-to-day CORE where "expert" knowledge is needed to diagnose a problem. I think it's worth some attention from the "needed level of local support" point of view.

Would it be possible for printers to be network devices so ping could test them? Some other hardware-level approach?

I'm going to try the cronjob technique.

DavidHermann commented 10 years ago

I know less than anyone about communicating with devices, but isn't there some way in the php code to trap the unfriendly error "Fatal error: Maximum execution time of 30 seconds exceeded in ...\ReceiptLib.php on line 54" restate it, and list some possible causes/fixes? The suggestions I'm seeing about cron jobs continually testing the device, etc., sound like they might be overkill. Please let me know what I'm misunderstanding -- thanks.

gohanman commented 10 years ago

No; you can't trap a fatal error.

If you do anything with the device file (including check if it's writable) and the printer isn't ready, the process gets stuck right there. You can't do anything afterward.

That's the basic problem. The workaround @flathat is suggesting would hopefully result in the cron process getting stuck instead of the POS process getting stuck.

Pinging isn't a full solution helpful since if the printer's present but not working (jammed, out of paper, etc) it'll still get stuck. If you could actually send it commands over the network, that would very likely remove the problem.

gohanman commented 10 years ago

Idea: 7fb8ace6782e32b3e4e4cd091859896464fb7ef3

Apparently you can hook a function / method to run after a fatal error using register_shutdown_function. It's not clear to me exactly what state the request is in at that point or whether the session state is even available, so I'm starting very unambitious and just trying to print this message:

{ "error" : "Printer is not responding" }

If that happens to show up on screen, the formatting's a little weird but at least it's human-readable. But if it's a background request initiated via javascript (often the case when printing), the callback function can actually parse that message and figure out that an error happened. In this commit I'm just changing the onscreen receipt icon.

gohanman commented 10 years ago

b069eb0989b4d2e271b74c070e685bddaf586139

I tried this today with some additional javascript in case the AJAX call was ending in an error condition or not originating from where I expected. In my test it didn't work at all. I turned off the printer, finished a transaction, and nothing happened. I haven't had time yet to investigate exactly where the trouble is. Possibilities:

At this point, I think I need a dedicated test script to purposely trigger a fatal printer error and see what the output is.

flathat commented 10 years ago

I've been trying to use udevadm, under Ubuntu 12.04.4, to report events with the parallel port, but so far with no success. (I use a udev script to set the permissions on /dev/lp0 at boot.) I thought

udevadm monitor --kernel --udev --property

might show something, but there is nothing when any of: paper out, error (cover open), turned off. This may have to do with there being very few udev attributes associated with /dev/lp0. It seems that the attributes can be tested by the udev rules so that "if x happens do y". There don't seem to be "x"s defined to act on. I don't understand this well enough to speculate very far. Perhaps the driver, if that's where the attributes come from, doesn't report information that is on the port. However it doesn't show anything even when the printer is connected to the port for the first time after booting without it. That is reported to syslog:

Mar 11 20:44:55 lane2 kernel: [   18.628591] lp0: using parport0 (interrupt-driven).
Mar 11 20:44:58 lane2 udev-configure-printer: add /devices/pnp0/00:0a/printer/lp0

Based on:

$ udevadm info -a -n /dev/lp0
...
  looking at device '/devices/pnp0/00:0a/printer/lp0':
    KERNEL=="lp0"
    SUBSYSTEM=="printer"
    DRIVER==""
...

I've looked at the files in:

$ ls -l /sys/devices/pnp0/00:0a/printer/lp0/power
total 0
-rw-r--r-- 1 root root 4096 Mar 11 21:23 async
-rw-r--r-- 1 root root 4096 Mar 11 21:23 autosuspend_delay_ms
-rw-r--r-- 1 root root 4096 Mar 11 21:23 control
-r--r--r-- 1 root root 4096 Mar 11 21:23 runtime_active_kids
-r--r--r-- 1 root root 4096 Mar 11 21:23 runtime_active_time
-r--r--r-- 1 root root 4096 Mar 11 21:23 runtime_enabled
-r--r--r-- 1 root root 4096 Mar 11 21:23 runtime_status
-r--r--r-- 1 root root 4096 Mar 11 21:23 runtime_suspended_time
-r--r--r-- 1 root root 4096 Mar 11 21:23 runtime_usage

which suggest they contain status or activity information, with hexdump. They are touched from time to time, but not in immediate response to the printer being out of paper or turned off, so I don't know what they mean.

The parallel port has pins dedicated to paper out and error (assuming it means the same thing as the light on the printer).

The "paper out" condition is noted in /var/log/syslog after an attempt to print to the printer when paper is already out:

Mar 11 20:55:03 lane2 kernel: [  628.065024] lp0 out of paper

There is no note there when paper is restored. There is no note for the cover-open/error condition and none when the printer is turned off. I'd like to know where the "paper out" message comes from.

What I was hoping for was a script like this one that uses the now-deprecated hald to detect a mouse being plugged in. For me, too many steps with uncertain outcome involved in trying to do that for this problem at this point.

gohanman commented 10 years ago

A more involved approach: 2abe1a68b3782198d95b159088caac1c9904662d

Push writing to the printer into the NewMagellan. Lane sends the receipt text to the driver over TCP and can set a reasonable timeout. If the driver doesn't send back an acknowledgement that it printed successfully, the lane can actually report an error.

gohanman commented 10 years ago

Minor revisions after testing: ad3f5ca6cce16ccd78b10dc43fdf18724fcb173e

It's working using a text file in ports.conf (e.g., /tmp/fakereceipt.txt). I expect it'll work in Linux using /dev/lp0 or /dev/usb/lp0 without much fuss. The Windows implementation is a bit more complicated so that'll need further testing. Lane-side configuration is setting printerPort to "tcp://localhost" instead of "LPT1" or "/dev/lp0".

I'm pretty sure this solution has a 100% solution of resolving the original problem. PHP never writes the printer, so PHP never hangs while trying to write the printer. So the real question will be whether or not handling the printer in the driver introduces new problems.

joelbrock commented 10 years ago

Holy wow this is epic @gohanman ! My testing printer is on loan to someone but i will be very eager to test this out under linux ASAP. I'll report back.

flathat commented 10 years ago

To try this on my test lane, do I make a branch from master called posDevelop and merge upstream posDevelop to it? Or does my posDevelop need to be a separate clone? I want to be able to merge this into master, ultimately to production if all goes well.

gohanman commented 10 years ago

You could do it that way, or you could make a branch from master and cherry-pick the two commits from this thread into it. That would avoid messing with any other changes in posDevelop.

In hindsight, this is probably where a branch for a specific feature would have been useful instead of having one bleeding edge "dev" branch with various changes. I have trouble actually using that workflow though.

gohanman commented 8 years ago

There's still an issue here but I'm not convinced adding a driver layer between PHP and the printer is a better approach than managing the session lock more carefully. Certainly in 2 years there has been any real movement towards the suggested approach.