al-caughey / YAMon-v4

Official repository for YAMon v4
https://usage-monitoring.com

YAMon4 stops working after a while #21

Open richb-hanover opened 4 years ago

richb-hanover commented 4 years ago

I apologize for this rather fuzzy/imprecise bug report, but in my experience YAMon 4.0.7 simply stops working after a couple of months. Furthermore, after a (possibly longer) period of time, the external USB drive stops working. Here's what I have seen...

This has happened with multiple (3-4) fresh installs of YAMon4 on at least two different OpenWrt routers running 18.06 or 19.07.

Any thoughts on debugging this? Many thanks.

alexandreloss commented 4 years ago

Hi Rich, I can confirm this issue too, running YAMon4 on a router with DD-WRT. The workaround I found is to check hourly whether YAMon is running and, if it is not, to start the process.
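A minimal sketch of the hourly check described above, assuming a POSIX shell. The function name restart_if_stopped and both command arguments are hypothetical placeholders; the thread does not say how the process is detected or started on DD-WRT.

```shell
#!/bin/sh
# restart_if_stopped CHECK_CMD START_CMD
# Runs CHECK_CMD; if it fails (non-zero exit status), runs START_CMD.
# Both arguments are placeholders supplied by the caller (assumptions,
# not commands taken from YAMon itself).
restart_if_stopped() {
    check_cmd=$1
    start_cmd=$2
    if eval "$check_cmd" >/dev/null 2>&1; then
        echo "running"
    else
        echo "stopped, restarting"
        eval "$start_cmd"
    fi
}

# Example call (assumed commands; adjust for your firmware):
# restart_if_stopped "your-check-command" "service yamon4 start"
```

Saved as a script and called from an hourly cron entry, this would approximate the workaround. Whatever check you pass in should not match the watchdog script's own command line, or it will always report "running".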

richb-hanover commented 4 years ago

But have you ever observed the /opt partition disappear? Or the entire USB drive become non-functional/not seen?

alexandreloss commented 4 years ago

No, I haven't. /opt always stays up. In fact, my /opt is mounted on an external SSD connected to a USB 3.0 port. My router is a Linksys WRT3200 running DD-WRT (updated weekly to the latest release available).

richb-hanover commented 4 years ago

Curiouser and curiouser... My original post notes that /opt disappeared from the computer, and the USB drive wasn't visible in ls, and that certainly would cause YAMon to stop.

Here's an update: I unplugged the USB drive from my OpenWrt router. When I inserted it into a Linux machine, it showed up and the files were as expected. install.sh and the YAMon4 directory were present at the top level, and the lower-level files seemed to match what I expected. I didn't do a ton of troubleshooting. However, the latest /opt/YAMon4/data/lastseen.js file was dated 11 July... (I don't know if that's a hint or a red herring...)
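One way to use that timestamp: the most recently modified file in the data directory gives a rough upper bound on when YAMon last wrote anything. A small sketch, with newest_file as a hypothetical helper name:

```shell
#!/bin/sh
# newest_file DIR: print the name of the most recently modified entry in DIR.
# Prints nothing if DIR is empty or missing.
newest_file() {
    ls -t "$1" 2>/dev/null | head -n 1
}

# Example: show the last file written under the data directory named above.
# f=$(newest_file /opt/YAMon4/data) && ls -l "/opt/YAMon4/data/$f"
```

Comparing that date against the router's log, if any of it survives, may help tell whether YAMon or the drive stopped first.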

I then plugged the USB back into the router (no change to any of the files), and the USB drive was immediately recognized. ls -al /dev/sd* and ls /opt showed the expected results.
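The two checks above (is a device node present, is /opt actually mounted) can be sketched as small helpers. The function names are hypothetical, and the sketch assumes a Linux-style /proc/mounts in which the second field is the mount point.

```shell
#!/bin/sh
# is_mounted MOUNT_POINT [MOUNTS_FILE]
# Succeeds if MOUNT_POINT is listed as a mount point in MOUNTS_FILE
# (default: /proc/mounts).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# has_sd_device: succeeds if at least one /dev/sd* node exists.
# Note: this matches any SCSI-style disk, not only USB sticks.
has_sd_device() {
    for dev in /dev/sd*; do
        [ -e "$dev" ] && return 0
    done
    return 1
}

# Example:
# has_sd_device || echo "no /dev/sd* nodes"
# is_mounted /opt || echo "/opt is not mounted"
```

Checking the mount table rather than running ls /opt separates "the drive is gone" from "the directory exists but nothing is mounted on it".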

A combination of service yamon4 enable and service yamon4 start and creating the symlink again, and maybe some other farbling around (I think I had to reboot to re-create /tmp/www) caused YAMon4 to begin running again. It has been fine for the last hour or so...

My questions:

  1. What might have caused YAMon to stop?
  2. What might have caused /opt to go away?
  3. Might they be related?

Two potential sources of trouble... I wrote both pieces of documentation that I used to get YAMon running. I'm not entirely certain that those procedures are correct, so it would be great to have someone else review them.

matlag commented 4 years ago

Is it possible that your hard drive draws more current than the router can provide, leading to a voltage drop and then a self power-off? Maybe there's a trace in OpenWrt's logs of the exact time it turned off, which would tell you whether a single event triggered both failures or whether they are separated in time.
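A rough sketch of that log check, assuming the log text is piped in on stdin. The helper name usb_events is hypothetical, and the match patterns are assumptions about typical Linux kernel USB messages, so adjust them to what your log actually contains.

```shell
#!/bin/sh
# usb_events: read log text on stdin and keep only lines that look like
# USB attach/detach or over-current events. The patterns are a guess at
# common kernel message wording, not an exhaustive list.
usb_events() {
    grep -iE 'usb.*(disconnect|new .*usb device|over-current)'
}

# Example (OpenWrt):
# logread | usb_events
# dmesg | usb_events
```

On a router that keeps its log only in memory, entries from before a reboot or power failure may already be gone, so this is most useful while the router is still up after the drive disappears.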

richb-hanover commented 4 years ago

That's an interesting thought. It's a 1 GB USB memory stick, so I wouldn't expect its power draw to be significant. BUT... I know more now than before:

  1. It stopped working again on Tuesday after a power failure
  2. I ssh'd in after the power failure; ls -al /dev/sd* did not show any entries
  3. Unplugging, waiting 5 seconds, and re-plugging the USB stick didn't change anything - ls ... was still empty
  4. Rebooting the router didn't change anything - ls ... remained empty
  5. Plugging the USB stick into a separate Unix box showed the expected directory structure which seemed OK. Files in YAMon4/data were last modified this Tuesday (power failure day)
  6. After re-inserting it into the OpenWrt router, ls /opt/YAMon4 showed files as expected
  7. (I also had trouble restarting YAMon, but will post that in a separate issue...)

So, am I "doing something wrong" with this USB Drive? (I can't imagine that losing power makes all USB sticks fail for others...) I realize that this is a YAMon group, not a "help Rich debug his USB drive..." group - any thoughts about where to post these questions? Thanks.

Update: I posted this question to the OpenWrt Forum - that's probably a better place to answer it... Thanks again.