Chia-Network / chia-blockchain

Chia blockchain python implementation (full node, farmer, harvester, timelord, and wallet)
Apache License 2.0
10.82k stars 2.03k forks

[Bug] farmer misses harvester I/O error #10062

Closed jayhohoho2019 closed 8 months ago

jayhohoho2019 commented 2 years ago

What happened?

For weeks, a remote harvester had been logging an ERROR that prevented it from reading most of its plots, but the farmer node showed no sign of the problem. My monitoring ran only on the farmer node; among other things, it periodically checks the output of `chia farmer summary` and alerts if the reported plot count for all harvesters falls below a threshold. It never alerted during the last few weeks, when this one harvester was almost entirely out of action. `chia plotnft show` was also showing the incorrect plot count.
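The farmer-side threshold check described above might look roughly like the sketch below. It only parses text that has already been captured from `chia farmer summary`. The label in the regular expression and the helper names `parse_plot_count` and `should_alert` are assumptions for illustration and should be checked against the real output. As this issue shows, such a check cannot fire while the farmer keeps reporting the old plot count.

```python
import re

# Assumed label; verify it against the actual `chia farmer summary` output.
PLOT_COUNT_RE = re.compile(r"Plot count for all harvesters:\s*(\d+)")


def parse_plot_count(summary_text):
    """Return the total plot count found in the summary text, or None."""
    match = PLOT_COUNT_RE.search(summary_text)
    return int(match.group(1)) if match else None


def should_alert(summary_text, threshold):
    """Alert when the count is missing or below the threshold."""
    count = parse_plot_count(summary_text)
    return count is None or count < threshold
```

Treating a missing count as an alert condition is a deliberate choice here: if the summary format changes or the farmer is unreachable, the monitor fails loudly instead of silently passing.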

Version

1.2.11

What platform are you using?

Linux

What ui mode are you using?

CLI

Relevant log output

On farmer (Ubuntu server 20.04.3) running on Intel.
`chia farmer summary` and `chia plotnft show` both reported an incorrect plot count. Neither picked up the fact that one harvester was unable to access most of the plots connected to it.

Nothing in chia debug.log (set to WARNING).

On remote harvester (Ubuntu server 20.04.3) running on Pi4:

chia debug.log (set to WARNING)
chia.harvester.harvester: ERROR    Error plot file /mnt/farm/...plot may no longer exist [Errno 5] Input/output error:

syslog:
Feb  1 00:00:45 harvester01 kernel: [76696.429073] EXT4-fs warning (device sdb1): dx_probe:768: inode #2: lblock 0: comm chia_harvester: error -5 reading directory block

These errors started 01/14/22.

jayhohoho2019 commented 2 years ago

To clarify, the farmer did not seem to be aware of the problem the harvester was having: it kept reporting the plots from that harvester as well. On the harvester side, chia_harvester and the daemon were both running, but the plots were not accessible to the harvester (ls /mnt/farm returned an I/O error).
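Since the farmer gave no hint, one workaround is a small probe on the harvester itself that does what `ls /mnt/farm` did by hand. The sketch below is only an illustration: `probe_plot_dir` is a hypothetical helper, not part of chia, and it assumes that a failing directory listing surfaces as an `OSError` (an I/O error would be reported as `EIO`).

```python
import errno
import os


def probe_plot_dir(path):
    """Return None if the directory can be listed, else the errno name."""
    try:
        os.listdir(path)
    except OSError as exc:
        # Map the numeric errno (e.g. 5) to its symbolic name (e.g. "EIO").
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None
```

Run periodically against each plot directory, any non-None result could raise an alert independently of what the farmer reports.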

esaung commented 2 years ago

Did you determine where the I/O error came from? Is it from a disk going bad?

jayhohoho2019 commented 2 years ago

Could be a cable, could be the power supply, could be a disk or the 5-bay disk enclosure the HDDs are in. But after a reboot of the Pi4 harvester the error has not come back.

Regardless of the I/O error source, the chia harvester did log the errors, but the farmer node wasn't picking them up.
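Because the harvester does log these errors, a harvester-side monitor could count them directly. This is a minimal sketch that matches on the substrings seen in the log line quoted earlier in this issue. `count_harvester_io_errors` is a hypothetical helper, and the matched substrings are assumed to stay stable across chia versions.

```python
def count_harvester_io_errors(log_lines):
    """Count harvester ERROR lines that carry an [Errno 5] I/O error."""
    return sum(
        1
        for line in log_lines
        if "chia.harvester.harvester" in line
        and "ERROR" in line
        and "[Errno 5]" in line
    )
```

Feeding it the lines of the harvester's debug.log (for example, from an open file object) and alerting on a non-zero count would have flagged this failure on the first day.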

github-actions[bot] commented 2 years ago

This issue has not been updated in 14 days and is now flagged as stale. If this issue is still affecting you and in need of further review, please comment on it with an update to keep it from auto closing in 7 days.

jayhohoho2019 commented 2 years ago

The I/O error on the Pi4 harvester has not returned so far. However, I suspect that if/when it returns, the farmer would still be unaware that the plots from that harvester had become unavailable.

emlowe commented 8 months ago

closing old issue