Labsmore / pyuscope

Python machine vision platform
BSD 2-Clause "Simplified" License

Exception in planner thread with LIP-VM1 #400

Closed. edgetriggered closed this issue 8 months ago.

edgetriggered commented 10 months ago

Large scans are dying on the loaner laptop (labtop3) at about the 75% point. The apparent cause is CPU frequency throttling, which keeps a task from meeting its three-second timeout. Smaller test jobs with only a few images to stitch run to completion.

dmesg is filled with messages like the following:

[90759.513304] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 545405)
[90759.513305] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 651553)
[90759.513306] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 651551)
[90759.513307] mce: CPU2: Core temperature above threshold, cpu clock throttled (total events = 545404)
[90759.513308] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 651551)
[90759.513309] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 651553)
[90759.517352] mce: CPU2: Core temperature/speed normal
[90759.517352] mce: CPU0: Core temperature/speed normal
[90759.517354] mce: CPU1: Package temperature/speed normal
[90759.517355] mce: CPU3: Package temperature/speed normal
[90759.517355] mce: CPU0: Package temperature/speed normal
[90759.517356] mce: CPU2: Package temperature/speed normal

Attached: pyuscope.log
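
The mce messages in the dmesg excerpt above are what point to thermal throttling. As a hypothetical diagnostic aid (not part of pyuscope), this short Python sketch pulls the kernel's cumulative "total events" counter for each CPU and throttling domain out of saved dmesg lines. The regex assumes the exact message wording shown in this excerpt, which may differ on other kernel versions.

```python
import re

# Assumed message format, copied from the excerpt in this issue, e.g.:
# [90759.513304] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 545405)
THROTTLE_RE = re.compile(
    r"mce: CPU(?P<cpu>\d+): (?P<kind>Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<total>\d+)\)"
)


def throttle_totals(lines):
    """Return {(cpu, "Core" | "Package"): largest 'total events' value seen}."""
    totals = {}
    for line in lines:
        m = THROTTLE_RE.search(line)
        if m is None:
            continue  # skips other output, including "temperature/speed normal" recoveries
        key = (int(m.group("cpu")), m.group("kind"))
        # The kernel counter is cumulative, so keep the largest value seen.
        totals[key] = max(totals.get(key, 0), int(m.group("total")))
    return totals
```

For example, calling throttle_totals(open("dmesg.txt")) on a saved capture (the filename is only illustrative) returns a dict keyed by (cpu, kind). Comparing two captures taken a few minutes apart during a scan shows whether throttling events are still accumulating.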

JohnDMcMaster commented 10 months ago

Thanks for your report!

1) We can swap out your laptop. That might be the easiest fix here.

2) We'll also think over potential fixes, such as a config directive that extends timeouts.

JohnDMcMaster commented 10 months ago

I've added a new config directive that you can add to ~/.pyuscope. The sample entry below scales timeouts by 10, increasing them from 3 seconds to 30 seconds. Can you pull the latest code, add this directive, and let me know whether it solves your issue?

NOTE: a separate memory corruption issue was also identified and fixed.

{
       "timeout_scalar": 10.0,
       "labsmore_stitch": {
...
}
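
A minimal sketch of how a timeout multiplier like timeout_scalar might be applied, assuming a worker thread that reports its result through a queue.Queue and a base deadline of 3 seconds. The thread does not show pyuscope's actual implementation; load_timeout_scalar, wait_for_result, slow_task, and BASE_TIMEOUT_S are hypothetical names used only for illustration.

```python
import json
import queue
import threading
import time

# Hypothetical base deadline, matching the 3 s timeout described in this issue.
BASE_TIMEOUT_S = 3.0


def load_timeout_scalar(config_text):
    """Read 'timeout_scalar' from JSON config text, defaulting to 1.0 if absent."""
    config = json.loads(config_text)
    scalar = float(config.get("timeout_scalar", 1.0))
    if scalar <= 0:
        raise ValueError("timeout_scalar must be positive")
    return scalar


def wait_for_result(results, base_timeout_s, scalar):
    """Block until a worker posts a result, or raise once the scaled deadline passes."""
    deadline_s = base_timeout_s * scalar
    try:
        return results.get(timeout=deadline_s)
    except queue.Empty:
        raise TimeoutError(f"no result within {deadline_s:.2f} s") from None


def slow_task(results, delay_s):
    """Hypothetical worker; the sleep stands in for work slowed by CPU throttling."""
    time.sleep(delay_s)
    results.put("done")
```

With {"timeout_scalar": 10.0}, the effective deadline becomes 3.0 × 10.0 = 30.0 seconds, matching the sample entry. Defaulting to 1.0 when the key is absent keeps existing configs behaving as before.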

JohnDMcMaster commented 10 months ago

Is this possibly related to the memory leak? That should now be fixed: https://github.com/Labsmore/pyuscope/issues/375