long run stability issues

apollo-ng / picoReflow

Turns a Raspberry Pi into a universal, web enabled Reflow Oven Controller


long run stability issues #17

Status: Open. MooreBroCo opened this issue 8 years ago

MooreBroCo commented 8 years ago

I'm using picoReflow to run a shop oven for cooking composite parts. The profile we're running is simple but long: we ramp at 3 °C/min up to 121 °C and hold for an hour. Everything works great, except that about half the time picoReflow declares the run complete roughly halfway through. I'm seeing a fair bit of noise when monitoring my MAX31855s outside of picoReflow; I'm waiting on some 0.1 µF caps that will hopefully get that under control. My gut tells me my problems have something to do with the '855 throwing an error every now and then. I'm going to switch to the error handling fork and see if that lets me run my long profiles with less worry. Anything else I should be thinking about?
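
The MAX31855 reports its errors in-band, in the same 32-bit frame as the temperature, so "the '855 throwing an error" means specific fault bits being set. A minimal sketch of decoding one frame, based on the bit layout in the Maxim datasheet (D31..D18 thermocouple temperature at 0.25 °C per LSB, D16 fault flag, D15..D4 internal temperature at 0.0625 °C per LSB, D2/D1/D0 short-to-VCC, short-to-GND and open-circuit). The names `decode_max31855` and `Max31855Reading` are hypothetical and are not picoReflow's own driver API.

```python
from dataclasses import dataclass


@dataclass
class Max31855Reading:
    thermocouple_c: float  # hot-junction temperature, deg C
    internal_c: float      # cold-junction (chip) temperature, deg C
    fault: bool            # D16: set when any of the three fault bits below is set
    short_to_vcc: bool     # D2
    short_to_gnd: bool     # D1
    open_circuit: bool     # D0


def _sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def decode_max31855(frame: int) -> Max31855Reading:
    """Decode one raw 32-bit MAX31855 frame (D31 first) into temperatures and fault flags."""
    tc_raw = _sign_extend((frame >> 18) & 0x3FFF, 14)       # D31..D18, 0.25 C per LSB
    internal_raw = _sign_extend((frame >> 4) & 0x0FFF, 12)  # D15..D4, 0.0625 C per LSB
    return Max31855Reading(
        thermocouple_c=tc_raw * 0.25,
        internal_c=internal_raw * 0.0625,
        fault=bool(frame & (1 << 16)),
        short_to_vcc=bool(frame & 0b100),
        short_to_gnd=bool(frame & 0b010),
        open_circuit=bool(frame & 0b001),
    )
```

The four bytes clocked out of the chip can be turned into `frame` with `int.from_bytes(spi_bytes, "big")` (where `spi_bytes` is whatever your SPI read returns). When `fault` is set, a cautious driver would treat the temperature as invalid rather than feed it to the controller.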

chron0 commented 8 years ago

Great, I'm looking forward to the results; please let us know what you learn.

You may want to add some more logging in lib/oven.py to get a better trace of the situation and the system parameters at the moment it breaks, so that we can isolate the problem more easily.
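
One way to do that is to emit a single line with the relevant state whenever a run ends, including why it ended. A minimal sketch using Python's standard `logging` module; the logger name and the parameter names (`runtime`, `totaltime`, `target`, `heat`) are hypothetical and should be mapped to whatever lib/oven.py actually tracks.

```python
import logging

logger = logging.getLogger("picoreflow.oven")  # hypothetical logger name


def log_run_snapshot(reason: str, runtime: float, totaltime: float,
                     temperature: float, target: float, heat: float) -> str:
    """Log (and return) one line describing the oven state at the moment a run ends.

    All parameter names are illustrative, not lib/oven.py's real attributes.
    """
    msg = (f"run ended ({reason}): runtime={runtime:.1f}s totaltime={totaltime:.1f}s "
           f"temp={temperature:.2f}C target={target:.2f}C heat={heat:.2f}")
    logger.warning(msg)
    return msg
```

Calling this at every place a run can stop, with a distinct `reason` string, would show whether a premature "complete" comes from the runtime check itself (for example a `runtime` value jumping past `totaltime`) or from some other path. For an unattended bake, `logging.basicConfig(filename="oven.log", format="%(asctime)s %(message)s")` sends the trace to a timestamped file.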

It's easy to see how the web client might choke on a long dataset with no data aggregation in between, but the server should never crap out and stop a run.
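
As an illustration of what "data aggregation" could mean here: a minimal sketch that averages (time, temperature) samples into fixed-width time buckets before they are sent to the browser. The function name and sample format are assumptions; picoReflow's actual server/client protocol is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (seconds since run start, temperature in deg C)


def downsample(samples: List[Sample], bucket_s: float) -> List[Sample]:
    """Average samples into fixed-width time buckets.

    Returns one (bucket midpoint, mean temperature) pair per non-empty bucket, in time order.
    """
    if bucket_s <= 0:
        raise ValueError("bucket_s must be positive")
    buckets: Dict[int, List[float]] = defaultdict(list)
    for t, temp in samples:
        buckets[int(t // bucket_s)].append(temp)
    return [((idx + 0.5) * bucket_s, sum(temps) / len(temps))
            for idx, temps in sorted(buckets.items())]
```

A four-hour run sampled once per second is 14,400 points; 60 s buckets reduce that to 240. Averaging also flattens noise spikes, so if the plot is meant for diagnosing spikes, keeping the min and max per bucket alongside the mean would be the better choice.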

It wouldn't surprise me at all if these are noise-related issues, since the longer a run is, the higher the chance of some crazy show-stopping noise/EMI/error event. That could be cured by something as simple as an extra capacitor and/or by merging the EH (error handling) branch into master (I don't even remember why it's been kept separate :))
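
The link between run length and failure chance can be made concrete with a back-of-the-envelope model. Assume, purely for illustration, that each thermocouple read independently has a small probability p of being a show-stopping error, that the controller reads once every Δt, and that a single unhandled error is enough to end the run. The chance that a run of duration T hits at least one such error is then:

```latex
n = \frac{T}{\Delta t}, \qquad
P(\text{at least one error}) = 1 - (1 - p)^{n} \approx 1 - e^{-pn}

% hypothetical p = 10^{-4}, \Delta t = 1\,\mathrm{s}:
\text{1 h } (n = 3600):\ 1 - e^{-0.36} \approx 0.30, \qquad
\text{4 h } (n = 14400):\ 1 - e^{-1.44} \approx 0.76
```

The value of p is invented; the point is only that the failure probability grows steadily with run length, so an error rate that never shows up in a short reflow profile can regularly kill a multi-hour bake unless the driver handles errors.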

MooreBroCo commented 8 years ago

Time for an update; sorry for being silent for so long. The error handling definitely improves stability on longer runs. I'll also add some logging to see what sort of errors I'm getting when they happen.

readout.txt

I basically copied your error handling into my Tkinter readout, and it's helped a ton. I mostly see thermocouple shorts, although a whole slew of other errors pop up; I'm still seeing new ones every time I bother to check. I think I may have gone a little small on my caps, since I still get infrequent spikes to ±2000 from all four '855s. Just finished a four-hour bake without any trouble; we're really getting our profiles dialed in at this point.
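
Spikes to ±2000 are far outside anything a 121 °C cure can produce, so a plain range check catches them regardless of cause. A minimal sketch that rejects faulted or implausible readings and tallies each rejection by sensor and reason, which makes it easier to see which error dominates on which channel. The limits, sensor names, and fault strings are hypothetical and would need tuning to the actual setup.

```python
from collections import Counter
from typing import Optional

# Hypothetical plausibility limits for a composite-curing oven; tune to your setup.
MIN_PLAUSIBLE_C = -20.0
MAX_PLAUSIBLE_C = 300.0

# Rejection tally keyed by (sensor name, reason).
error_counts: Counter = Counter()


def check_reading(sensor: str, temp_c: float, fault: Optional[str] = None) -> Optional[float]:
    """Return temp_c if the reading looks valid, else None, tallying why it was rejected."""
    if fault is not None:
        # The chip itself flagged a problem (e.g. "short_to_gnd").
        error_counts[(sensor, fault)] += 1
        return None
    if not (MIN_PLAUSIBLE_C <= temp_c <= MAX_PLAUSIBLE_C):
        # No fault bit, but the value is physically impossible for this oven.
        error_counts[(sensor, "out_of_range")] += 1
        return None
    return temp_c
```

Periodically printing `error_counts.most_common()` would give a running summary instead of spotting new error types only when checking by hand.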

Awesome to have a new tool.

chron0 commented 8 years ago

That's very good news. I had a similar problem with a data scraper: the source would sometimes return broken, totally unreasonable values. I cached the last result and compared it to the new one; if the new value could not possibly have been reached by the system in that period of time under any condition, I logged the error and returned the last value again.
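
The last-good-value approach described above is straightforward to sketch. This is a minimal Python version under stated assumptions: `read_fn` stands in for whatever returns one temperature from the sensor, and `max_rate_c_per_s` is a hypothetical limit that should sit comfortably above the fastest change the oven and thermocouple can genuinely produce (for scale, the 3 °C/min ramp from the first post is only 0.05 °C/s). `RateLimitedReader` is an illustrative name, not existing picoReflow code.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("sensor")


class RateLimitedReader:
    """Reject readings the system could not physically reach in the elapsed time."""

    def __init__(self, read_fn: Callable[[], float], max_rate_c_per_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.read_fn = read_fn
        self.max_rate = max_rate_c_per_s
        self.clock = clock
        self.last_value: Optional[float] = None  # last accepted reading
        self.last_time: Optional[float] = None   # when it was accepted

    def read(self) -> float:
        value = self.read_fn()
        now = self.clock()
        if self.last_value is None:
            # Nothing to compare against yet: accept the first reading.
            self.last_value, self.last_time = value, now
            return value
        allowed = self.max_rate * (now - self.last_time)
        if abs(value - self.last_value) > allowed:
            logger.warning("rejected %.2f C (last good %.2f C, max change %.2f C)",
                           value, self.last_value, allowed)
            return self.last_value
        self.last_value, self.last_time = value, now
        return value
```

Because `last_time` only advances on accepted readings, the allowed window keeps widening while readings are being rejected, so a genuine fast change is eventually accepted instead of the filter locking onto a stale value indefinitely.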

When development of https://github.com/apollo-ng/governess/ moves toward the server part and in-system HW testing, I'll make sure to carve out some time during the MAX31855 input driver plugin development to reproduce, catch, and prevent these issues at the source (in case all the physical/electrical noise and spike countermeasures fail). Since your experience with EH was better, I'll use that code as the basis for the new driver.