Ralim / IronOS

Open Source Soldering Iron firmware
https://ralim.github.io/IronOS/
GNU General Public License v3.0
7.28k stars 723 forks source link

Sleep vs. shutdown timeouts #94

Closed thanks4opensource closed 6 years ago

thanks4opensource commented 7 years ago

I hope I don't sound unappreciative, but IMO the current sleep vs. shutdown logic is wrong and can/should be improved.

It would be both more intuitive and more useful if the behavior was:

if no activity (motion or button) in SLTME
    go into sleep mode
if no activity (motion or button) in SHTME
    go into shutdown mode

Either by design or coding error, the current (1.17/1.17.1) code instead does:

if no activity (motion or button) in SLTME
    if no button press in SHTME
        go into shutdown mode
    else
        go into sleep mode

As an example of how this can be confusing and problematic, set SLTME=2 and SHTME=5 and do the following:

TIME    ACTIVITY
0       turn on iron, solder
1   adjust temperature (button)
2   put iron down
4   iron goes to sleep
5   pick iron up, wakes from sleep, solder
6   put iron down
8   iron goes into shutdown, not sleep
9   pick up iron expecting it to wake up from sleep, but
    instead have to push button twice (once to exit shutdown,
        2nd time to start) 

I think the iron should always go to sleep after SLTME inactivity, regardless of elapsed time since last button push. Likewise, always wake from sleep upon motion, unless SHTME has elapsed and it's already in shutdown/cooling mode. I contend that the above difference in behavior between times 4 vs. 8, and 5 vs. 9 is a problem in real-life usage.

This result is not dependent those exact settings -- they're just an easy demonstration. And of course there's also the known issue of MSENSE 8 or 9 never entering either sleep or shutdown. I'm constantly shaking the iron to keep it awake because MSENSE=7 isn't sensitive enough, and then when the solder doesn't melt looking at the OLED and discovering that the iron has gone into shutdown even though there's been (motion) activity within the SHTME period.

The following are the relevant code sections, from the Modes.c source. They are executing correctly, as written. A fix is fairly easy -- I'd submit a patch but am hesitant to do so as I don't currently have way to test it.

        case SOLDERING:
        ...
                        //We need to check the timer for movement in case we need to goto idle
                        if (systemSettings.sensitivity) {
                                uint32_t timeout = 0;
                                if (systemSettings.SleepTime < 6)
                                        timeout = 1000 * 10 * systemSettings.SleepTime;
                                else
                                        timeout = 60 * 1000 * (systemSettings.SleepTime - 5);

                                if (millis() - getLastMovement() > (timeout)) {
                                        if (millis() - getLastButtonPress() > (timeout)) {
                                                operatingMode = SLEEP;
                                                return;
                                        }
                                }
                        }
    ...
        case SLEEP:
        ...
                if (systemSettings.sensitivity) {
                        //Check if we should shutdown
                        if ((millis() - getLastMovement()
                                        > (systemSettings.ShutdownTime * 60000))
                                        || (millis() - getLastButtonPress()
                                                        > systemSettings.ShutdownTime * 60000)) {
                                operatingMode = COOLING; //shutdown the tip
                                Oled_DisplayOn();
                        }
                }

As always, thanks for being open to suggestions. Apologies if this is a duplicate -- I did a fairly extensive search to see if it had already been reported and didn't find anything. but I've been known to fail before at doing that. :(

Ralim commented 7 years ago

Hi, Firstly, this is not a duplicate afaik and thank you for noticing. edge cases in my logic like these are the biggest causes of the issues on this repository :/ I try to test...

You are correct in every single thing you have said, my logic is at fault here. However, at the moment I'm sort of treating the 1.x software as on the path to being deprecated by the 2.x branch (Which has entirely new logic for the entire iron written from scratch over the last few months in my spare time and fixes the sensitivity issue as well).

The last time a button was pressed or movement occurred is still tracked in the same way (ie timestamps).

In the 2.x branch, it is set up as :

In soldering mode :
IF ((currentTime - lastButtonPress)>SLEEPTIME && (currentTime - lastMovement)>SLEEPTIME)
    goto sleep
END IF

In sleep mode :
IF ((currentTime - lastButtonPress)>SHUTDOWNTIME && (currentTime - lastMovement)>SHUTDOWNTIME)
    goto shutdown
END IF

This does currently have the logic bug that if shutdown time is < sleep time the unit will wait sleep time and then shut down rather than going directly to shut down.

If movement detection is disabled (sensitivity set to 0) then last movement time stays at 0, and so the time check for movement will always be true and thus is effectively removed from the test.

I'm looking to start to put up 2.x beta firmware releases soon once I write the cold junction code for the tip temperature.

Would you like to see a patch pushed to 1.x in the meantime?

I would like to apologise for this bug though, it's very obvious once pointed out :/

thanks4opensource commented 7 years ago

Thanks, @Ralim. The pseudo-code for 2.x looks good to me. I can't imagine a valid use case for shutdown time less than sleep time, but you never know what someone will come up with. ;)

I'll try to beta test 2.x as soon as you put it up. But the combination of this plus the MSENSE problem is driving me crazy. I'm having enough trouble trying to teach myself fine-pitch SMT soldering as it is!

You have too much to do already, but maybe if you push out a 1.17.2 or 1.18-alpha you could do the quick hack of:

        case SLEEP:
        ...
                if (systemSettings.sensitivity) {
                        //Check if we should shutdown
                        if ((millis() - getLastMovement()
                                        > (systemSettings.ShutdownTime * 60000))
                                        /* || (millis() - getLastButtonPress()     TURNED OFF FOR ISSUE #94
                                                        > systemSettings.ShutdownTime * 60000)   */   )  {
                                operatingMode = COOLING; //shutdown the tip
                                Oled_DisplayOn();
                        }
                }

Though that might cause an edge case if MSENSE is 0 and SHTME is still desired via button press timeout (but SLEEP state is only entered via motion timeout anyway, so maybe not). I haven't thought this out thoroughly so don't listen to me -- I don't know what I'm talking about. ;)

Anyway, whatever you choose to do will be greatly appreciated. This thing is so close to being perfect that these smaller bugs (but with large usability implications) become even that much more annoying.

thanks4opensource commented 7 years ago

@Ralim: Just spent a few hours testing v2.0-alpha1, and I have just one word to describe it: Totally freakin' awesome! Oops, that's three words. ;)

Exactly what I've been dreaming of. I'm using SLTME 10S, SHTME 05, and MSENSE 8. Works perfectly. Never goes to sleep in my hand, yet sleeps almost immediately when placed in the holder. MSENSE 9 is a little too touchy -- I like to tin the tip after I set it down, and 9 does an unwanted wake up if I'm not very gentle doing so. (But please leave 9 in. Need to have "too far" to get to "just right".)

Some small comments, merely FYI: Help text for ADVDSP is same as for MSENSE. Still would be nice to be able to see the version string in the setup menu in addition to at startup. And are we ever going to get good old 'Muhrrican degrees Fahrenheit as an option? As much as I'd like to start thinking in rationally superior Celsius (hey! 30 degrees is freezing cold, ready to snow, not summertime warm-on-the-verge-of-being-too-hot) my hot air gun is degrees F only, and having to remember two sets of temperature-vs-condition tables is too much for my overloaded brain.

So, again, all good. Orders of magnitude improvement. Now all I have to do is train myself out of the nervous tic -- shaking the iron every few seconds to keep it awake -- that I developed using 1.17. :)

Ralim commented 7 years ago

@thanks4opensource Glad to hear its working well for you :)

Hehe yeah, 9 is way too sensitive for me to be able to use as well :) I'm definitely leaving it there for the one person who is magically super gentle.

Oh, thanks for spotting that, Fixed the description in the code already :) I can add Fahrenheit back in if you wish, I was just trying to avoid needing to support it 😬

thanks4opensource commented 7 years ago

@Ralim: Sorry if I'm representing the few of us stuck back in the Fahrenheit/inches/pounds stone age, but I'd use the option for the idiosyncratic reason I listed above. Frankly, I'd rather get the U.S. manufacturer of my mass-market consumer heat gun (poor man's hot-air rework station) to support Celsius, but the firmware's not field upgradable and they'd ignore my request even if it was.

I've only glanced at the new codebase, but if you're not design-adverse to global variables maybe a multiplier and an offset to do the conversion whenever/wherever the temperature's displayed? (And a divisor if the MCU doesn't have floating point.) Default to 1.0 and 0.0 for Celsius, and I can provide help with the Fahrenheit constants if you need it. ;) ;) ;)

You're supporting all those different language translations -- how about some native cultural support for those of us in a backward, developing second world nation? ;)

Ralim commented 7 years ago

@thanks4opensource It's no problem at all, I was just focusing on the larger audience first πŸ˜„

Actually, in this case, it should be fairly easy to support as the PID control loop is completely independent of the temperature unit (it works in terms of the raw measurement of the tip temp).

Hehe if you put it that way 😁 Ill add it to the stack.

Ralim commented 7 years ago

Pushed up alpha 2 of the 2.x branch, this adds Fahrenheit back in. In a nicer form than before, in that all temperatures are then performed in F rather than just soldering temp.

thanks4opensource commented 7 years ago

Thanks, @Ralim. You're the "Fastest Coding Man In Show Business"! Just did a small test with 2.0-alpha2 and it warms my xenophobic heart to be able to use "real"(tm) Fahrenheit degrees. ;)

A few small problems, some not related to the degree units, one easy to fix, some more difficult and debatable:

  1. Cool-down "WARN!xxx" LCD turn-off seems stuck at 30 degrees even in Fahrenheit mode, and will never be reached unless outside in the snow. Should be the equivalent 86F.

  2. The hard one: If there's going to be a units choice, all temperatures should really have "F" or "C" suffixes (preferably with the little circle degree glyph). Particularly the cool-down "WARN" one where it's a safety issue: Does "110" mean the tip's still a little hot but cool enough to change (F), or are you still going to be lightly burned (C)? Hopefully you're insulated (bad pun) by your location and the free software license, but I live in the land where lawyers rule all, and have to think about things like this. The issue is hard because LCD screen space is obviously at a premium, and a majority of users might resent wasting it when the default Celsius is obvious to them. But regardless whether the C-vs-F option is in there, somewhere/someday someone's going to be in a rush to change tips and get burned thinking that a unit-less "120" is safe.

  3. Unrelated (I think) to degrees, there seem to be inconsistencies in the shutdown/warn/turn-off logic. Might have something to do with CLBLNK mode, but I haven't pinned it down. In my opinion:

    a) Temperature should always be displayed while actively soldering, and also in sleep mode.

    b) Regardless of whether shutdown occurs via SHTME timeout or explicitly by double-button or long cord-end button press (BTW, thanks -- I'm one of the clumsy ones who flubs double-button 50% of the time) the "WARNxxx" display should be on until 30C/86F is reached. Then the LCD should blank. Any button press (but not motion) at either "WARNxxx" or blank should go into start-up mode/display.

    c) When power is applied (from off / no power) "WARNxxx" mode should be active (with button press abort as per above) if temp is above 30C/86F, otherwise directly to start-up.

Yes, that's a lot of (continuing) requests, but I hope they're taken in the spirit intended -- not as "I want" but more "it would be an improvement for everyone". As always, all of them are subject to discussion and your final decision.

In any case, it's great to be able to work in lifetime-ingrained degrees F. Is 30C too hot too touch? 40? 50? I'll never know. ;) Now if you can just get the toilets in Oz to flush the right way: https://youtu.be/BdDdeS997hM?t=33

Ralim commented 7 years ago

Hi, I have patched some of your points and pushed to the repository. However, I'm holding off on another alpha build for a little bit longer so I'll message back where when another test release goes live.

In regards to your points:

  1. Fixed, sorry about that one
  2. I have gone through and done this is most areas as far as I can find, let me know if there is anywhere I missed. This is mostly me upbringing coming across as any unitless temperature is taken to be Celsius as a given.
  3. Not sure what to fix for this one, The current logic is that display can only turn off when on the idle screen, otherwise it will be kept on. This means that when it is shutdown (however triggered) it will wait for it to cool down to 30C before returning to idle, which it will then usually turn off the screen. When power is first applied the system will look if the tip is > 50C and if so it will jump to the warning screen as well.

If you manage to find the case for the bug I'm more than happy to fix it πŸ˜„ Just not quite sure what the bug is right now.

thanks4opensource commented 7 years ago

Thanks, @Ralim. I'll grab the next alpha whenever you put it up. (I still haven't had the time to create a build environment to compile the sources myself.) If you've found space for "F"/"C" suffixes on all the temperature displays, that's awesome. I think it's a very good thing to have regardless one's units preference.

Like I said, I was doing a lot of testing and couldn't quite pin down what I thought was inconsistent about the shutdown/cool/LCD-off logic, and it might well have had something to do with the "wait for 30F to shutdown" bug. If I find something definitive, either before or in the next alpha, I'll post here.

thanks4opensource commented 7 years ago

OK, I think I've found something repeatable. Full settings list:

PWRSC DC
STMP 050
SLTME 10S
SHTME 01   (so don't have to wait to see bug)
MSENSE 8
TMPUNT F   (same result if "C")
ADVDSP F
DSPROT R
BOOST F
BTMP 420
ASTART F
CLBLNK F
TMP CAL?
RESET F?

With 2.0-alpha2, if the iron is explicitly put into shutdown via long cord-side button, the LCD will blink "WARN!xxx" until 30 (F or C, known bug, fixed) and then turn off. If instead it goes into shutdown via the 1 minute timeout, the LCD immediately turns off despite temperature still being approx. 200C, and requires a button press to get the start menu.

Soldering temperature was 550F in "TMPUNT F" and 300C in "TMPUNT C". Yes, I know they're not the same. ;)

Can you test with your current build and/or 2.0-alpha2 to see if you get the same behavior on your TS100?

Ralim commented 7 years ago

Hey @thanks4opensource Could you check if this is patched in the newer alphas, I'm fairly sure I've got this fixed, but would like your opinion 😁

thanks4opensource commented 7 years ago

Yes, has been working perfectly in 2.0-alpha.4. I figured "no news is good news" -- my apologies if you were waiting for an explicit response.

Many improvements in alpha.5, but sorry to have to report one problem in regards to this issue. Don't know if the long button press to see version info is new (I think I was the one to request it?) and it's perfect from the startup and menu screens. But I use long rear/LCD button press to explicitly shutdown instead of waiting for SHTME timeout, and now this is very "touchy". A short press increments the current soldering temperature and a long press goes to the version display screen and from there to startup -- the bad situation having a hot tip that's not being reported. I've gotten a "medium" press to go to shutdown, but it's hit-or-miss. Even the double-button press shutdown (which some of us find hard to use) seems to randomly either go to shutdown with cooldown temperature display, or directly to the startup screen.

Can you see if you can duplicate the problem? My suggestion would be that double, or long rear, button press always goes into shutdown with cooldown display, and the long press for version only be available at the startup screen.

The 2.0 alphas are asymptotically approaching perfection. Love the single button press to exit shutdown and resume soldering.

Ralim commented 7 years ago

@thanks4opensource Yeah no problem, I thought I would touch base anyway to check in on it :)

I've updated the a5 release to bugfix that oneπŸ˜‰. Hopefully, anyway, It fixes it for me locally here :) The bug was the exiting soldering iron code not 'consuming' the button press when it exited, so it would skip the warning screen. Hopefully, that ones patched for now :)

Let me know if that fixes it for you :)

This is the side effect of having 3 ways to enter and exit things, I can't test every permutation every time I build a release. This was a side effect of the implementation of the single button to re-enter soldering from warning screen.

thanks4opensource commented 7 years ago

Thanks for the quick bugfix! Downloaded, flashed, brief test. Looks good, works perfectly!

Yeah, functional and code coverage testing is hard. Unit tests? Been there, done that -- they take more time than the actual code itself. How about the same STM MCU that's in the TS100, on a breakout board, connected to another MCU that feeds it simulated inputs on the correct ports and watches the PWM, LCD, etc. output pins to see if they're behaving as expected. That little project ought to keep you busy for a while. ;) And it still doesn't test the MOSFETs and the rest of the analog side. It's never-ending.

I guess that's what we -- your loyal army of alpha/beta testers -- are here for. :)

Ralim commented 7 years ago

No problem, if I reproduce it often it can be a quick fix :)

Yeah that's sort of where I am, for the amount of time I could spend trying to write tests and emulate the behaviour of the screen I may as well have pushed it out and received 5 bug reports first. I'm amazed some time how fast people can find edge cases in the code.

Though I wouldn't complain if someone were to test the whole thing, it's way too much more for a project that is ultimately just a small tool in a toolbox of devices.

If you don't find any more bugs in a day or two I'll close the issue :)