Klipper3d / klipper

Klipper is a 3d-printer firmware
GNU General Public License v3.0
9.33k stars 5.28k forks source link

[Question] Debugging early boot process #3414

Closed vintagepc closed 3 years ago

vintagepc commented 3 years ago

Hello!

Let me prefix this by saying I don't have any expectation of support for this; I know full well it's way off the beaten path and into the jungle; That said; I'm hoping that someone with more experience interpreting the logs could save some time and point us at what might be going wrong. Now, on to the issue:

I had a request to support Klipper in MK404 (https://github.com/vintagepc/MK404/issues/263). It seemed fairly straightforward, given it can run stock Prusa firmware and Marlin - and Klipper also is capable of running in a simulated environment through simulavr. (MK404 is built on SimAVR instead).

Alas we are having some issues very early in the startup process and I was hoping someone might be able to shed some light on either the issue or the debugging mechanisms/outputs available (if any) beyond straight-up avr-gdb on the simulator, which would require an in-depth knowledge of its workings beforehand.

So far we've determined something appears to be going wrong very early on in the process - the display doesn't even initialize correctly and just shows garbage:

image

I monitored the simulated display and it almost looks like the 4 bit nibbles are swapped when compared to the display data sections in the Klipper log, suggesting a synchronization problem.

Display VRAM write log ``` ProcessWrite command 0write when still BUSY ProcessWrite command write when still BUSY OnCmdReadyactivating 4-bit mode hd44780_write_data 02 () to 48 hd44780_write_data 02 () to 47 hd44780_write_data 02 () to 46 hd44780_write_data 02 () to 45 hd44780_write_data 02 () to 44 hd44780_write_data 02 () to 43 hd44780_write_data 02 () to 42 hd44780_write_data 02 () to 41 hd44780_write_data 82 (�) to 40 hd44780_write_data 02 () to 27 hd44780_write_data 02 () to 26 hd44780_write_data 02 () to 25 hd44780_write_data 02 () to 24 hd44780_write_data 02 () to 23 hd44780_write_data 02 () to 22 hd44780_write_data 02 () to 21 hd44780_write_data 02 () to 20 hd44780_write_data 02 () to 1f hd44780_write_data 02 () to 1e hd44780_write_data 02 () to 1d hd44780_write_data 02 () to 1c hd44780_write_data 02 () to 1b hd44780_write_data 02 () to 1a hd44780_write_data 02 () to 19 hd44780_write_data 82 (�) to 18 hd44780_write_data 02 () to 17 hd44780_write_data 02 () to 16 hd44780_write_data 02 () to 15 hd44780_write_data 02 () to 14 hd44780_write_data 02 () to 13 hd44780_write_data 02 () to 12 hd44780_write_data 02 () to 11 hd44780_write_data 02 () to 10 hd44780_write_data 02 () to 0f hd44780_write_data 02 () to 0e hd44780_write_data 02 () to 0d hd44780_write_data 02 () to 0c hd44780_write_data 02 () to 0b hd44780_write_data 02 () to 0a hd44780_write_data 02 () to 09 hd44780_write_data 02 () to 08 hd44780_write_data 02 () to 07 hd44780_write_data 02 () to 06 hd44780_write_data 02 () to 05 hd44780_write_data 02 () to 04 hd44780_write_data 02 () to 03 hd44780_write_data 02 () to 02 hd44780_write_data 02 () to 01 hd44780_write_data 82 (�) to 00 hd44780_write_data 02 () to 67 hd44780_write_data 02 () to 66 hd44780_write_data 02 () to 65 hd44780_write_data 02 () to 64 hd44780_write_data 02 () to 63 hd44780_write_data 02 () to 62 hd44780_write_data 02 () to 61 hd44780_write_data 02 () to 60 hd44780_write_data 02 () to 5f hd44780_write_data 02 () to 5e hd44780_write_data 02 () to 5d hd44780_write_data 02 () to 5c hd44780_write_data 02 () to 5b hd44780_write_data 02 () to 5a hd44780_write_data 02 () to 59 hd44780_write_data 82 (�) to 58 hd44780_write_data 02 () to 57 hd44780_write_data 02 () to 56 hd44780_write_data 02 () to 55 hd44780_write_data 02 () to 54 hd44780_write_data 02 () to 53 hd44780_write_data 02 () to 52 hd44780_write_data 02 () to 51 hd44780_write_data 02 () to 50 hd44780_write_data 02 () to 4f hd44780_write_data 02 () to 4e hd44780_write_data 02 () to 4d hd44780_write_data 02 () to 4c hd44780_write_data 02 () to 4b hd44780_write_data 02 () to 4a hd44780_write_data 02 () to 49 hd44780_write_data 00 () to 48 hd44780_write_data 00 () to 47 hd44780_write_data 00 () to 46 hd44780_write_data 00 () to 45 hd44780_write_data 00 () to 44 hd44780_write_data 00 () to 43 hd44780_write_data 00 () to 42 hd44780_write_data 00 () to 41 hd44780_write_data 00 () to 40 hd44780_write_data 00 () to 27 hd44780_write_data 00 () to 26 hd44780_write_data 00 () to 25 hd44780_write_data 00 () to 24 hd44780_write_data 00 () to 23 hd44780_write_data 00 () to 22 hd44780_write_data 00 () to 21 hd44780_write_data 00 () to 20 hd44780_write_data 00 () to 1f hd44780_write_data 00 () to 1e hd44780_write_data 00 () to 1d hd44780_write_data 00 () to 1c hd44780_write_data 00 () to 1b hd44780_write_data 00 () to 1a hd44780_write_data 00 () to 19 hd44780_write_data 00 () to 18 hd44780_write_data 00 () to 17 hd44780_write_data 00 () to 16 hd44780_write_data 00 () to 15 hd44780_write_data 00 () to 14 hd44780_write_data 00 () to 13 hd44780_write_data 00 () to 12 hd44780_write_data 00 () to 11 hd44780_write_data 00 () to 10 hd44780_write_data 00 () to 11 hd44780_write_data 00 () to 12 hd44780_write_data 00 () to 13 hd44780_write_data 00 () to 14 hd44780_write_data 00 () to 15 hd44780_write_data 00 () to 16 hd44780_write_data 00 () to 17 hd44780_write_data 00 () to 18 hd44780_write_data 00 () to 19 hd44780_write_data 00 () to 1a hd44780_write_data 00 () to 1b hd44780_write_data 00 () to 1c hd44780_write_data 00 () to 1d hd44780_write_data 00 () to 1e hd44780_write_data 00 () to 1f hd44780_write_data 00 () to 20 hd44780_write_data 00 () to 21 hd44780_write_data 00 () to 22 hd44780_write_data 00 () to 23 hd44780_write_data 00 () to 24 hd44780_write_data 00 () to 25 hd44780_write_data 00 () to 26 hd44780_write_data 00 () to 27 hd44780_write_data 00 () to 40 hd44780_write_data 00 () to 41 hd44780_write_data 00 () to 42 hd44780_write_data 00 () to 43 hd44780_write_data 00 () to 44 hd44780_write_data 00 () to 45 hd44780_write_data 00 () to 46 hd44780_write_data 00 () to 47 hd44780_write_data 00 () to 48 hd44780_write_data 02 () to 49 hd44780_write_data 02 () to 4a hd44780_write_data 03 () to 4b hd44780_write_data 00 () to 4c hd44780_write_data 00 () to 14 hd44780_write_data 40 (@) to 15 hd44780_write_data a0 (�) to 16 hd44780_write_data a0 (�) to 17 hd44780_write_data a0 (�) to 18 hd44780_write_data a1 (�) to 19 hd44780_write_data 11 () to 1a hd44780_write_data 10 () to 1b hd44780_write_data 81 (�) to 65 hd44780_write_data c1 (�) to 66 hd44780_write_data 01 () to 67 hd44780_write_data 81 (�) to 00 hd44780_write_data 00 () to 47 hd44780_write_data e0 (�) to 48 hd44780_write_data e0 (�) to 49 hd44780_write_data e1 (�) to 4a hd44780_write_data f1 (�) to 4b hd44780_write_data f1 (�) to 4c hd44780_write_data f0 (�) to 4d hd44780_write_data 40 (@) to 4e ```

At some point after that Klipper throws a "timer scheduled in the past" error, but I suspect that once things have gone wrong any downstream errors are not necessarily related to the problem, just side effects. The display itself probably isn't even the initial issue either, but I'm not seeing any obvious faults further upstream in the Klipper log.

Thanks in advance for any insight you can provide!

log.txt

KevinOConnor commented 3 years ago

Interesting.

I suspect there are two separate issues - lcd init sequence and micro-controller timing. The micro-controller timing is likely to be the larger issue.

Klipper expects the micro-controller timing to be steady and reliable. The host code measures the micro-controller clock rate (by comparing it to the host machines system time) and expects that it can make generally accurate predictions about the micro-controller clock. In addition, the micro-controller expects that it will get a schedule of events that it can reliably run without falling behind by a significant amount. A "Rescheduled timer in the past" error indicates the micro-controller detected it could not keep up with its schedule of events. That's certainly an odd error to get during the initialization process, as the micro-controller has a very limited schedule at that point.

Some random hints:

  1. Grab the very latest code. The change for #3356 may help here.
  2. Reduce your printer.cfg to the simplest possible and then add features on top until you run into an issue. An example minimal config looks something like:
    
    [mcu]
    serial: /dev/ttyACM0

[printer] kinematics: none max_velocity: 300 max_accel: 1000

3. It's possible there is an error in the simavr timer emulation.  If so, that may require work on simavr itself.
4. If you get a klipper shutdown, it's possible to post-process that log - see: https://www.klipper3d.org/Debugging.html#extracting-information-from-the-klippylog-file

As for the LCD display - the hd44780 bit-banging interface is pretty cryptic.  Klipper uses a "hack" to make sure the display is in "4-bit mode".  You'll find that init code in klippy/extras/display/hd44780.py:
```python
        # Program 4bit / 2-line mode and then issue 0x02 "Home" command
        init = [[0x33], [0x33], [0x33, 0x22, 0x28, 0x02]]
        # Reset (set positive direction ; enable display and hide cursor)
        init.append([0x06, 0x0c])

I suspect that sequence is confusing the simulator and it gets out of sync with the 4bit mode. (Or, perhaps, the simulator gets stuck in 8-bit mode.) You could try simplifying the host init sequence to see if it helps (though I'd recommend tracking down the timing issues first).

-Kevin

vintagepc commented 3 years ago

Thanks for the info. I'll dig in.

FWIW the message OnCmdReadyactivating 4-bit mode in our log means the display did shift into 4-bit mode successfully. But I suspect once we find the cause of the timing issue it may fix the display communications as well; the simulated module has been fairly well vetted against the factory Prusa firmware. If there are timer issues I'm sure it would cause all sorts of weird side effects with regards to event scheduling and external communications.Thanks for pointing me to the init sequence; when (if) I get to that stage then I will be able to explicitly compare the bytes received to what's being sent.

Looks like we do have the issue somewhere in the SimAVR internals; as even just the bare config example has the same result.

I'm guessing that Resetting prediction variance 2216088.260: freq=18548748 diff=2484205 stddev=16360.986 means we are indeed not very consistent in the timing. I suspect it may be the way that sleep/idle time is handled; SimAVR will accumulate sleep cycles until we hit a certain threshold and then executes them to avoid simulation time and wall clock time diverging.

I've poked at the SimAVR timer stuff before, but some of it is voodoo to me, so I'll shake the tree and see what falls out. Are there any particular log outputs I should be looking at to determine if changes are going in the right direction? I'm thinking I want to reduce the variance on frequency as much as possible.

I'm seeing a lot of freq=18832616 in the output - we've been known to run a little fast; is that an issue so long as the time is consistent, or is Klipper expecting us to be closer than that to a configured frequency (16Mhz, in this case)

I did try post-processing a log but it didn't seem to do anything particularly useful for me apart from extracting out the config. Perhaps because the shutdown isn't caused by a bad config or printer setup.

Ironically enough I did look at simulavr as a base for MK404, but passed on it due to simavr being more feature-complete.

KevinOConnor commented 3 years ago

Looks like we do have the issue somewhere in the SimAVR internals; as even just the bare config example has the same result.

If you're getting a "Rescheduled timer in the past" error with the minimal config, that would generally indicate to me that there is something not compatible with the avr timer emulation.

I'm guessing that Resetting prediction variance 2216088.260: freq=18548748 diff=2484205 stddev=16360.986 means we are indeed not very consistent in the timing.

Yes - that's a warning from the host that it is not able to accurately predict micro-controller timing. It's just a warning though - it wouldn't directly lead to a "Rescheduled timer in the past". (Unstable host predictions could lead to other errors during an actual print though.)

Are there any particular log outputs I should be looking at to determine if changes are going in the right direction? I'm thinking I want to reduce the variance on frequency as much as possible.

Unsteady timing shouldn't result in a "Rescheduled timer in the past" error. That particular error is raised from src/avr/timer.c - it indicates there were multiple "software timers" present and after running at least one of them, there was another pending that was scheduled to occur more than 1ms in the past. That really shouldn't happen with a minimal config.

I'm seeing a lot of freq=18832616 in the output - we've been known to run a little fast; is that an issue so long as the time is consistent, or is Klipper expecting us to be closer than that to a configured frequency (16Mhz, in this case)

The Klipper host software is fine with any frequency as long as it is stable. For example, with simulavr I'll run the simulation at 20% speed just to make it have a stable frequency (using the -r .2 parameter of ./scripts/avrsim.py ). Also, sometimes the klipper host will "sync" after a minute or two - which should also be fine. So, if the warnings go away after some time then it's probably okay (at least for testing).

Ironically enough I did look at simulavr as a base for MK404, but passed on it due to simavr being more feature-complete.

FWIW, simulavr also has issues here - for example, I need to run it at a painfully slow rate (20%) to run basic simulations reliably.

-Kevin

vintagepc commented 3 years ago

Unsteady timing shouldn't result in a "Rescheduled timer in the past" error. That particular error is raised from src/avr/timer.c - it indicates there were multiple "software timers" present and after running at least one of them, there was another pending that was scheduled to occur more than 1ms in the past. That really shouldn't happen with a minimal config.

That's interesting... but helps clarify that this is purely a simulator issue and not an issue of synchronization or lag between the host and microcontroller.

I've seen some issues like that in the past - e.g. the SD bus clocking so slowly that it caused a WDR before the file processing was complete.

I can drop breakpoints in the AVR code and debug it via GDB - is there an easy way to identify what particular task the firmware was trying to do that took too long?

Thanks again for your assistance and for bearing with my questions. It'd be neat if Klipper could be made to work on a full-blown printer sim!

KevinOConnor commented 3 years ago

I can drop breakpoints in the AVR code and debug it via GDB - is there an easy way to identify what particular task the firmware was trying to do that took too long?

In a minimal config, I think there would only be two "software timers" on the timer list (src/sched.c:timer_list) - the periodic_timer (src/sched.c) and the wrap_timer (src/avr/timer.c). Both will reschedule themselves, but they only run every 100ms / 2ms respectively. Otherwise the code should almost always be spinning in the irq_wait() call in src/sched.c:run_tasks().

It'd be neat if Klipper could be made to work on a full-blown printer sim!

Indeed!

Separately, if getting simavr to run is too painful, maybe we could add code to src/simulator/ to directly output the "events" that MK404 needs?

-Kevin

vintagepc commented 3 years ago

Thanks - I think I've got the understanding I need now to dive in. The upside is everything is purely a software environment so I can make them do whatever I want quite easily and synchronously.

Separately, if getting simavr to run is too painful, maybe we could add code to src/simulator/ to directly output the "events" that MK404 needs?

Interesting idea - though I don't think that fits with the project. MK404 is written very much as a source-to-endpoint simulator; Sure, it would likely be possible to add a klipper "core" which replaces the lowest levels and interacts with the virtual hardware - but the primary goal of the project is very much simulation at the lowest level possible all the way up; it runs unmodified .hex or .elf files of the factory firmware or Marlin (or, in theory, most any other binary meant to run on the virtual board - present company excepted, for now :wink: ) Swapping out the middle layers makes it a lot harder to debug or compare two firmwares' behaviour on the same virtual hardware stack, which is the end goal of the issue for which I'm trying to add this support.

But I won't deny there might be merit in that and we can always revisit the idea if my efforts don't pan out.

vintagepc commented 3 years ago

Making some progress... I see the core reason the LCD is out of sync. The E pin is getting toggled multiple times, at least for the simulation.

image

image

This looks like it just needs to configure E with the internal pullup. Fair enough, at least, until I tried to do that.

Klipper.py balks and isn't happy:

error: Invalid pin description '^PF7'
Format is: [chip_name:] pin_name

The failure to boot seems to be an IRQ enable/disable handling issue; I padded out the ISR with a debugging function and it no longer throws the timer-in-the-past error; There are some subtleties there I've hacked around but not quite explored in depth yet.

In any case, manually double-desyncing the LCD actually got me a usable menu.

Any idea why it won't accept the ^ for that particular pin? there are others in the .cfg where it works just fine.

KevinOConnor commented 3 years ago

Any idea why it won't accept the ^ for that particular pin?

The "e_pin" is configured as an output pin. Klipper only accepts pull-ups for input pins.

FWIW, I don't see why the e pin toggling would be incorrect. Klipper sets it high for the duration of the 4bit transmit (src/lcd_hd44780.c:hd44780_xmit_bits() ).

-Kevin

vintagepc commented 3 years ago

Yeah, that occurred to me later >_<. Hazards of software vs what hardware is really capable of...

I'm digging in now that I have a thread to pull at.

vintagepc commented 3 years ago

Ah, that trace was a little misleading because E needs to toggle twice for each byte.

In any case, I think I have traced it back to the init sequence. Where did you get the particular one you're using?

The more common one I am aware of is sending three "3" nibbles, followed by a 2, and then the desired config (2 lines, etc.)

This change fixes the issue for me: (changing 0x22 to 0x32). init = [[0x33, 0x32, 0x28, 0x02]] _________________^

and the first two 0x33s are superfluous.

I haven't confirmed it yet but I suspect the reason it works as-is on the real display is this:

ProcessWrite command write when still BUSY

Where the simulated display just warns you and still processes the write anyway. (I suspect if I adjust the code slightly to completely ignore the incoming data while busy it would work with the original init sequence.) Edit: confirmed

image

KevinOConnor commented 3 years ago

The sequence is that way because the mcu does not implement a delay between nibbles - only between bytes. So, if the display wont process commands when busy, the command sequence: [[0x33], [0x33], [0x33, 0x22, 0x28, 0x02]] is effectively seen as 0x3, 0x3, 0x3, 0x2 and then 0x28, 0x02. However, if the simulator doesn't care about the busy state, I guess it's being seen as: 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x2, and then 0x22, 0x80, 0x2?, ... which is of course undesirable.

It might be worthwhile exploring a change to: [[0x33], [0x33], [0x32, 0x22, 0x28, 0x02]] which may work okay in both situations.

-Kevin

vintagepc commented 3 years ago

Ah, ok - that makes sense. I hadn't looked into the ndelay function to see it does nothing at < 48 MHz.

The right thing to do is fix the simulated hardware to behave more like the real thing, of course! :+1:

vintagepc commented 3 years ago

Starting to pull the timer issue thread - I'll post brief updates as I make findings; please don't feel obligated to reply but I figure it makes sense in case I observe something that you know is unexpected behaviour that might not be obvious to me.

Took me a bit of time to get a debug build with full source symbols, but now that I have it, this is the local ISR state when it throws the shutdown:

(printout of state each time we call the ISR)
diff = 227
now = 249
next = 0
diff = 8365
now = 8387
next = 0
diff = 16503
now = 16525
next = 0

Program received signal SIGTRAP, Trace/breakpoint trap.
__vector_17 () at src/avr/timer.c:200
200                     asm("break");
(gdb) info locals
diff = 16503
now = 16525
next = 0

A next value of 0 seems suspicious.

KevinOConnor commented 3 years ago

The wrap_timer (src/avr/timer.c:timer_event() ) should be called every 0x8000 ticks - so you should see the next sequence 0x0000, 0x8000, 0x0000, 0x8000, ...

Does avrsim properly emulate the TIFR1 TOV1 bit?

-Kevin

vintagepc commented 3 years ago

TOV1 certainly seems to be toggling as expected.

I had my debugging setup print out t[0] on every pass through the timer_sched call, and func only seemed to be pointing to periodic_timer, none to timer_event. I'm not entirely sure whether that's me interpreting the debug results correctly or not, but it is the next thread to pull.

vintagepc commented 3 years ago

This is interesting... at some point timer_event stops updating next. Naturally, that diverges from now() and we throw the error once that diverges more that 16000 cycles.

The button_events is a red herring; with the bare minimum config, the same behaviour still occurs, but it takes a lot longer to happen.

This is very interesting, because it smells of stack/memory corruption of some kind.

full config trace of t[0] in sched call and TIFR1 from perspective of timer_event ``` 0x800036: 0x0000000d $564 = {next = 0x800bf6, func = 0x580e , waketime = 16941056} 0x800036: 0x00000000 $577 = {next = 0x800bf6, func = 0x580e , waketime = 17367040} 0x800036: 0x0000000d $578 = {next = 0x800bf6, func = 0x580e , waketime = 17399808} 0x800036: 0x00000000 $579 = {next = 0x800bf6, func = 0x580e , waketime = 17432576} 0x800036: 0x0000000d $580 = {next = 0x80021b, func = 0x46a8 , waketime = 17440000} $581 = {next = 0x800bf6, func = 0x580e , waketime = 17465344} 0x800036: 0x00000000 $582 = {next = 0x80021b, func = 0x46a8 , waketime = 17472000} $583 = {next = 0x800bf6, func = 0x580e , waketime = 17498112} 0x800036: 0x0000000d $584 = {next = 0x80021b, func = 0x46a8 , waketime = 17504000} $585 = {next = 0x800bf6, func = 0x580e , waketime = 17530880} 0x800036: 0x00000000 $586 = {next = 0x80021b, func = 0x46a8 , waketime = 17536000} $587 = {next = 0x800bf6, func = 0x580e , waketime = 17563648} 0x800036: 0x0000000d $588 = {next = 0x80021b, func = 0x46a8 , waketime = 17568000} $589 = {next = 0x800212, func = 0x580e , waketime = 17596416} 0x800036: 0x00000000 $590 = {next = 0x800bf6, func = 0xfac , waketime = 17600000} $591 = {next = 0x80021b, func = 0x46a8 , waketime = 17600000} $592 = {next = 0x800bf6, func = 0x580e , waketime = 17629184} 0x800036: 0x0000000d $593 = {next = 0x80021b, func = 0x46a8 , waketime = 17632000} $594 = {next = 0x800bf6, func = 0x580e , waketime = 17661952} 0x800036: 0x00000000 $595 = {next = 0x80021b, func = 0x46a8 , waketime = 17664000} $596 = {next = 0x800bf6, func = 0x580e , waketime = 17694720} 0x800036: 0x0000000d $597 = {next = 0x80021b, func = 0x46a8 , waketime = 17696000} $598 = {next = 0x800bf6, func = 0x580e , waketime = 17727488} 0x800036: 0x00000000 $599 = {next = 0x80021b, func = 0x46a8 , waketime = 17728000} $600 = {next = 0x80021b, func = 0x46a8 , waketime = 17760000} $601 = {next = 0x800bf6, func = 0x580e , waketime = 17760256} 0x800036: 0x00000000 $602 = {next = 0x800bf6, func = 0x580e , waketime = 17760256} 0x800036: 0x00000000 $603 = {next = 0x800bf6, func = 0x580e , waketime = 17760256} 0x800036: 0x00000000 $667 = {next = 0x800bf6, func = 0x580e , waketime = 17760256} 0x800036: 0x00000000 $668 = {next = 0x800bf6, func = 0x580e , waketime = 17760256} 0x800036: 0x00000000 Program received signal SIGTRAP, Trace/breakpoint trap. __vector_17 () at src/avr/timer.c:200 200 asm("break"); (gdb) ```
Trace with bare minimum config ``` 0x800036: 0x00000000 $7039 = {next = 0x800212, func = 0x580e , waketime = 204603392} 0x800036: 0x0000000d $7040 = {next = 0x800212, func = 0x580e , waketime = 204636160} 0x800036: 0x00000000 $7041 = {next = 0x800212, func = 0x580e , waketime = 204668928} 0x800036: 0x0000000d $7042 = {next = 0x800212, func = 0x580e , waketime = 204701696} 0x800036: 0x00000000 $7043 = {next = 0x800212, func = 0x580e , waketime = 204734464} 0x800036: 0x0000000d $7044 = {next = 0x800212, func = 0x580e , waketime = 204767232} 0x800036: 0x00000000 $7045 = {next = 0x80021b, func = 0xfac , waketime = 204800000} $7046 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7047 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7048 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7049 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7050 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7051 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7052 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7053 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7054 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7055 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7056 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7057 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7058 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7059 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7060 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7061 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7062 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7063 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7064 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7065 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7066 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7067 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7068 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7069 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7070 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7071 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7072 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7073 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7074 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7075 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7076 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7077 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7078 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7079 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7080 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7081 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7082 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7083 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7084 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7085 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7086 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7087 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7088 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7089 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7090 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7091 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7092 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7093 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7094 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7095 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7096 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7097 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7098 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7099 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7100 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7101 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7102 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7103 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7104 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 $7105 = {next = 0x800212, func = 0x580e , waketime = 204800000} 0x800036: 0x00000000 Program received signal SIGTRAP, Trace/breakpoint trap. __vector_17 () at src/avr/timer.c:200 200 asm("break"); (gdb) ```
KevinOConnor commented 3 years ago

I'd guess the TIFR1 handling is not correct. Looking briefly at https://github.com/buserror/simavr/blob/5c757546298c4c26022b3b4861e1bbfbdb72ac52/simavr/sim/avr_timer.c#L707 , I don't see it actually using the v value. It should clear the overflow flag iff there is a 1 in the TOV1 bit. Of course, I don't know enough about simavr to really say.

-Kevin

vintagepc commented 3 years ago

I don't think that's it; I added some logging to check if TOV was getting written but not cleared by the time the function finishes; the avr_clear_interrupt_if() seems to be working as expected in this case.

KevinOConnor commented 3 years ago

Okay. FWIW, my concern was that timer_repeat_set() (and similar) that write to other bits in TIFR1 would incorrectly clear TOV1.

-Kevin

vintagepc commented 3 years ago

Fair enough - I did also check whether we were incorrectly clobbering it if a 0 was written, but that didn't appear to be the case either.

vintagepc commented 3 years ago

Looks like you were on to something; it does look like timer_repeat_set's OCF1B was clobbering TOV1.

My oversight; I had set up tracing/logging in such a way that this was transparent to me and it looked like a legit TOV clear until I started inspecting where the clear originated from. It was further compounded by the event setup, which reads as: avr_register_io_write(avr, p->overflow.raised.reg, avr_timer_write_pending, p);

so on casual glance it's easy to assume it only calls that function on overflow write.

Now it boots and stays alive... On to the next issue! (Timer too close when doing something like homing or heating).

vintagepc commented 3 years ago

Closing this, since the original issue is resolved. Please don't hesitate to reach out if you're interested in using MK404 in some official capacity; I'm sure we can work something out :+1: