buserror / simavr

simavr is a lean, mean and hackable AVR simulator for linux & OSX
GNU General Public License v3.0
1.59k stars 369 forks source link

Program Counter breaks after Interrupt Request #443

Closed magdutra closed 3 years ago

magdutra commented 3 years ago

I'm writing a new code in assembly (gnu as compiler) for the ATmega644p and using gnu-avr and simavr for debugging.

A few instructions after initializing Timer0 and enabling interrupts with "sei", the program counter resets to the begining of the routine instead of proceeding to the next instruction (this same routine is called correctly 3 times before that).

The problem occurs exactly at the same point every time, but only if interrupts are enabled with "sei". It also doesn't matter the contents of the interrupt handler; I tried putting a "reti" directly in the interrupt vector, but the problem persisted.

Forgive me if this is an error of my own, but I spent quite a number of hours seeking a solution, to no avail.

gatk555 commented 3 years ago

That sounds very odd. Does it happen without using gdb? Can you post example code for reproduction?

magdutra commented 3 years ago

Thank you for your reply! Unfortunately, I don't know how to run code in the simulator without using gdb.

However, I've noticed that the program counter is being loaded in the stack right before its value changes. I imagine this is the simulator triggering the interrupt, but pointing the PC to the wrong address, maybe?

I've tried the same test using simulavr for comparison, and I get a similar behavior, so I think the odds are that this is my mistake, instead of a bug.

If you'd like, I can post some example code. I can also close the case for now, until I get a better understanding of what's happening.

Thank you again.

gatk555 commented 3 years ago

Assuming you are using run-avr, you should be able to run without gdb and still see what is happening by enabling instruction-level tracing. The option for it does not work unless you modify simavr/simavr/Makefile to enable it and recompile: there is a commented line to be uncommented.

But given that you can reproduce in another simulator, it seems unlikely to be a bug. If you post code, at least one person will look at it, and that often helps.

On Fri, 9 Apr 2021 at 00:25, Miguel Dutra @.***> wrote:

Thank you for your reply! Unfortunately, I don't know how to run code in the simulator without using gdb.

However, I've noticed that the program counter is being loaded in the stack right before its value changes. I imagine this is the simulator triggering the interrupt, but pointing the PC to the wrong address, maybe?

I've tried the same test using simulavr for comparison, and I get a similar behavior, so I think the odds are that this is my mistake, instead of a bug.

If you'd like, I can post some example code. I can also close the case for now, until I get a better understanding of what's happening.

Thank you again.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/buserror/simavr/issues/443#issuecomment-816297940, or unsubscribe https://github.com/notifications/unsubscribe-auth/APAYEBDFP4MHDXFYV6N7OY3THY3OLANCNFSM42RRJZKA .

magdutra commented 3 years ago

Thank you for your suggestion. I will do a recompilation of simavr to be able to make this test with avr-run.

For now, here goes a code dump with comments modified by me for improved readability. This should be more convenient than providing a few source files, I suppose:

https://github.com/buserror/simavr/files/6289368/dump.txt

A few comments about what I'm trying to achieve: I'm implementing something similar to a __cdecl calling convention, i.e., instead of directly calling my functions with "rcall" or "call", I'm using a macro that pushes parameters in the stack and then calls a fixed address at 0x0040, where I have dispatcher function in place. When it returns, the macro fixes the stack by adding to it the number of bytes previously pushed.

The role of the dispatcher function is to look up a table containing the addresses of each public function and then call the one requested using the parameters provided on the stack (the first two parameters are the module index and function index to be invoked).

The first "module" that I implemented is the avr's Timer0, with two functions: one to initialize it (TmrInit) and another one that causes a delay between 1ns and 6,5s (TmrWait).

To test all this, I created a second module that makes a led blink every 0,5s using TmrWait and the dispatcher. Everything works, as long as interrupts are disabled. If the Global Interrupt Flag is enabled ("sei"), then a few moments later the program counter is redirected to the address 0x0040.

magdutra commented 3 years ago

Well, I did it! And the instruction tracing is absolutely marvelous! Now I have a detailed description of what I've been trying to describe.

The corruption is occurring at the end of the "GetVector" routine (0x76 to 0x86). This block is called two times before the "sei" instruction, then two more times after it. However, in the second round (see line 242) the program counter jumps to the address 0x0040 and starts processing 'push' instructions from the beginning of the "SysCall" routine.

What could explain such behavior?

https://github.com/buserror/simavr/files/6289494/trace.txt

gatk555 commented 3 years ago

That made an entertaining puzzle. I am almost certain that it is a simavr bug, this line in simavr/simavr/cores/sim_megax4.h:

DEFAULT_CORE(4),

That defines a processor with 4-byte interrupt vectors, but the data sheet and your code say 2. So when the interrupt happens, the wrong vector address is used. It is interesting that you saw the same thing in another simulator. Has it the same bug?

One thing I noticed while looking at this: your code loads the SP with interrupts enabled. I believe that is not safe.

magdutra commented 3 years ago

Just to give you confirmation, I applied your correction and now it is working like a charm. Thank you very much for your help!

gatk555 commented 3 years ago

Sadly, that is the wrong fix. The right one is for you to insert a nop after each instruction in the interrupt vector area. I was caught out by the fact that simavr uses byte addressing for flash, but in the hardware the addresses are words.

magdutra commented 3 years ago

You don't know what a relief is to learn that. The code was working in the simulator, but not in the actual device. I've undone the patch now and applied your new suggestion. It's working on both now!

The problem is that the datasheet is very confusing in this regard: the item 9.2, shows an interrupt table with "Program Address" going from 0x0000, 0x0002, 0x0004 etc. (i.e., 2 bytes) while at the same time showing a "jmp" instruction, which uses four bytes.

I'm used to working with Tiny AVRs, but it is the first time I'm using a Mega, so I trusted my gut and replaced the "jmp" for "rjmp". That caused all this confusion. Sorry about that.