buserror / simavr

simavr is a lean, mean and hackable AVR simulator for linux & OSX
GNU General Public License v3.0
1.58k stars 368 forks source link

RXC interrupt will not fire when UCSRnA is not polled #104

Closed distributed closed 7 years ago

distributed commented 10 years ago

In avr_uart_rxc_read(), registered as an IO write handler for UCSRnA, a pair of XOFF/XON IRQs is raised if RXENn is set and the input FIFO is empty. If the simulated firmware does not read UCSRnA, no such IRQ pair is generated.

In uart_pty, just in front of uart_pty_xon_hook() there is a comment

/*
 * Called when the uart has room in it's input buffer. This is called repeateadly
 * if necessary, while the xoff is called only when the uart fifo is FULL
 */

Clearly this is not the case if firmware is not polling UCSRnA. Thus some of my firmwares using RXC interrupts will not work ;)

I'm not quite sure on how to proceed. Would it make sense to periodically raise XON IRQs to activate UART I/O implementations running in their own threads? Are there other solutions?

I have implemented a proof-of-concept fix. When RXENn and RXCIEn are set, I set up a periodic timer firing every p->usec_per_byte us that will raise a XOFF/XON interrupt pair. That way, only firmwares setting RXCIE will pay the execution time penalty. The timer period is updated on UBRR writes. Are you interested in merging this?

buserror commented 10 years ago

My assumption is that if you haven't got an explicit XOFF, it means you can send IRQ from your thread, and it will then trigger an interrupt from the firmware. It appears your assumption is that you /need/ a XON, aka that simavr 'drives' the sending of the bytes to the fifo -- it's not the case, as long as you don't get an XOFF, you are OK sending bytes -- the start condition is FIFO empty.

If simavr had to XON for you to put a byte in, it'd be about as good as having a single byte fifo, which would be a lot less efficient.

distributed commented 10 years ago

I agree that what you sketch is a sane and non-surprising protocol for flow control: raise an XON, the other side may send a number of bytes, but as as soon you raise an XOFF, the other side has to stop.

It is not so much an assumption of me as that in uart_pty.c I read the following

/*
 * Called when the uart has room in it's input buffer. This is called repeateadly
 * if necessary, while the xoff is called only when the uart fifo is FULL
 */
static void
uart_pty_xon_hook(

My point is that this routine is not necessarily called repeatedly.

The only place where this XON is raised is in avr_uart.c in avr_uart_rxc_read. This routine only gets called when firmware reads UCSRnA. A firmware that turns on RXCIEn and just waits for a byte to arrive and the RXCn IRQ to be asserted will fail. I have a test case showing this, if you are interested.

Coming back to the XON/XOFF protocol between the uart and the concrete I/O implementation[1], I understand that you feel having "periodic pumping" in the uart part is not the solution. In the case of uart_pty.c we have two threads at our hands. Thread A is executing the AVR simulation and thus also the code in avr_uart.c. Thread B is looping in uart_pty_thread(), doing a select() on the pty fd. As long as there is space in the out FIFO, B reads bytes from the pty and puts them into said FIFO. When firmware is reading UCSRnA, as you note probably because it's polling for data, Thread A raises an XOFF/XOFF sequence in avr_uart_rxc_read, running a bit contrary to your sketch of the XON/XOFF protocol. Because Thread A raises the XON irq, data is pumped out of the uart_pty.c out FIFO by uart_pty_xon_hook running in thread A. This design makes it possible to pump sequences of multiple bytes from B to A in one call.

When the firmware uses the RXCn interrupt exclusively and does not poll UCSRnA, this scheme falls down, as the bytes arriving from the host are never pumped from B to A. I feel that if firmware uses RXCn interrupts, bytes should somehow still get pumped from B to A. What do you think?

[1] What's your jargon for the two parts, the processor facing one in avr_uart.c and the host facing I/O implementation like uart_pty.c? :)

bsekisser commented 9 years ago

Just did some digging and responded to a comment on the simavr google group... wasn't sure if you were the same person or not so wanted to drop a line here too as the the issue here and what was posted there seems strongly related.

https://github.com/bsekisser/simavr/tree/bsekisser-uart-pty-fix

distributed commented 9 years ago

I suppose you are referencing this?

I am not the same person as the OP in the above link. Judging from the description, we are speaking about the same issue, though.

Concerning the fix you linked. You are calling avr_raise_irq in uart_pty_thread, not in the "main" AVR execution thread. I don't think this is safe. avr_raise_irq is not thread sae.

As threads cannot interrupt one another, I believe there needs to be a defined synchronization point between the PTY thread and the main thread. One idea, mentioned above in this issue, is polling the PTY FIFO periodically (for example, every p->usec_per_byte) to get the data flowing.

bsekisser commented 9 years ago

The suspicion of the implementation not being thread safe is the reason I did not post as a pull request. Unfortunately Busserror is right, but so too it is also evident that the uart_pty implementation is broken. More so, I am not sure there is a clean way to do what needs to be done here... so far I am struggling to find a trigger point to work off of... finding a clean way to use a cycle timer may be the best option, at least until somebody else comes up with a viable solution. I'll see if I can come up with something.

On Tue, 2014-11-18 at 00:41 -0800, Michael Meier wrote:

I suppose you are referencing this?

I am not the same person as the OP in the above link. Judging from the description, we are speaking about the same issue, though.

Concerning the fix you linked. You are calling avr_raise_irq in uart_pty_thread, not in the "main" AVR execution thread. I don't think this is safe. As threads cannot interrupt one another, I believe there needs to be a defined synchronization point between the PTY thread and the main thread. One idea, mentioned above in this issue, is polling the PTY FIFO periodically (for example, every p->usec_per_byte) to get the data flowing.

— Reply to this email directly or view it on GitHub.

distributed commented 9 years ago

Do you think the uart_pty implementation is broken?

I believe simavr strives for kinda-realtime simulation where software on the host can communicate with the simulated AVR firmware as if it were running on actual hardware. See the board_simduino that can be used with an off-the-shelf avrdude running over a PTY UART. I'm using simavr to develop an AVR monitor/gdb stub, working with an off-the-shelf gdb communicating via the PTY UART implementation. It's pretty neat we can do that, I think :)

Therefore, you will have a thread that is constantly busy executing the AVR simulation. If you want to have I/O with the host environment you have two options:

1.) Use just one thread. That means periodically doing a select with zero timeout on all I/O channels. 2.) Use multiple threads, one to execute the AVR simulation, the others to do the I/O.

Number 1.) has a myriad of issues. It means that you can only use I/O that supports select. It means that you have zero or near-zero timeout selects in your simulation loop. This is hardly highly performant. If you choose to call select less often, the latency (measured in AVR cycles) of all your of all your I/O will rise. You need to have all your I/O registered with the one central select calling routine, or pay the price of making multiple select calls after one another.

Number 2.) seems to be what simavr, or at least the simavr examples use. You get to do I/O in any way you want, because you can do it in separate threads. There are no selects in your main simulation loop. Of course there needs to be a criterion to get data from your I/O threads to your simulation thread.

In case of polled UART reception, this works out elegantly, as you can use every read of UCSRnA to poll if your I/O has data. Neat: you use the firmware's behavior as a hint for the simulation environment to poll. When the firmware gives you no hints, say because it's relying on interrupts being raised, you can't do that. The I/O thread can't interrupt the AVR simulation thread. The simulation threads needs to realize that something has happened (say, arrival of a byte) on its own. So I guess either you

a.) Set up periodic timers for devices like the UART to poll I/O threads. b.) Set up some sort of global queue (containing sim IRQs to be raised, for example) and pick items off the queue in the simulation loop.

Essentially, you're polling either way.

What I proposed in the original post of this issue is a.). I like it because you only pay the polling overhead when your code has RXCn interrupts turned on. When your firmware uses RXCn interrupts, you want them to work and I think it's fair to pay a price for this. When you don't use them, no additional overhead is generated.

I realize that this is an issue that crops up in a number of places, as such it might make sense to seek a uniform solution. I would be very glad to hear @buserror's opinion.

ve3wwg commented 9 years ago

I'm jumping in late to this party, but since I work with thread intensive processes at work and have a recent interest in simavr, I'll offer my own two cents worth..

When working with threads, you almost always end up having to coordinate between them at some point. This requires the use mutexes and condition variables (when necessary).

In the AVR simulation, you already have a main loop that you essentially sim one instruction at a time. If you need to coordinate with an I/O thread, you lock a mutex after returning from the instruction sim, check some state and unlock when done and repeat. Even if you arrange the avr sim to "poll" something, with the use of threads you'll need to manage state in an atomic way in some piece of the code (usually involving mutexes).

Threads seem a natural way to divide between one or more I/O device sims and the AVR instruction sim.

The problem then translates into how to package the thread specific code in a portable way (Win vs POSIX).

Warren

On Tue, Nov 18, 2014 at 9:32 AM, Michael Meier notifications@github.com wrote:

Do you think the uart_pty implementation is broken?

I believe simavr strives for kinda-realtime simulation where software on the host can communicate with the simulated AVR firmware as if it were running on actual hardware. See the board_simduino that can be used with an off-the-shelf avrdude running over a PTY UART. I'm using simavr to develop an AVR monitor/gdb stub, working with an off-the-shelf gdb communicating via the PTY UART implementation. It's pretty neat we can do that, I think :)

Therefore, you will have a thread that is constantly busy executing the AVR simulation. If you want to have I/O with the host environment you have two options:

1.) Use just one thread. That means periodically doing a select with zero timeout on all I/O channels. 2.) Use multiple threads, one to execute the AVR simulation, the others to do the I/O.

Number 1.) has a myriad of issues. It means that you can only use I/O that supports select. It means that you have zero or near-zero timeout selects in your simulation loop. This is hardly highly performant. If you choose to call select less often, the latency (measured in AVR cycles) of all your of all your I/O will rise. You need to have all your I/O registered with the one central select calling routine, or pay the price of making multiple select calls after one another.

Number 2.) seems to be what simavr, or at least the simavr examples use. You get to do I/O in any way you want, because you can do it in separate threads. There are no selects in your main simulation loop. Of course there needs to be a criterion to get data from your I/O threads to your simulation thread.

In case of polled UART reception, this works out elegantly, as you can use every read of UCSRnA to poll if your I/O has data. Neat: you use the firmware's behavior as a hint for the simulation environment to poll. When the firmware gives you no hints, say because it's relying on interrupts being raised, you can't do that. The I/O thread can't interrupt the AVR simulation thread. The simulation threads needs to realize that something has happened (say, arrival of a byte) on its own. So I guess either you

a.) Set up periodic timers for devices like the UART to poll I/O threads. b.) Set up some sort of global queue (containing sim IRQs to be raised, for example) and pick items off the queue in the simulation loop.

Essentially, you're polling either way.

What I proposed in the original post of this issue is a.). I like it because you only pay the polling overhead when your code has RXCn interrupts turned on. When your firmware uses RXCn interrupts, you want them to work and I think it's fair to pay a price for this. When you don't use them, no additional overhead is generated.

I realize that this is an issue that crops up in a number of places, as such it might make sense to seek a uniform solution. I would be very glad to hear @buserror https://github.com/buserror's opinion.

— Reply to this email directly or view it on GitHub https://github.com/buserror/simavr/issues/104#issuecomment-63478881.

buserror commented 9 years ago

Mutex are the bane of computer science, and has been for many years. Mutexes are a stupid way to do a task on multiple cores/threads. It was a reasonably good ideas when people had only one core, but the more core you add, the more the mutexes become THE bottleneck in a system.

The best way to handle the problem of calling raise_irq in a separate thread is... not to do it. You can post a request to a fifo (without a lock) and signal the main thread using just a variable; since the main thread wakes up very regularly to execute instruction, and it has to do is pick up requests from the FIFO and /there and then/ call raise_irq.

It's a simple 'runloop' system that will not block anyone from doing any work.

That way you have isolated the dependency between threads in ONE single non blocking piece of code, instead of adding mutexes everywhere in the 'hope' you caught all the places that needs to lock it.

distributed commented 9 years ago

@ve3wwg, thanks for your input. I believe that our descriptions basically break down to the same thing.

My variants a.) and b.) described above are two mechanically slightly different version of coordinating between two threads.

I have been speaking of "polling for I/O" as simavr typically uses lock free FIFOs to transfer data between threads in a thread safe manner. My terminology was a bit sloppy :)

Speaking about the UART implementation specifically, I like a.) because the overhead of polling the I/O thread's FIFO only applies when RXCn interrupts are active and that the checks happen relatively seldom, about every character time.

I suspect that there are no mutexes or condvars (which in pthread have a mutex of their own) in the main simulation loop because they are expected to slow execution speed down a lot.

ve3wwg commented 9 years ago

Michel, you are entitled to your opinion, but mutexes are usually necessary in threaded processes.

On Tue, Nov 18, 2014 at 10:45 AM, Michel Pollet notifications@github.com wrote:

Mutex are the bane of computer science, and has been for many years.

And who says that? Like anything else, performance programming requires responsibility.

Before threads (on Unix), we had semaphores, shared memory and message queues. These are heavy handed solutions compared to threads and mutexes. A huge effort went into putting threads into Unix (/Linux) for the very reason of performance. Mutexes were an integral part of the pthreads implementation.

Yes. mutexes can be abused, just like anything else. If I hand out keys to a Ferrari, someone can wrap it around a tree in minutes. A tool needs to be used responsibly.

Mutexes are a stupid way to do a task on multiple cores/threads. It was a reasonably good ideas when people had only one core,

You have this backwards. Mutexes are totally unnecessary with one core :)

Performance wise, the implementation of mutexes are excellent on Linux (implemented as a very low cost FUTEX). On Windows (my least favourite platform), they are used extensively due to the wide use of threads on that platform.

To toss out mutexes is to essentially discard threads or at the minimum delegate them to completely isolated tasks.

but the more core you add, the more the mutexes become THE bottleneck in a system.

Performance requires responsibility. To use a blanket statement that they are "stupid" is simply short sighted. The glibc library uses them in the stdio print routines for example. :)

The best way to handle the problem of calling raise_irq in a separate thread is... not to do it. You can post a request to a fifo (without a lock) and signal the main thread using just a variable; since the main thread wakes up very regularly to execute instruction, and it has to do is pick up requests from the FIFO and /there and then/ call raise_irq.

It's a simple 'runloop' system that will not block anyone from doing any work.

That sounds like a workable solution.

That way you have isolated the dependency between threads in ONE single non blocking piece of code, instead of adding mutexes everywhere in the 'hope' you caught all the places that needs to lock it.

To be clear, you would only need ONE mutex here. There is no need to imply the worst possible scenario that you can imagine.

Warren

bsekisser commented 9 years ago

As per what Buserror said on having to pump data to the uart still stands... no good way to really have simavr hold your hand for you there (at least that I could find)...

Best I can tell, simavr is doing what it should when data is pumped into the uart (more on that below)... RXC interrupt is called when the uart gets data, same with the UDRE interrupt.

As per uart_pty... not sure as I do not completely know how it was originally intended to operate (nor have I figured out how to make it work without manually pumping the data as stated below)... to modify my first attempt at making it work for me, I added uart_pty_pump to be called in the application avr run loop (so it only affects those using it) and got it working with exception that it double echoed... taking out the tap.out to tap.in fixed that.

part of me thinks that devising a way to pump data with irqs might be usefull but would require a bit of work to make it happen... again, not sure if there would be a clean way of doing it without impacting the emulation speed... though for those that could use it may in fact see a speed benefit (at least for uart_pty, as having to constantly call a function to pump the data has its own impact)

buserror commented 9 years ago

actually, the interrupt 'raising' is what seems to matter, as if the handler is called, the register is polled, and the data pump gets going..

The current interrupt delivery already use a small FIFO (that probably need to be converted to a fifo_declare.h one in fact). It is used by simavr itself internally, so you /can't/ just use it, however, you could add a /second/ fifo in avr_int_table_t dedicated to raising interrupts /externally/ -- it would have to be limited to a single external thread of course, but the only overhead for the avr core would be to poll the write flag for both of the fifos...

That way, when you decide to send a byte to simavr, you avr_raise_interrupt_external(*avr, uint8_t int_n); and you get a thread safe way of 'forcing' the core to call your IRQ handler once the avr firmware reads the register...

distributed commented 9 years ago

@buserror Using the new second FIFO would require one thread consolidating the I/O from all the I/O threads. Is that correct?

Concerning the original issue: Would you like me to provide a workaround for the time being?

I do not feel comfortable making the abovementioned IRQ FIFO changes to your code base :)

bsekisser commented 9 years ago

Let me get some clarification of the issue if I may...

uart_pty? are you using/dependant on this?.. not sure if that was the case or just used as a reference.

Does the physical avr repeatedly call the rxc interrupt? If I put data in the udr register udre/rxc interrupt gets called and I discovered the udr interrupt is called repeatedly in simavr and best I can tell rxc is only supposed to trigger once... so the simulated behavior seems to be correct. I modified the uart_pty branch to reflect with that in mind.

On Thu, 2014-11-20 at 06:07 -0800, Michael Meier wrote:

@buserror Using the new second FIFO would require one thread consolidating the I/O from all the I/O threads. Is that correct?

Concerning the original issue: Would you like me to provide a workaround for the time being?

I do not feel comfortable making the abovementioned IRQ FIFO changes to your code base :)

— Reply to this email directly or view it on GitHub.

distributed commented 9 years ago

Yes, I am dependent on uart_pty, as I mentioned above

I'm using simavr to develop an AVR monitor/gdb stub, working with an off-the-shelf gdb communicating via the PTY UART implementation.

I don't understand your question concerning the RXC interrupt.

The physical AVR behaves as per the data sheet.

Concerning the RXCn bit it says:

This flag bit is set when there are unread data in the receive buffer and cleared when the receive buffer is empty (i.e., does not contain any unread data). The RXCn Flag can be used to generate a Receive Complete interrupt (see description of the RXCIEn bit).

Concerning the RXCIEn bit:

Writing this bit to one enables interrupt on the RXCn Flag. A USART Receive Complete interrupt will be generated only if the RXCIEn bit is written to one, the Global Interrupt Flag in SREG is written to one and the RXCn bit in UCSRnA is set.

This is not how simavr implements the RXC interrupt. I have been bitten by this when developing my gdb stub and I was planning on opening another issue after this one gets resolved.

bsekisser commented 9 years ago

Okay, if you have not, try the latest update to the uart_pty branch... It seems that the UDRE interrupt triggers a call to one of the handlers repeatedly and I moved the pump function to the top and called that... not sure if something similar can be done for RXC and still trying to understand how that mechanism works.

I put code in the UDRE and RXC to set a port line on entry and then clear it before leaving... I see UDRE toggling the line repeatedly and the RXC gets toggled once... hence my questions and being puzzled.

As per datasheets... Physical behavior should match as descripted but I accept that in some instances subtle differences can and do occur.

I looked to see if you had a branch with the changes you were talking about to see what you had come up with and wondering if something could be done to make a solution acceptable to all... One of which was to use the flag mechanism to enable disable some sort of timer like you stated.

Sorry for not getting back to you sooner.

On Sat, 2014-11-22 at 02:16 -0800, Michael Meier wrote:

Yes, I am dependent on uart_pty, as I mentioned above

    I'm using simavr to develop an AVR monitor/gdb stub, working
    with an off-the-shelf gdb communicating via the PTY UART
    implementation. 

I don't understand your question concerning the RXC interrupt.

The physical AVR behaves as per the data sheet.

Concerning the RXCn bit it says:

    This flag bit is set when there are unread data in the receive
    buffer and cleared when the receive buffer is empty (i.e.,
    does not contain any unread data). 
    The RXCn Flag can be used to generate a Receive Complete
    interrupt (see description of the RXCIEn bit).

Concerning the RXCIEn bit:

    Writing this bit to one enables interrupt on the RXCn Flag. A
    USART Receive Complete interrupt will be 
    generated only if the RXCIEn bit is written to one, the Global
    Interrupt Flag in SREG is written to one and the RXCn bit in
    UCSRnA is set.

This is not how simavr implements the RXC interrupt. I have been bitten by this when developing my gdb stub and I was planning on opening another issue after this one gets resolved.

— Reply to this email directly or view it on GitHub.

distributed commented 9 years ago

I have not uploaded my changes as they are intertwined with other things I was working on. If I manage to get it into a readable state then I will put it online and tell you about it.

distributed commented 9 years ago

The changes are in branch rxc-monitor.

bsekisser commented 9 years ago

I have pulled, looked at and tested the changes from your branch... made some comments on the code.

On what you have done... looks to be in the right direction (tried something similar that didn't work for me)... One thought is if the functionality could be incorporated into the rxc_raise timer function... another is to add a specific irq pump function... I started to do that but stopped but may look into it again.

Are you getting a double mirror in what you type? as in 'aassddff'? I am thinking that should not be happening and for me took out the line that seemed to be causing it.

On Mon, 2014-12-01 at 09:05 -0800, Michael Meier wrote:

The changes are in branch rxc-monitor.

— Reply to this email directly or view it on GitHub.

bsekisser commented 9 years ago

@buserror Been a while... tried to get your attention... not sure if you are mia, busy or unaware that I tried to ping you... the branch @distributed referenced should be looked over details were brought to light that were not readily obvious.

https://github.com/distributed/simavr/commit/760f115bac2dffcedde3b6a57ad371a0f4b4b12a

distributed commented 9 years ago

Thank you for your comments. I'll try to rework in the following weeks.

buserror commented 9 years ago

Sorry my attention has been taken over about 20 other ways recently, I'm monitoring but I haven't followed the thread in it's entirely. Has a consensus been reached?

bsekisser commented 9 years ago

It seems this is indeed correct... The register descriptions do not adequately bring the issue to light but iirc the blurbs about rxc interrupts cover the issue... it is indeed similar to twi in this instance... ie the software must clear the interrupt and the interrupt must retrigger repeatedly. Finding that, it was time you were brought into the light to know what was going on.

On Mon, 2015-01-19 at 10:42 -0800, Michel Pollet wrote:

Sorry my attention has been taken over about 20 other ways recently, I'm monitoring but I haven't followed the thread in it's entirely. Has a consensus been reached?

— Reply to this email directly or view it on GitHub.

distributed commented 9 years ago

I have reworked the code as per your suggestions.

It is in commit 46b2d8ffbef09d11842e5b7f03b56f526bdb31c1 in my branch rxc-monitor.

Still, the changes only apply to ATMegaX8 devices.

@buserror would you merge this into your master when I modify the changes to hold for all devices?

hovercraft-github commented 7 years ago

Hi, I recently prepared PR #194 with alternative solution. Ideas discussed here was very helpful, especially this part, thanks to @distributed and @bsekisser . However, because of massive changes, I decided not to base upon your code and start directly from the @buserror master. My motivation is that the problem with UART persists, and I realized XON/XOFF pumping is not a main challenge here. It is rather providing proper RXC/TXC/UDRE flags and interrupt handling with realistic time pattern for the user application. If we achieve this, the XON/XOFF problem will be solved in a natural way. For that, my version contains separate RX and TX pumps implemented via timer loops, which sole purpose is to handle interrupt/flag raising. I hope this will allow to close major issues, like e.g. described there. This nice CNC firmware project from @skwee can be used as very strong test-case, one should send entire G-code streams (look in the sim/gcode folder of project) over UART using UniversalGcodeSender or similar software. This test pass now. I would kindly ask @buserror and all the community to inspect my solution and give your expert conclusion. Any criticism is very appreciated. Al