Edzelf / ESP32-Radio

Internet radio based on ESP32, VS1053 and a TFT screen.
GNU General Public License v3.0
990 stars 228 forks source link

ringbuffer on esp32 #109

Open perlix opened 6 years ago

perlix commented 6 years ago

Hallo Ed, [hope you don't consider this out of topic]

My Wemos D1 runs an air traffic control radio that is heavily based on your Esp-radio. For the most part, I just left out what I didn't need and added some extras for my specific use. The main technical difference is that I use Adafruit's VS1053 library, because it has functions for the chip's GPIO pins and for applying patches.

Now, when I run the sketch on esp32, filling the ringbuffer suddenly can't keep up with the VS1053 playing 128 Kbps streams. Feeding the VS1053 directly from the WiFi client works fine, so WiFi is fast enough.

From the ESP-32 Radio changelog, I read that you used a ringbuffer, before switching to a FreeRTOS queue. Did you encounter similar ringbuffer problems, or can you think of any other reason why a sketch based on Esp-radio would run slower on the much faster esp32?

Edzelf commented 6 years ago

You have to sure what is going on. If the ringbuffer is always filled, the CPU is fast enough. If the ringbuffer becomes empty, the CPU is also fast enough, but the sender (or the network) is too slow. In my opinion, the CPU is always fast enough. I do not know about the Adafruit library, but I know that my version of the VS1063 software has a problem if the sender is (only a little bit) slower than the 128 kbps it expects. What you want to to is adapt the bitrate of the VS1063 to the real bitrate of the sender, but despite my efforts I did not find a solution for that.

perlix commented 6 years ago

Hi Ed, Thanks for the quick answer. Guess I'll better try your FreeRTOS alternative then. Still breaking my head over why it runs without problems on esp8266...

Edzelf commented 6 years ago

One difference is that the ESP32 is much slower when you send or receive small buffers to/from WiFi. Maybe that may cause a problem in your set-up. Reading one byte from Wifi takes just as much time as reading 1000 bytes.

koskee commented 6 years ago

Ed, I have a suggestion if you'd be like to hear it...

So.. What if you scrapped the queue based ringbuffer and went back to just declaring a large chunk of memory? You could still use your queue for control messages, perhaps, but just not for the audio data.

I propose you create two tasks. One for filling the ringbuffer, and another for pushing the data back out of it.

The task for emptying the buffer would probably be better if it was synchronized with an interrupt tied to the DREQ PIN via a (binary?) semaphore.

The interrupt gives the semaphore each time the DREQ PIN goes (either high or low, whichever is appropriate), and the filling task simply tries to take the semaphore, causing it to become blocked if the interrupt hasn't given it yet. Give this task a medium-high priority so that it runs as much as possible, and have it block indefinitely while it waits for the ISR to give it. Also be sure to observe the appropriate FreeRTOS API calls when doing something in an ISR (ie. add "FromISR" to the end of the function name).

Also be aware that your interrupt will have to be SUPER short, "bare bones" if you will. Your goal will be to give the semaphore and get out in as little time as possible, for this to function properly. (Also, this might require reworking some of your other ISRs, as they could be a little bit too lengthy in some cases, one would have to do some testing to see)

Then for the filling task, you can simply have it in a mid-low level priority (but above idle and anything that is less essential) so that it runs often, and have it dump data into the ringbuffer as fast as the network will allow, unrestrained. By checking whether it is full before adding to it, you shouldn't have overwrite issues.

Now you may be thinking that this brings up the issue of concurrent access of the ringbuffer by both cores at the same time. While this is true, if you "pin" both filling and emptying tasks to a single core (these could maybe even have their own core), then that will give you mutual exclusion without the overhead of trying to check all the time.

You could also look into performing some sort of priority modification based on how full the buffer is at any point, in order to give each of these tasks more run time if the buffer is too full/empty. Not sure if this would be necessary.

With control messages still coming through your queue, I feel like this might be a better approach to the project, especially if what you say about sending data on a queue is correct. (I hadn't heard this info before, where did you find it?) I think I am inclined to believe it however, since it really could explain a lot of my personal observations..

Hope I explained that adequately, if not feel free to ask me questions or check in the freertos tutorial pdf, as it is pretty well written.

Anyway, this suggestion is just because you seem to have a little more time at your disposal (or maybe I'm just slower writing code? lol) to implement these things, otherwise I'd have attempted it myself. I have another idea regarding the rotary encoder, but since I was already working on it and intended to do a PR when I got it finished, I'll just keep that one to myself for now, unless you really want to hear it..

Have a good one Koskee

Edzelf commented 6 years ago

Koskee, I do not understand what the problem is that you want to resolve. The program is already splitted into a part that fills the queue an a part that is emptying the queue.

koskee commented 6 years ago

I suppose I'm confused. I don't quite understand why there is a datatype attached the buffer, especially if the queue is taking so much extra processor time, as you said above. If we had a separate basic memory buffer (malloc) for audio data, and used a queue exclusively for radio control commands, it would save us attaching a datatype to the audio, as well as allowing us to process commands as soon as they come in (from my understanding of the code, it seems that the control commands are being sent on the data queue? Is this true? rather than throwing them into the ring buffer and waiting for them to come out the other side we might separate them - allowing us to prioritize them).

And in regard to the tasks, I realize you have 2 tasks already, but what I am trying to say is keep chopping it up. The RTOS is an operating system that is well adapted to making decisions on what needs to run first. Think of the concept of a task in much the same fashion as functions (with a few minor distinctions of course). Not everything can be a task, but there's a lot of opportunity to take advantage of the RTOS API that I can see.

So for example, I would divide the "playtask" into a task to fill the buffer & a task to send the buffer then I would divide the main task into a task for managing the executive radio commands, a task for the rotary encoder, a task for the screen output, one for ir, etc... Then you just need to assign an appropriate priority level to each of them in order to take advantage of the rtos scheduler. This also allows you to use the interrupts for the sole purpose of triggering the appropriate task (synchronizing them) so that your ISRs finish immediately and pass control back to the scheduler so that it can continue to allow the most important tasks to run, and perhaps more importantly, to service other interrupts that may come up. In theory this should make the system more responsive to inputs, have a lot more spare CPU cycles overall, and this would likely be with about the same or maybe even less memory usage.

Hope that was a little more clear :)

Edzelf commented 6 years ago

I like the idea of having a separate queue for commands. I will look into that. I never said that a queue is taking much CPU time. I said that reading the WiFi datastream byte for byte is very time consuming.

koskee commented 6 years ago

Ah. That makes more sense.. Have you looked into DMA? Direct memory access? Allows you to bypass the CPU entirely and fills the memory as it comes in without taking any CPU cycles.

Not sure if it's easy to implement, but if I recall it might be only available as an idf component..

Edzelf commented 6 years ago

As I said, it's only a problem when you read the WiFi input stream byte by byte. That's not the case here.

koskee commented 6 years ago

I guess I read the original post wrong, I didn't realize he was talking about a different code base, so I just chimed in with various inefficiencies that I had noticed but didn't have reason to complain about before, in an attempt to give you ideas on where the project could be improved. Oops. Problem solved ☺️