ARM-software / CMSIS_5

CMSIS Version 5 Development Repository
http://arm-software.github.io/CMSIS_5/index.html
Apache License 2.0

Message queue documentation lacks information for static memory allocation #1063


IngmarPaetzold commented 3 years ago

Hi there,

I tried to create a message queue with osMessageQueueNew(), providing the data storage as static memory. However, it returns NULL unless I add 12 x msg_count to the .mq_size field.

I followed this documentation: https://www.keil.com/pack/doc/CMSIS/RTOS2/html/group__CMSIS__RTOS__Message.html#structosMessageQueueAttr__t (CMSIS-RTOS2, Version 2.1.3)

Here, it only states:

"The minimum memory block size is msg_count * msg_size (parameters of the osMessageQueueNew function). The msg_size is rounded up to a double even number to ensure 32-bit alignment of the memory blocks."

This is not correct; more memory than that is needed.

Later, I found the section about static object memory, https://www.keil.com/pack/doc/CMSIS/RTOS2/html/theory_of_operation.html#StaticObjectMemory , where the macro osRtxMessageQueueMemSize() is mentioned.

That macro does the same as my workaround of adding 12 x count. However, I strongly recommend giving a clear hint to this macro / the additional memory needed on the message queue reference page itself, since that is the first reference users consult.
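For anyone hitting the same wall, here is a minimal sketch of static allocation sized with that macro. It assumes RTX5 (the types and macros come from rtx_os.h); the queue name, count, and element size are just placeholders:

```c
#include "cmsis_os2.h"
#include "rtx_os.h"   // RTX5-specific: osRtxMessageQueue_t, osRtxMessageQueueMemSize()

#define MSG_COUNT  16U
#define MSG_SIZE   sizeof(uint32_t)

// Data storage: the macro accounts for the per-message header (12 bytes in
// RTX 5.5.1) on top of the 4-byte aligned payload, so osMessageQueueNew()
// does not fail with NULL. Using uint32_t keeps the buffer 32-bit aligned.
static uint32_t            mq_mem[osRtxMessageQueueMemSize(MSG_COUNT, MSG_SIZE) / sizeof(uint32_t)];
static osRtxMessageQueue_t mq_cb;   // control block, also allocated statically

static const osMessageQueueAttr_t mq_attr = {
  .name    = "myQueue",
  .cb_mem  = &mq_cb,
  .cb_size = sizeof(mq_cb),
  .mq_mem  = mq_mem,
  .mq_size = sizeof(mq_mem)
};

static osMessageQueueId_t mq_id;

void queue_init(void) {
  mq_id = osMessageQueueNew(MSG_COUNT, MSG_SIZE, &mq_attr);
  // still returns NULL if cb_size or mq_size are too small for this RTX5 build
}
```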

But a question arises: tracking down why this failed leads to line 253 in the implementation of osMessageQueueNew() in file rtx_msgqueue.c (CMSIS-RTOS2 API 2.1.3, Keil RTX 5.5.1):

block_size = ((msg_size + 3U) & ~3UL) + sizeof(os_message_t);

This sizeof yields 12, and the block size is later multiplied by the message count.

So why is this extra memory needed for every message? At first glance, I could not find where it is used. And it means that for a message queue of 100 characters, the memory requirement, including 32-bit alignment and the extra 12 bytes per element, blows up to 1,600 bytes!
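For concreteness, here is that arithmetic spelled out for the 100-character case (assuming sizeof(os_message_t) == 12 on a 32-bit target, as in RTX 5.5.1):

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
  const uint32_t msg_size  = 1U;    // one char per message
  const uint32_t msg_count = 100U;
  const uint32_t header    = 12U;   // sizeof(os_message_t) on a 32-bit target

  // Same computation as rtx_msgqueue.c: round the payload up to 4 bytes,
  // then add the per-message header.
  uint32_t block_size = ((msg_size + 3U) & ~3UL) + header;

  printf("block_size = %u, total = %u bytes\n",
         (unsigned)block_size, (unsigned)(block_size * msg_count));
  // prints: block_size = 16, total = 1600 bytes
  return 0;
}
```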

regards, Ingmar

JonatanAntoni commented 3 years ago

Hi @IngmarPaetzold,

The overhead is caused by the message queue handling in RTX5. Each message needs to have 12 bytes of control data, such as priority and linked-list pointers.
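For reference, that per-message control block (os_message_t in the RTX5 sources, rtx_lib.h) looks roughly like this; field names are taken from RTX 5.5.x and may differ slightly between versions:

```c
// Per-message header that RTX5 prepends to every queued message:
// four single-byte fields plus two list pointers = 12 bytes on a 32-bit target.
typedef struct os_message_s {
  uint8_t              id;        // object identifier
  uint8_t              state;     // object state
  uint8_t              flags;     // object flags
  uint8_t              priority;  // message priority
  struct os_message_s *prev;      // previous message in the queue's linked list
  struct os_message_s *next;      // next message in the queue's linked list
} os_message_t;
```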

I guess the irritation comes from the documentation being implementation agnostic. The CMSIS-RTOS2 API does not mandate a specific implementation. Hence the documentation can only state the lower bound for memory requirements. The actual memory requirements are implementation specific. The 12 bytes message header overhead you are seeing is specific to RTX5.

Cheers, Jonatan

IngmarPaetzold commented 3 years ago

Thanks Jonatan for this very quick explanation.

Yes, when implemented as a linked list, it makes sense to have this additional data. I had a ring-buffer implementation in mind, where elements lie side by side, but then, obviously, priority would be an issue.

Concerning the minimum size and the RTX-specific stuff: that seems reasonable, although as a relative RTOS newbie, it cost me a few hours to track down the reason why it failed. OK, there is a small note that one should look at the static memory section, but even there it is not clear that more memory is needed than count x element size, and the example below shows only static thread memory assignment, which does not use one of the memory calculation macros but a magic number instead. I just want to point out that using the macros osRtxMemoryPoolMemSize() and osRtxMessageQueueMemSize() is actually kind of mandatory when one wants to do it right, and that is not really apparent.

Whatever, now I know what's going on, and the code works fine!

have a nice day!

cheers, Ingmar

JonatanAntoni commented 3 years ago

Hi @IngmarPaetzold,

Yes, I already presumed you might want to use a circular buffer as a character queue. I admit the message queue (as implemented in RTX5) is not a good match here.

Have you considered using your own buffer combined with event flags or a counting semaphore for thread synchronisation?

May I ask you to collaborate on the documentation? If the current doc does not contain the information you were looking for, would you be able to propose a change, e.g. by raising a pull request?

Cheers, Jonatan

IngmarPaetzold commented 3 years ago

Hi Jonatan, yes, sure, a direct contribution is even better. Not today, but yes, at least a proposal that would have saved me some time.

For one of my intended queue applications, I actually have my own buffer (non-thread-safe) and was about to port it so that it only uses the OS for communication. A queue was the first choice because that is exactly the semantic I need: writing on one side (non-blocking) and waiting on the other (blocking, i.e. without wasting processor time while waiting). I don't have much experience with thread flags yet, but some with mutexes. Now, with the new insight (that a queue is not the best way to go), I will probably switch to shared memory (from a memory pool, right?) and synchronization.

Cheers, Ingmar

JonatanAntoni commented 3 years ago

Hi @IngmarPaetzold,

Yes, I'd go for shared memory and lightweight sync as well.

Please be aware that a memory pool has some overhead as well, e.g. 4-byte alignment imposing a minimum block size.

So for a plain character stream you might be much more efficient using a pre-allocated circular buffer of the required size. As long as you have strict FIFO semantics with only one sender and one receiver, you do not need advanced thread-sync techniques. Using a counting semaphore (with the counter reflecting the number of chars waiting in the buffer) should be a performant and resource-saving solution: the sender calls release and the receiver blocks on acquire.
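A minimal sketch of that pattern with the CMSIS-RTOS2 API (buffer size and function names are placeholders; a single sender and a single receiver are assumed):

```c
#include "cmsis_os2.h"

#define BUF_SIZE  128U                  // placeholder capacity

static volatile uint8_t  buf[BUF_SIZE];
static volatile uint32_t wr_idx = 0U;   // written only by the sender
static volatile uint32_t rd_idx = 0U;   // written only by the receiver
static osSemaphoreId_t   sem_chars;     // counts characters waiting in the buffer

void stream_init(void) {
  // max count = BUF_SIZE, initial count = 0 (buffer starts empty)
  sem_chars = osSemaphoreNew(BUF_SIZE, 0U, NULL);
}

// Sender side: non-blocking put, drops the character if the buffer is full.
void stream_put(uint8_t c) {
  uint32_t next = (wr_idx + 1U) % BUF_SIZE;
  if (next != rd_idx) {                 // one slot stays free to detect "full"
    buf[wr_idx] = c;
    wr_idx = next;
    osSemaphoreRelease(sem_chars);      // signal: one more character available
  }
}

// Receiver side: blocks without busy-waiting until a character is available.
uint8_t stream_get(void) {
  osSemaphoreAcquire(sem_chars, osWaitForever);
  uint8_t c = buf[rd_idx];
  rd_idx = (rd_idx + 1U) % BUF_SIZE;
  return c;
}
```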

Cheers, Jonatan