Closed (artokin closed this 6 years ago)
@kjbracey-arm , @SeppoTakalo , @terhei would you please review?
Can we get around the limitation of setting the heap info pointer?
Two thoughts
Percentage seems right for the default, but is it the best runtime API for tweaking? Should callers be able to set an absolute number? I guess mixing the two is more confusing. Maybe provide both percentage-setting and absolute APIs? Or is that overkill?
Not a fan of expressing the number as "95%" rather than "5%". The former is easily misread or mistranslated as "allow temporary allocations to use 95% of the heap", which isn't what it means.
If you look at it the other way, "temporary allocations must leave 5% free" seems clearer.
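To make the two framings concrete, here is a minimal sketch of how a "must leave N% free" setting and the "95% already allocated" limit describe the same threshold, and how either could coexist with an absolute byte count. The helper names are hypothetical and are not part of Nanostack or nsdynmemLIB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Nanostack API): convert a "keep this
 * percentage of the heap free" setting into a byte count. */
static size_t min_free_from_percent(size_t heap_size, uint8_t free_percent)
{
    assert(free_percent <= 100);
    /* Divide first to avoid overflow on large heaps; the result is
     * rounded down by less than 100 bytes. */
    return (heap_size / 100) * free_percent;
}

/* Hypothetical helper: the same threshold expressed as "maximum bytes
 * that may already be allocated", i.e. the "95%" framing. An absolute
 * byte setting can be passed straight in as min_free. */
static size_t max_used_from_min_free(size_t heap_size, size_t min_free)
{
    return (min_free > heap_size) ? 0 : heap_size - min_free;
}
```

With a 10000-byte heap, "leave 5% free" gives a 500-byte reserve, which is the same threshold as "fail once 9500 bytes are allocated". Storing one internal byte value would let a percentage setter and an absolute setter share the same check.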
What kind of tests have been performed on the code now in place, across varying input parameters and system configurations?
FYI: @karsev @MarceloSalazar
Unit tests are used to verify functionality. The feature was manually tested on hardware by adjusting the heap size and heap limit so that temporary memory allocations failed (the failures can be seen in the border router traces). Most of the temporary allocations were created by incoming data packets in the Ethernet/RF driver. The system operated normally after a temporary allocation failure.
Thank you @artokin !
Question: are ALL temporary memory allocations "resistant" to failures?
Yes, they are. The stack is able to recover from low-memory situations and then continue normal operation.
Thank you again!
From your experience, what should be the behavior of the Thread network (composed of Border Router and End-Nodes) when the BR (or another component in the network) drops packets or has other failures related to the temporary allocations?
We are seeing strange behavior when our BR fails memory allocations (we still need to check whether it is the temporary or the permanent allocations that fail), and are thus looking for full assurance that failures of temporary allocations are perfectly acceptable to the Thread stack and the Thread network.
Coming from past discussions with @kjbracey-arm and @SeppoTakalo it looks crucial to me to understand whether it is acceptable / enough:
Thank you.
FYI: @karsev @MarceloSalazar
Internal reference: IOTTHD-2779
@daniel-cesarini-tridonic-com based on conversations with our team:
From your experience, what should be the behavior of the Thread network (composed of Border Router and End-Nodes) when the BR (or another component in the network) drops packets or has other failures related to the temporary allocations?
The whole design of the Internet is based on the fact that any router is allowed to drop traffic when it cannot route it.
Nanostack and mesh networks are no exception to that rule, but it is more visible in lossy networks like mesh. The design goal of Nanostack has been that it routes all traffic that it can allocate memory for. So an out-of-memory situation should be a normal and tolerated condition.
We have done fuzz testing against Nanostack and found that the event loop might run out of memory; we have fixes implemented already. We can't guarantee these are the last issues, but we continue our testing efforts against Nanostack and fix any problems we identify.
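As an illustration of the "route whatever memory can be allocated for" design goal, here is a minimal sketch of a receive path that treats a failed temporary allocation as a dropped frame rather than an error. All names are hypothetical; this is not Nanostack code, only the general pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator hooks, so a failing allocator can be injected. */
typedef struct {
    void *(*alloc)(size_t size);
    void (*release)(void *ptr);
} temp_heap_ops_t;

/* Hypothetical counters for illustration. */
typedef struct {
    unsigned forwarded;
    unsigned dropped;
} rx_stats_t;

/* Copy an incoming frame into a temporary buffer and "forward" it.
 * If the temporary allocation fails, the frame is dropped and counted,
 * and nothing else is modified, so the next frame is handled normally. */
static bool handle_rx_frame(const unsigned char *frame, size_t len,
                            const temp_heap_ops_t *heap, rx_stats_t *stats)
{
    assert(frame != NULL && heap != NULL && stats != NULL);
    unsigned char *buf = heap->alloc(len);
    if (buf == NULL) {
        stats->dropped++;   /* out of memory: drop, do not abort */
        return false;
    }
    memcpy(buf, frame, len);
    stats->forwarded++;     /* a real stack would queue buf for routing */
    heap->release(buf);
    return true;
}

/* Allocator stub that always fails, to simulate an exhausted heap. */
static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```

Calling handle_rx_frame with an ops table that uses failing_alloc increments only the dropped counter; a later call with a working allocator succeeds as usual, which is the recovery behavior described for the manual hardware test.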
@MarceloSalazar I agree with you; the Internet and, in general, "best-effort" and "lossy" networks share well-known working paradigms.
Nonetheless, as you already stated, the individual components and the network as a whole should be able to survive any traffic injected, especially legal traffic.
Regarding fuzz testing and other sorts of testing, of course, the higher the stability, the better.
ping @MarceloSalazar
Add a new API to set an allocated-heap limit that temporary allocations can't exceed. When the limit is reached, temporary memory allocations will fail. By default, temporary allocations will fail if 95% of the heap is already allocated.
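A minimal sketch of the behavior described above, using toy heap bookkeeping on top of malloc. The structure and function names are hypothetical and do not reflect the actual nsdynmemLIB API or internals; the exact comparison used at the limit (>= here) is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy bookkeeping (hypothetical, not Nanostack's real structures). */
typedef struct {
    size_t heap_size;   /* total bytes managed */
    size_t allocated;   /* bytes currently in use */
    size_t temp_limit;  /* temporary allocs fail once allocated >= this */
} toy_heap_t;

static void toy_heap_init(toy_heap_t *h, size_t heap_size)
{
    h->heap_size = heap_size;
    h->allocated = 0;
    h->temp_limit = (heap_size / 100) * 95;  /* default: 95% allocated */
}

/* Set the limit as a percentage of the heap; returns -1 on bad input. */
static int toy_heap_set_temp_limit(toy_heap_t *h, uint8_t used_percent)
{
    if (used_percent > 100) {
        return -1;
    }
    h->temp_limit = (h->heap_size / 100) * used_percent;
    return 0;
}

/* Long-lived allocation: only bounded by the heap size itself. */
static void *toy_alloc(toy_heap_t *h, size_t size)
{
    if (size > h->heap_size - h->allocated) {
        return NULL;
    }
    void *p = malloc(size);
    if (p != NULL) {
        h->allocated += size;
    }
    return p;
}

/* Temporary allocation: refused once the heap is already at the limit. */
static void *toy_temp_alloc(toy_heap_t *h, size_t size)
{
    if (h->allocated >= h->temp_limit) {
        return NULL;
    }
    return toy_alloc(h, size);
}

static void toy_free(toy_heap_t *h, void *p, size_t size)
{
    assert(size <= h->allocated);
    free(p);
    h->allocated -= size;
}
```

In this sketch a 1000-byte heap gets a default limit of 950 bytes: once 950 or more bytes are in use, toy_temp_alloc returns NULL while toy_alloc can still use the remaining space, and temporary allocations succeed again after enough memory is freed.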