PelionIoT / nanostack-libservice


Add API to set temporary allocation heap limit #70

Closed. artokin closed this 6 years ago.

artokin commented 6 years ago

Add a new API to set a heap allocation limit that temporary allocations cannot exceed. When the limit is reached, temporary memory allocations fail. By default, a temporary allocation fails if 95% of the heap is already allocated.
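The rule described above can be illustrated with a minimal sketch. Every name in it (`heap_state_t`, `heap_set_temp_limit_percent`, `temp_alloc_allowed`) is hypothetical and is not libservice's actual API. It also assumes that the size of the pending request counts toward the limit, so a request is refused if it would push the allocated total past 95% of the heap.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical heap bookkeeping for illustration; not libservice's internal struct. */
typedef struct {
    size_t heap_size;   /* total bytes managed by the allocator */
    size_t allocated;   /* bytes currently handed out */
    size_t temp_limit;  /* temporary allocations may not push 'allocated' past this */
} heap_state_t;

/* Default described in the PR: temporary allocations fail once 95% of the heap is used. */
#define TEMP_ALLOC_DEFAULT_LIMIT_PERCENT 95u

static void heap_init(heap_state_t *h, size_t heap_size)
{
    h->heap_size = heap_size;
    h->allocated = 0;
    h->temp_limit = heap_size * TEMP_ALLOC_DEFAULT_LIMIT_PERCENT / 100u;
}

/* Hypothetical runtime setter: limit expressed as a percentage of the whole heap. */
static int heap_set_temp_limit_percent(heap_state_t *h, unsigned percent)
{
    if (percent > 100u) {
        return -1; /* reject nonsensical values, keep the old limit */
    }
    h->temp_limit = h->heap_size * percent / 100u;
    return 0;
}

/* A temporary request is allowed only if it keeps 'allocated' within the limit.
 * Written as a subtraction to avoid unsigned overflow in 'allocated + size'. */
static bool temp_alloc_allowed(const heap_state_t *h, size_t size)
{
    return size <= h->temp_limit && h->allocated <= h->temp_limit - size;
}
```

For example, with a 1000-byte heap the default limit is 950 bytes, so with 900 bytes already allocated a 50-byte temporary request is allowed and a 51-byte one is refused. Ordinary (non-temporary) allocations would bypass this check and can still use the remaining space.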

artokin commented 6 years ago

@kjbracey-arm , @SeppoTakalo , @terhei would you please review?

SeppoTakalo commented 6 years ago

Can we get around the limitation of setting the heap info pointer?

kjbracey commented 6 years ago

Two thoughts:

Percentage seems right for the default, but is it the best runtime API for tweaking? Should users be able to set an absolute number instead? I guess mixing the two is more confusing. Maybe provide both percentage-setting and absolute APIs? Or is that overkill?

Not a fan of expressing the number as "95%" rather than "5%". The former is easily misread or mistranslated as "allow temporary allocations to use 95% of the heap", which isn't what it means.

If you look at it the other way, "temporary allocations must leave 5% free" seems clearer.
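One way to read this suggestion is to store the threshold as the number of bytes that temporary allocations must leave free, and to offer both a percentage setter and an absolute setter on top of that single value. The sketch below is hypothetical throughout (`reserve_heap_t`, `set_free_reserve_percent`, `set_free_reserve_bytes`, `temp_alloc_allowed` are illustrative names, not the API merged in this PR) and assumes `allocated` never exceeds `heap_size`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allocator state: the threshold is stored as the number of bytes
 * that temporary allocations must leave free, not as a "95% used" ceiling. */
typedef struct {
    size_t heap_size;     /* total heap bytes */
    size_t allocated;     /* bytes currently in use (assumed <= heap_size) */
    size_t free_reserve;  /* bytes that must stay free after a temporary allocation */
} reserve_heap_t;

/* Percentage form: "temporary allocations must leave 'percent'% of the heap free". */
static int set_free_reserve_percent(reserve_heap_t *h, unsigned percent)
{
    if (percent > 100u) {
        return -1;
    }
    h->free_reserve = h->heap_size * percent / 100u;
    return 0;
}

/* Absolute form: "temporary allocations must leave 'bytes' bytes free". */
static int set_free_reserve_bytes(reserve_heap_t *h, size_t bytes)
{
    if (bytes > h->heap_size) {
        return -1;
    }
    h->free_reserve = bytes;
    return 0;
}

/* Allowed only if, after the allocation, at least 'free_reserve' bytes remain free. */
static bool temp_alloc_allowed(const reserve_heap_t *h, size_t size)
{
    size_t free_now = h->heap_size - h->allocated;
    return size <= free_now && free_now - size >= h->free_reserve;
}
```

Because both setters write the same `free_reserve` field, mixing them cannot produce two competing limits: the most recent call simply wins. The percentage form then reads naturally as "temporary allocations must leave 5% free".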

daniel-cesarini-tridonic-com commented 6 years ago

What kinds of tests have been performed on the code now in place, with varying input parameters and system configurations?

FYI: @karsev @MarceloSalazar

artokin commented 6 years ago

Unit tests verify the functionality. The feature was also tested manually on hardware by adjusting the heap size and the heap limit so that temporary memory allocations failed (the failures are visible in the border router traces). Most of the temporary allocations were made for incoming data packets in the Ethernet/RF driver. The system operated normally after the temporary allocation failures.

daniel-cesarini-tridonic-com commented 6 years ago

Thank you @artokin !

Question: are ALL temporary memory allocations "resistant" to failures?

artokin commented 6 years ago

Yes, they are. The stack is able to recover from low-memory situations and then continue normal operation.

daniel-cesarini-tridonic-com commented 6 years ago

Thank you again!

From your experience, what should be the behavior of the Thread network (composed of Border Router and End-Nodes) when the BR (or another component in the network) would drop packets or have other failures related to the temporary allocations?

We are seeing strange behavior when memory allocations on our BR fail (we still need to check whether the temporary or the permanent allocations are failing), so we are looking for FULL ASSURANCE that failures of temporary allocations are perfectly acceptable to the Thread stack and the Thread network.

Based on past discussions with @kjbracey-arm and @SeppoTakalo, it seems crucial to me to understand whether it is acceptable / enough:

Thank you.

daniel-cesarini-tridonic-com commented 6 years ago

FYI: @karsev @MarceloSalazar

MarceloSalazar commented 6 years ago

Internal reference: IOTTHD-2779

MarceloSalazar commented 6 years ago

@daniel-cesarini-tridonic-com based on conversations with our team:

> From your experience, what should be the behavior of the Thread network (composed of Border Router and End-Nodes) when the BR (or another component in the network) would drop packets or have other failures related to the temporary allocations?

The whole design of the Internet is based on the fact that any router is allowed to drop traffic when it cannot route it.

Nanostack and mesh networks are no exception to that rule, but it is more visible in lossy networks such as mesh. The design goal of Nanostack has been to route all traffic that it can allocate memory for, so an out-of-memory situation should be a normal, tolerated situation.
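The "route what you can allocate memory for" principle can be sketched as a forwarding path that treats a failed temporary allocation as an ordinary packet drop rather than an error. This is not Nanostack code: `temp_alloc`, `temp_free`, `forward_packet` and the fixed 256-byte budget are hypothetical stand-ins chosen only to make the failure path easy to trigger.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a temporary allocator with a fixed budget;
 * it fails (returns NULL) instead of blocking or asserting. */
static size_t temp_budget = 256;

static void *temp_alloc(size_t size)
{
    if (size > temp_budget) {
        return NULL;
    }
    void *p = malloc(size);
    if (p != NULL) {
        temp_budget -= size;
    }
    return p;
}

static void temp_free(void *p, size_t size)
{
    free(p);
    temp_budget += size;
}

static unsigned long packets_dropped = 0;

/* Copy an incoming frame into a temporary buffer and pass it on.
 * If no buffer can be allocated, the packet is dropped and counted;
 * recovery is left to end-to-end mechanisms, as with any IP router. */
static bool forward_packet(const unsigned char *frame, size_t len)
{
    unsigned char *buf = temp_alloc(len);
    if (buf == NULL) {
        packets_dropped++;
        return false;
    }
    memcpy(buf, frame, len);
    /* A real stack would queue 'buf' for transmission and free it afterwards;
     * it is freed immediately here to keep the sketch short. */
    temp_free(buf, len);
    return true;
}
```

The key design point is that the drop path leaves no state behind: the budget is unchanged and the caller simply continues with the next frame.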

We have done fuzz testing against Nanostack and found that the event loop might run out of memory; fixes for this are already implemented. We can't guarantee these are the last issues, but we are continuing our testing of Nanostack and will fix any problems we identify.

daniel-cesarini-tridonic-com commented 6 years ago

@MarceloSalazar I agree with you: the Internet, and "best-effort" and "lossy" networks in general, share well-known working paradigms.

Nonetheless, as you already stated, the individual components and the network as a whole should be able to survive any injected traffic, especially legal traffic.

Regarding fuzz testing and other kinds of testing: of course, the higher the stability, the better.

daniel-cesarini-tridonic-com commented 5 years ago

ping @MarceloSalazar