Memory Management in Elektra

@haraldg wrote in #1840:

AFAIK memory allocation doesn't fail on any operating systems nowadays: You sooner get a signal than an invalid pointer. In my own code I either treat it as non-failing or (if following the standards to the letter is important) crash the application on failure. I think any API that carries error conditions around only because of memory allocation, is needlessly complex.

On one hand I think that the core question (what should/could user code do, when a malloc of a harmless data structure fails?) is quite general, on the other hand there are some corner cases (like freeing resources, that won't be reclaimed by the operating systems, when the application crashes) that need special consideration.

Of course this is just the technical side. I can see that it might politically impractical to convince all potential users of libelektra, that sometimes crashing their application is the right thing.

I just realized that I did a bit of a poor job at explaining the rationale behind my previous comments because I failed to mention something that's probably not common wisdom: To have an API support reliable programming on the level of "the system might run out of memory" the user code needs to allocate/provide the memory for all required data structures. - Then the user code can mlock() the memory region to ensure, that it always stays in physical ram.

Letting the user allocate memory for us conflicts with having ABI compatibility and still being able to extend and optimize our data structures (which we actually do, for example, @KurtMi implements a hash map for the KeySet).

In Elektra all allocations should happen via elektraMalloc, elektraCalloc and elektraRealloc. So users can provide their own elektraMalloc, elektraCalloc and elektraRealloc which do mlock if required (or even implement customized memory management).

@mpranj currently is working on a storage plugin and persistent cache which allows Elektra's data structures to be mmaped. The first results are that this works quite well and is very fast (only some pointers need to be patched). Afaik mmap would not work on memory already allocated by the user.

I can see that it might politically impractical to convince all potential users of libelektra, that sometimes crashing their application is the right thing.

Everyone can do so by setting ELEKTRA_DEBUG to ON. So we push the decision if Elektra should crash on failing assumptions (like memory allocation failed) to the maintainers/distributors.

It is very uncommon that libraries crash, even on clear misuse of the API. For example, glibc just prints that a double free was detected instead of aborting.

Sorry, if I caused a bit of a stir. I didn't want to criticize elektra memory managment and least of all did I want to advocate the use of mlock() - I never actually saw that used in any code that I came across, this kind of high reliability programming is a fairly esoteric topic.

I only wanted to point out that the elektra API already conflicts with certain ways of high reliability programming (not a bad thing at all - when designing an API such decisions have to be made, and given libelektras target audience, it makes perfectly sense for things to be as they are) and thus doesn't make much sense to consider high reliability concerns when pondering future decisions.

So users can provide their own elektraMalloc, elektraCalloc and elektraRealloc which do mlock if required

I don't think this would work: mlock() works on the level of memory pages and malloc() obviously doesn't. I don't think mlock() does reference counting, so on the first free() you would munlock() the whole page, I guess.

It is very uncommon that libraries crash,

A crash need not be aborting, there could be a callback: In C applications this probably would just terminate without leaving a mess behind, and actually in more complicated cases instead of crashing this could start some kind of garbage collector and return when more memory is available. When using libelektra from something like haskell or python that might be reasonable, though I have no idea how this would work in practise.

I think the question is: What should user code do if malloc() fails and how can libelektra help with it? In a way this is a microoptimization and I can see everbody having more important things to do, but otoh in my experience trying to shrug of such decisions leads to compicated and annoying to use APIs in the long run, so I thought I should mention the topic anyway. I'm just an amateur programmer without formal education, so I'm probably not the best person to have this discussion with, but maybe if a talented student explores and compares the different possible approaches, this could be a nice thesis subject?

Sorry, if I caused a bit of a stir. I didn't want to criticize elektra memory managment

Don't worry, there is nothing wrong to criticize it.

API already conflicts with certain ways of high reliability programming

Elektra's API does not really say that any allocation on the heap will ever happen. The functions could always return pointers from one memory block. And to some extend, elektraMalloc and friends could be used to realize this. And with mmap we are also going in this direction. Of course memory management is not the only point to consider for high-reliability and/or real-time programming.

What should user code do if malloc() fails and how can libelektra help with it?

If malloc() fails internally, Elektra is hopefully able to return an error. It is hardly tested, though. If malloc() fails in the application, Elektra cannot really help. Most Elektra calls would fail, too. (Unless elektraMalloc was rewritten to use an internal memory pool.)

I'm just an amateur programmer without formal education, so I'm probably not the best person to have this discussion with, but maybe if a talented student explores and compares the different possible approaches, this could be a nice thesis subject?

C is not mem-safe at all and malloc not being successful is the least of its problems. The two solutions modern programming languages have are ownership (e.g. RAII in C++, borrowing/lifetimes in Rust) or garbage collection (most modern languages). Both cannot really be realized in C, C lacks semantics for doing that properly. So the conclusion of such a work most likely would be not to use C :smile:

markus2330 writes:

Elektra's API does not really say that any allocation on the heap will ever happen.

But it allows it. Well, I guess your point is, that the API guarantees that no malloc() is called if elektraMalloc() points somewhere else. But is this really helpful? I hopefully will never write my own init, but if I put myself in those shoes for a moment, then I think I'd rather write my own configuration parser and miss 3-way-merge instead of writing my own memory managment functions. (Unless the API (the parts that I use) also guarantees that memory is never freed and no real management is necessary ... but I kind of doubt this is feasable.)

What should user code do if malloc() fails and how can libelektra help with it?

If malloc() fails internally, Elektra is hopefully able to return an error.

That doesn't answer the question, what user code should do in this case, at all.

Also "return an error" is a bit too unspecific to be helpful: Is it always possible to tell from the return value that memory allocation was the problem instead of some other syscall? Does errno contain valid information? (Probably not, as I see libelektra prints warnings itself, which might corrupt errno unless special care is taken.) Does the API specify for all functions, which can only fail when malloc fails, that this is the case - so that user code has a guarantee that it can ignore the return value if it doesn't care about malloc() related errors?

While returning an error is necessary for many operations, it also causes a lot of annoying issues for everybody and especially the people using the API. (And often there are too few error codes to properly describe what went wrong and how to fix it or the documentation lacks the details.)

C is not mem-safe at all and malloc not being successful is the least of its problems. The two solutions modern programming languages have are ownership (e.g. RAII in C++, borrowing/lifetimes in Rust) or garbage collection (most modern languages). Both cannot really be realized in C, C lacks semantics for doing that properly. So the conclusion of such a work most likely would be not to use C :smile:

This is not quite what I had in mind. I was more thinking about research questions like: How likely are real world crashes because of treating malloc(sizeof(foo)) as always successful? (Terminating from elektraMalloc() is equivalent to a crash because of the null pointer dereference for this purpose.) How many lines of code could be saved inside elektra? How does that relate to memory footprint and thus likelyhood of being choosen by the OOM killer, when memory runs out? (If I'm not mistaken OOM killer usually picks the process with the biggest memory footprint, though probably some more complicated criteria is used.) How much head ache (translate: hours reading documentation) and lines of code can be avoided for somebody using elektra. Are there use cases or elektra functions where after an error is returend because of malloc() the user code want's to do something else then terminate? Could these use cases also be catered for by a callback instead of an error code and how do these apporaches compare? - Seems like a nice combination of theory work, code study and usability tests... :)

But it allows it. Well, I guess your point is, that the API guarantees that no malloc() is called if elektraMalloc() points somewhere else.

My point is that the API is agnostic to whatever kind of memory management happens in the implementation.

But is this really helpful? I hopefully will never write my own init, but if I put myself in those shoes for a moment, then I think I'd rather write my own configuration parser and miss 3-way-merge instead of writing my own memory managment functions.

Obviously, there are already many memory management libraries available. But nevertheless people will invent new ones, together with new configuration file parsers 😉

Also "return an error" is a bit too unspecific to be helpful

Yes, it is unspecific but it is usually not much of a problem. The key and ks functions only fail because of one of two reasons:

you passed wrong arguments (e.g. NULL instead of a pointer) or used them wrongly
(re)malloc failed

So if you used the API correctly you know that malloc failed. But most likely you still passed wrong arguments because of a situation you did not think of 😉

For the kdb* functions you get error number 87 on (re)mallocs failures (if the error routines worked in that situation and the OOM killer was not faster).

While returning an error is necessary for many operations, it also causes a lot of annoying issues for everybody and especially the people using the API.

Most likely everyone will only check for error number 30 (a write-conflict with another application occurred) because here a 3-way merge will help. All other errors need to be escalated to the user anyway. Even if some errors might be temporary (like hard disk full) they are most likely permanent as long as the user does nothing.

But generally I agree that the concept of "infallible memory allocation" might be worthwhile pursuing. But Elektra is not the point where this can (or should) be done, it needs to be solved in a memory management library (which can be used by Elektra via elektraMalloc) or even better in libc. (There are many such annoyances, like return codes for close calls.)

How likely are real world crashes because of treating malloc(sizeof(foo)) as always successful?

I find these questions really interesting but:

It will hardly be a real world problem unless someone provides us a real world system where this occurs (like our build server from time to time 😄) But what are the chances that in a reproducible real-world situation memory allocation problems happen within Elektra? Config settings usually fit in some kilo bytes.
Is this really an obstacle for the adoption of Elektra?

markus2330 writes:

But it allows it. Well, I guess your point is, that the API guarantees that no malloc() is called if elektraMalloc() points somewhere else.

My point is that the API is agnostic to whatever kind of memory management happens in the implementation.

So in the end there are no guarantees at all?

Yes, it is unspecific but it is usually not much of a problem. The key and ks functions only fail because of one of two reasons:

you passed wrong arguments (e.g. NULL instead of a pointer) or used them wrongly

(re)malloc failed

Is this API guaranteed or implementation detail?

While returning an error is necessary for many operations, it also causes a lot of annoying issues for everybody and especially the people using the API.

Most likely everyone will only check for error number 30 (a write-conflict with another application occurred) because here a 3-way merge will help. All other errors need to be escalated to the user anyway. Even if some errors might be temporary (like hard disk full) they are most likely permanent as long as the user does nothing.

I'm biased here, but in my case there is no user to escalate things to. My applications need to function properly as long as reasonably possible, even if minor issues (like hard disk full or remounted read-only) occur. If that's not possible anymore, they should crash without bringing the rest of the system in an unstable state.

I don't know what programming workflow other people have, but for me it is pretty much:

Lookup every function that I use in the documentation
Go through all possible error codes
Decide for each one how to deal with it (handle, crash, ignore)
Knowing about side-effects is also very useful

Each infallible function makes my work a lot easier. malloc() related errors are treated (in most cases) as infallible and ignored, but if the API says a function might fail without specifiying the possible causes of errors then this is very annoying: What am I supposed to do? Even if I look up the implementation and it is only a malloc() there, then there is no guarantee that in the next version there might be a second syscall, that is less infallible then malloc() ...

But generally I agree that the concept of "infallible memory allocation" might be worthwhile pursuing. But Elektra is not the point where this can (or should) be done, it needs to be solved in a memory management library (which can be used by Elektra via elektraMalloc) or even better in libc. (There are many such annoyances, like return codes for close calls.)

At a time I was a bit surprised that no infallible memory allocation API is provided by modern libc's, but then the only new API I could come up with was the callback if malloc fails and this is easily implemented as a wrapper and I never had a need for that anyway. So since malloc is already essentially infallible (aside from some esoteric corner cases) there is no need for a new function. It is up to the call sites to decide if they treat it as infallible or fall under one of the corner cases where it actually is not.

But what are the chances that in a reproducible real-world situation memory allocation problems happen within Elektra? Config settings usually fit in some kilo bytes.

Which is exactly why is suggested treating it as infallible in the first place ... :)

Is this really an obstacle for the adoption of Elektra?

This is very difficult to answer.

For your target audience I would guess the main obstacles are: 1) Almost nobody else is using it 2) No high level API is provided yet 3) It is a bit scary with all the many features and few people look into it in enough detail to realize it is modular enough to not cause too many problems. 4) Many programmers like to think of configuration issues (like merging on upgrades) as the maintainers problem and don't care as much about 3-way-merge ability as they IMO should. 5) Many people think of configuration as some kind of script rather then a set of key-value-pairs, so elektra seems alien to them.

I guess making the API simpler to use with more guarantees (even if they are theoretically not correct) might help, but not much. Though it is somewhat related to (2) - if the upcoming high level API is enough for most people and simple to use, then any annoyance in the existing API is much less relevant.

For me personally I have already decided to use libelektra eventually, so it is clearly not an obstacle. However I do try to follow the "keep it simple and stupid" paradigm and I feel elektra suffers a bit from its "make everything possible and dump decisions onto the users" approach.

So in the end there are no guarantees at all?

The guarantees are in the opposite direction: Elektra guarantees not to crash, not to send signals, not to memleak, not to print something, or more generally: not to have any (global) side effects.

The key and ks functions only fail because of one of two reasons: Is this API guaranteed or implementation detail?

The API docu should describe it, so it is an API guarantee. When I had more time, I went through all API calls and checked for things like that. But we never checked if all out-of-memory situations are correctly handled as I considered it as a minor threat.

I moved discussions of:

error handling to #1879
- hindering factors to #1880

ElektraInitiative / libelektra

Memory Management in Elektra #1873