keyNew overhaul - Githubissues

markus2330 commented 4 years ago

If we decide to keep vaargs for keyNew, we should consider adding something like KEY_ADD_BASE_NAME for constructions of special names without the need to know about the escaping rules, e.g.:

keyNew("system:/", KEY_ADD_BASE_NAME, "elektra", KEY_ADD_BASE_NAME, "mountpoints", KEY_END) for system:/elektra/mountpoints.

kodebach commented 3 years ago

Note: I edited the top command, since user:/% is not a valid escaped name.

I also thought about and experimented with some alternative APIs:

Only allow setting key name in keyNew

Key * keyNew (elektraNamespace ns, char * parts, ...);

// system:/
keyNew (KEY_NS_SYSTEM, KEY_END);

// system:/elektra/mountpoints
keyNew (KEY_NS_SYSTEM, "elektra", "mountpoints", KEY_END);

// system:/elektra/mountpoints/%
keyNew (KEY_NS_SYSTEM, "elektra", "mountpoints", "", KEY_END);

Pro:

Easy to use
Hard to misuse

Con:

Hard to transition, KEY_VALUE, KEY_META, etc. have to translated into explicit function calls.
Makes it almost impossible to do ksNew(10, keyNew (...), keyNew (...), ..., KS_END) directly.

Use unescaped name and size directly:

Key * keyNew (elektraNamespace ns, char * parts, size_t partsSize);

#define keyNewLiteral (ns, parts) keyNew ((ns), (parts), (sizeof(parts)));

// system:/
keyNew (KEY_NS_SYSTEM, NULL, 0);
keyNew (KEY_NS_SYSTEM, "", 1);
keyNewLiteral (KEY_NS_SYSTEM, "");

// system:/elektra/mountpoints
keyNew (KEY_NS_SYSTEM, "elektra\0mountpoints", 20);
keyNewLiteral (KEY_NS_SYSTEM, "elektra\0mountpoints");

// system:/elektra/mountpoints/%
keyNew (KEY_NS_SYSTEM, "elektra\0mountpoints\0", 21);
keyNewLiteral (KEY_NS_SYSTEM, "elektra\0mountpoints\0");

Pro:

Could be extended to still allow KEY_VALUE, KEY_META, etc.
Easy to use
keyNew is more or less a single memcpy

Con:

Easy to misuse, parts and partsSize must match, macro only works with string literals.
Macros are always problematic for public APIs

More complicated API:
```
Key * keyFromParts (elektraNamespace ns, const char ** parts);

#define elektraKey(ns, ...) keyFromParts ((ns), ((const char *[]){ __VA_ARGS__, NULL }))

KeySet * ksFromArray (size_t alloc, Key ** keys);

#define elektraKeySet(alloc, ...) ksFromArray ((alloc), ((Key *[]){ __VA_ARGS__, NULL }))
```
See here for example usage (ignore the obvious memory leaks).

Pro:
- ksFromArray allows using a single memcpy + qsort, which should be faster than calling ksAppendKey repeatedly (essentially insertion sort)
- No more var-args in functions (only in macros)
- Proper compiler warning, if wrong type in var-args
Con:
- uses macros in public API
- KEY_VALUE and KEY_META are still difficult to implement via macros, for easy calling of elektraKey in elektraKeySet
Variant of 3
```
typedef Key MetaKey;
Key * keyCreate (elektraNamespace ns, const char ** parts, const char * value, MetaKey ** metaKeys);

static inline MetaKey * keyCreateMeta (const char ** parts, const char * value) {
  return keyCreate(KEY_NS_META, parts, value, NULL);
}

KeySet * ksFromArray (size_t alloc, Key ** keys);

#define elektraArray(type, ...) ((type[]){ __VA_ARGS__, NULL })
```
See here for example usage (ignore the obvious memory leaks).

Pro:
- ksFromArray allows using a single memcpy + qsort, which should be faster than calling ksAppendKey repeatedly (essentially insertion sort)
- No more var-args in functions (only one macros)
- Proper compiler warning, if wrong type in var-args
- Support for key value and meta-keys
- Explicit API showing that a Key consists of namespace, parts, value and metaKeys and a KeySet is a container of keys.
Con:
- uses macros in public API (but only one)
- a lot more complicated to use than 3, keyCreateMeta helps but it's still very clunky
Note: elektraArray could be put into an internal header and we recommend to create a separate macro for external code, if e.g. keyCreate (KEY_NS_SYSTEM, (const char *[]){ "elektra", NULL }, NULL, NULL) is to verbose for the external user.

I don't have strong preference for any of these APIs. My only wish is that keyNew doesn't use escaped names. The validation, canonicalization and unescaping process is quite expensive and replacing to with a few dumb memcpy would likely improve performance. Of course, I don't have benchmarks to prove this with any significance, but I think in this case we can go on gut-feeling (complicated analysis vs. memcpy seems quite obvious).

In the ideal case, we could even move the escaped name stuff into separate library.

markus2330 commented 3 years ago

Yes, I agree it would be great if the escaped name stuff could be kicked out of the core. I am not sure if it is realistic, though. Aren't there too many clients needing some way to give more than one name part via one char*? E.g. what about keyAddName?

Only allow setting key name in keyNew

I like the idea best from the options but why limit it to setting key name? Everything else could be simply moved to the beginning:

keyNew (KEY_VALUE, "5", KEY_META, "8", KEY_NS_SYSTEM, "elektra", "mountpoints", KEY_END);

kodebach commented 3 years ago

what about keyAddName?

keyAddName (possibly renamed) could be moved to the new library. I think for most clients (that don't involve user interaction) keyAddBaseName should be good enough.

The exception of course are non-canonical names e.g. keyAddName (k, "..") or keyAddName (k, "#10"):

Array parts can easily be added via keyAddBaseName and elektraWriteArrayNumber or we could also add a special keyAddArrayPart that takes an integer.
A single keyAddName (k, "..") is much better handled by keySetBaseName (k, NULL)
keyAddName (k, "../..") (or more ..) can be handled by multiple keySetBaseName or we could add a new keyRemoveParts (k, n) that removes n parts from k.

The only reason I can think of that the escaped name would be needed, is user interactions (printing, user input, etc.). In that case linking against a second library should be acceptable.

Everything else could be simply moved to the beginning

I find this version quite hard to read. If there are a lot of meta keys, the key name will be very far down the invocation of keyNew. On top of that (especially with our clang-format settings) it might be hard to spot where the key name starts, if it spans across a few lines.

Take for example this invocation from examples/opts.c:

keyNew (SPEC_BASE_KEY "/emptydirs", KEY_META, "description", "remove empty directories", KEY_META, "opt", "d", KEY_META, "opt/arg", "none", KEY_META, "opt/long", "dir", KEY_END)

That would translate to:

keyNew (KEY_META, "description", "remove empty directories", KEY_META, "opt", "d", KEY_META, "opt/arg", "none", KEY_META, "opt/long", "dir", KEY_NAME, KEY_NS_SPEC, "sw", "org", "erm", "#0", "current", "emptydirs",  KEY_END)

(Note: we need a separate KEY_NAME option otherwise the KEY_NS_* and KEY_* values could collide).

As a second wish I would add: I'd prefer to not have a var-args based API. It is not only quite error-prone to use, but it is also pretty clunky to implement and maintain. I also don't know what implications var-args have on ABI compatibility.

markus2330 commented 3 years ago

keyAddName (possibly renamed) could be moved to the new library. I think for most clients (that don't involve user interaction) keyAddBaseName should be good enough.

For hierarchical storage plugins, I agree. But I am afraid at the moment nearly every other client needs some way to construct a key name from a single string (e.g. we saw such a code recently in qt-gui). I do not see a realistic way how to rewrite all of this. Do you have any idea about how to do the migration?

The only reason I can think of that the escaped name would be needed, is user interactions (printing, user input, etc.). In that case linking against a second library should be acceptable.

I agree that in these cases linking is no problem. I am not so optimistic, that these are the only cases though. E.g.:

in some storage plugins where "/" should be interpreted as hierarchy, we simply pass the name using keyAddName.
src/libs/opts/opts.c also uses name manipulation via keyAddName
even the high-level API uses keyAddName to construct new key names using always the same lookup key

This is only a small set of examples in a quick look-through. It looks like extensive work to fix all of this. And in most cases, the library would be needed. And we already have quite many libraries, one more that would need to get linked virtually anywhere seems to be a bit pointless.

Especially, also all the bindings need it, as they also all provide a way to create a key name from a single string. E.g. the cpp binding up-to-now by definition only exposed core functionality.

I find this version quite hard to read

I agree.

As a second wish I would add: I'd prefer to not have a var-args based API. It is not only quite error-prone to use, but it is also pretty clunky to implement and maintain. I also don't know what implications var-args have on ABI compatibility.

Yes, also bindings cannot interact with the vaargs interface (except of invocations with pre-defined list of parameters). This was a big problem in the beginning but a good solution was found with keyVNew and KEY_FLAGS.

So at the moment we are quite at a sweet spot with vaargs that both:

allows to create arbitrary KeySets from a single constructor
allows support for bindings doing the same

So going away from the sweet spot only makes sense if:

we can avoid vaargs totally
we have a good plan how to actually do the labor-intensive migration

kodebach commented 3 years ago

we saw such a code recently in qt-gui

The Qt-GUI is user-interaction, but yes I see that linking another library just for keyAddName might not be a good solution. However, we could still move anything that uses escaped names out of kdb.h (and explicitly not include any headers that use escaped names in kdb.h) to encourage people to avoid escaped names.

in some storage plugins where "/" should be interpreted as hierarchy, we simply pass the name using keyAddName.

In those cases it would actually be better to do what quickdump does and copy into a common buffer and then call keyNew.

even the high-level API uses keyAddName to construct new key names using always the same lookup key

That is a good point, although at least the generated version of the high-level API should be moved to unescaped names, once the remove the escaped name from struct _Key.

So at the moment we are quite at a sweet spot with vaargs that both:

I think the version with arrays would work much better for bindings.

we have a good plan how to actually do the labor-intensive migration

I don't think there will ever be a good plan for migrating such a core API as keyNew. The only solution is another huge PR like #3555. The task would be a bit simpler, since finding all invocations of keyNew is a lot simpler than finding all uses of key names (just rename keyNew to keyNew2 in kdb.h and see what doesn't compile/link).

But since we are just dealing with basic function invocations, it should theoretically be possible to write an upgrade script. This series of articles seems to describe how it could be done with clang tooling. In fact just by using clang-query it's pretty easy to find all invocations of keyNew.

markus2330 commented 3 years ago

In those cases it would actually be better to do what quickdump does and copy into a common buffer and then call keyNew.

Why would that be better? If you use keyAddName you do not have any hassles with memory management.

I think the version with arrays would work much better for bindings.

I do not see how this could improve the current bindings? And there is already a lot of knowledge how to make bindings for the API as-is, I do not see a need for improvement.

I don't think there will ever be a good plan for migrating such a core API as keyNew

About the initial proposal: The name KEY_BASE_NAME, might better describe what it does and is more consistent with keyAddBaseName. That it will get added is obvious. But actually adding this feature is very low priority, as it only makes keyNew, that is already a feature-monster, even more bloated.

I like the plan to first fix the leftovers of the keyname overhaul, release that and then get rid of stuff (like key in _Key) without adding even more stuff.

kodebach commented 3 years ago

Why would that be better? If you use keyAddName you do not have any hassles with memory management.

It's C, you always have to deal with memory management. But to give a bit more context:

You have to read the file contents into some part of memory.
If you read it directly into a buffer that is prefixed with the parent key name, you don't have to do keyDup (parent); keyAddName; keyDel or similar sequences. You can instead to a single keyNew.
Re. memory management: Because of the nature of escaped key names, you either have to
- store the key name size in the file: In which case resizing the buffer is pretty simple.
- use a terminator for the end of the key name: In this case you have to do a lot of buffer copying anyway, because you don't know how many bytes you need to read in advance. A bit more effort to resize the common buffer won't matter.

I do not see how this could improve the current bindings?

It might not improve current bindings, since they are already written. But as you said, bindings for varargs functions are hard to write (sometimes even impossible, e.g. the go bindings needs custom C wrappers). Calling a function that takes an array, will always be possible.

I do not see a need for improvement

There may not be a need, but it would be positive side effect.

I also don't see any positives in using var-args, except shorter code. But shorter code at the expense of type safety and compile time errors, is not something I'm a fan of.

The name KEY_BASE_NAME, might better describe what it does and is more consistent with keyAddBaseName

Could you add some examples for adding multiple key name parts. e.g. what's the replacement for keyNew ("system:/elektra/mountpoints", KEY_END)?

The name KEY_BASE_NAME would only make sense to me, if you have to use it before every key name part. So it would be:

keyNew ("system:/", KEY_BASE_NAME, "elektra", KEY_BASE_NAME, "mountpoints", KEY_END);

That it will get added is obvious.

To me and you it might be, but I wouldn't say that's the case in general. It could just as well, be the case that the above like results in system:/mountpoints, because the second KEY_BASE_NAME overrides the first (i.e. corresponds to keySetBaseName).

I like the plan to first fix the leftovers of the keyname overhaul, release that and then get rid of stuff (like key in _Key) without adding even more stuff.

Yes, of course. This issue should be solved pre-1.0, but definitely not for the next release.

markus2330 commented 3 years ago

It's C, you always have to deal with memory management.

I mean something like:

https://github.com/ElektraInitiative/libelektra/blob/ecd4c997d90ad595ca36ba709da2df07e6f50282/src/plugins/ni/ni.c#L54

I also don't see any positives in using var-args, except shorter code.

Yeah, the idea was to have the possibility of (generated) C-Code which you can include and assign to a KeySet variable easily. I don't know if it was really worth all the hassle but now it is done and works (see also c plugin). I am not sure if it is a selling point, but it has some elegance.

Could you add some examples for adding multiple key name parts

Yes, I changed it in the top post.

The name KEY_BASE_NAME would only make sense to me, if you have to use it before every key name part.

Yes, exactly, this was the idea.

To me and you it might be, but I wouldn't say that's the case in general.

I added the ADD_ now.

kodebach commented 3 years ago

I mean something like:

Yes, in this case the memory management is hidden inside the library.

Sidenote: That line will actually fail horribly, if elektraNi_GetName returns an invalid escaped name (suffix). The key name of k won't be changed and the ksAppendKey a few lines down will just silently replace the parent key in the KeySet. Using a single keyNew without checking for errors will at least segfault, because of the returned NULL and not potentially corrupt the data in the KDB.

Yeah, the idea was to have the possibility of (generated) C-Code which you can include and assign to a KeySet variable easily.

Option 4 in https://github.com/ElektraInitiative/libelektra/issues/3529#issuecomment-738738878 fulfils this requirement as well. It's not as compact, but has more type safety. Most of all, if you use the elektraArray macro, you can never forget sentinel values anywhere.

markus2330 commented 3 years ago

That line will actually fail horribly

Yes, this is in the currently sad topic of error handling, I hope Stefan (who will work on the API) will find nice guidelines for this.

Segfault is definitely not a good option but failing is also not a good option, as it would make it impossible to fix any errors #1511.

Option 4 in #3529 (comment)

But it uses vaargs in macros, which is even worse for bindings than the current situation.

kodebach commented 3 years ago

failing is also not a good option, as it would make it impossible to fix any errors

In this case it is the correct option. If the file contains invalid key names, you either edited the file manually (in which case you can do that again to fix the problem) or the plugin is completely broken and actually produced the invalid key name (in which case there is no chance of fixing the key name anyway).

But it uses vaargs in macros, which is even worse for bindings than the current situation.

True macros are very bad for bindings. But macro is only there to add the NULL. The API totally works without the macro as well and maybe it's eve better without the macro.

I just think that the array API is much more versatile, since you can construct arrays dynamically while var-args and va_list are pre-defined at compile-time.

markus2330 commented 3 years ago

The API totally works without the macro as well and maybe it's eve better without the macro.

This changes everything, then at least the part with the ksNew replacement is a very interesting option. I like the idea with sorting and dynamically-sized key sets. It would be also interesting for build-in specs.

Can you give an example how it would look like without the macro?

pre-defined at compile-time.

Which would not be a bad thing if the compiler could optimize something but in this case it is actually making it slower as qsort is not possible on va_list.

kodebach commented 3 years ago

Can you give an example how it would look like without the macro?

Basically, ignore the macro and write the array casting bit yourself. I added an example to the showcase (with_macros vs without_macros).

The thing I don't like without the macro is, that it is far too easy to forget the NULL at the end of the array. In e.g. C++ this wouldn't be an issue, since we could use initializer_list which has a size field. That's why I would add additional versions of keyCreate/ksFromArray that take the sizes as arguments instead of (just) looking for NULL. These could of course be used from C, but would mainly be intended for bindings.

Basically I would do:

Key * keyCreate (elektraNamespace ns, const char ** parts, const char * value, const MetaKey ** metaKeys) {
    return keyCreate2 (ns, parts, 0, value, metaKeys, 0);
}

Key * keyCreate2 (elektraNamespace ns, const char ** parts, size_t partCount, const char * value, const MetaKey ** metaKeys, size_t metaKeyCount);

keyCreate2 takes at most partCount parts from parts, if partCount > 0 and always stops at the first NULL in parts. (same for metaKeys)

A similar thing would be done for ksFromArray.

markus2330 commented 3 years ago

The thing I don't like without the macro is, that it is far too easy to forget the NULL at the end of the array.

And it doesn't yield any warning/error? Is there a possibility to make a sentinel for that?

If we really would like to add a macro to the API it would need to be called ELEKTRA_CREATE_KEYSET or similar. But as this type is needed as first argument, the macro is also not ideal.

Am I correct that the API would be C89 but to actually create a KeySet within one expression C99 is needed?

keyCreate2

I am against having a duplicate of a function in the 1.0 API. The *2 approach is at best an ugly hack for mistakes of the past.

kodebach commented 3 years ago

And it doesn't yield any warning/error? Is there a possibility to make a sentinel for that?

AFAIK there is no way to tell the compiler to check for NULL in an array. I wouldn't even know how that would work. We are talking about C after all... The only reason we need the NULL is that without it nobody knows where the array ends.

If we really would like to add a macro to the API it would need to be called ELEKTRA_CREATE_KEYSET or similar. But as this type is needed as first argument, the macro is also not ideal.

Like I said, at the beginning. We could add the macro into an internal header and recommend users create their own, if they need one.

Am I correct that the API would be C89 but to actually create a KeySet within one expression C99 is needed?

Yes, the (const char *[]){ "elektra", NULL } syntax is C99+ only. But AFAIK we said C99 is required for some parts already and will be the minimum version for Elektra 1.0. I really don't think anybody still uses a compiler that doesn't have C99 support. At least not for Elektra.

I am against having a duplicate of a function in the 1.0 API. The *2 approach is at best an ugly hack for mistakes of the past.

The name keyCreate2 was just a placeholder. Something like keyCreateInternal might be more suitable. It better suggests that the API is intended for used internally (e.g. within bindings) and not directly by the user.

Of course we could also just use the keyCreate2 directly as keyCreate:

Key * keyCreate (elektraNamespace ns, const char ** parts, size_t partCount, const char * value, const MetaKey ** metaKeys, size_t metaKeyCount);

But creating a simple key would be quite a long function call:

// system:/
keyCreate (KEY_NS_SYSTEM, NULL, 0, NULL, NULL, 0);

// system:/elektra
keyCreate (KEY_NS_SYSTEM, (const char *[]){ "elektra", NULL }, 0, NULL, NULL, 0);
keyCreate (KEY_NS_SYSTEM, (const char *[]){ "elektra" }, 1, NULL, NULL, 0);

Maybe the solution is a more nested approach (optional macros for usability, static inline modifier also optional):

#define keyNew (ns, ...) keyCreate (ns, (const char *[]) { __VA_ARGS__, NULL })
Key * keyCreate (elektraNamespace, const char ** parts);

static inline Key * keyWithValue (Key * k, const char * value) {
    keySetString (k, value);
    return k;
}

#define keyWithMeta (k, ...) keyWithMetaArray (ns, (Key *[]) { __VA_ARGS__, NULL })

static inline Key * keyWithMetaArray (Key * k, Key * metaKeys) {
    ksAppend (keyMeta (k), ksFromArray (0, metaKeys));
    return k;
}

#define ksNew (alloc, ...) ksFromArray (ns, (Key *[]) { __VA_ARGS__, NULL })
KeySet * ksFromArray (size_t alloc, Key * keys);

This could be used like this:

KeySet * ks = ksNew (1,
  keyWithMeta (
    keyWithValue (keyNew (KEY_NS_SYSTEM, "elektra", "mountpoints"), "value"),
    keyWithValue (keyNew (KEY_NS_META, "comment"), "comment value"),
    keyWithValue (keyNew (KEY_NS_META, "type"), "string")
  )
);

To create (ni syntax):

system:/elektra/mountpoints = value

[system:/elektra/mountpoints]
comment = comment value
type = string

It's a bit more verbose for keys with value and metadata, but it is more compact and easier to use for keys with just a name.

And I what to stress again: I'm just throwing ideas out there. The only preferences I have for the new API are "preferably no varargs" and "no escaped names".

kodebach commented 3 years ago

In https://github.com/ElektraInitiative/libelektra/pull/3606#issuecomment-753629132 the idea for a keyEmpty() function came up. @markus2330 responded:

We can add something like this in private headers or in libease. But I do not think it should be in the public core API. The proper name would be keyNewCascading and then people ask why there is no keyNewUser and so on...

I disagree that this would be the "proper" name. The idea behind keyEmpty() is that it creates the most basic Key possible, containing no data whatsoever.

I think this should be part of the public API for multiple reasons:

There are quite few instance where you want a basic empty key (e.g. error reporting). It should be easy to create such a key.
The current replacement for keyDup (k) would be keyCopy (keyNew ("/", KEY_END), k, ~0). Which is much harder to understand. keyCopy (keyEmpty(), k, ~0) seems a lot more clear ("copy k into an empty key" = "duplicate k").
Once the new keyNew is implemented (whichever API we choose), things will just get worse. keyNew ("/", KEY_END) could become one of:
- keyCreate (KEY_NS_CASCADING, NULL, 0, NULL, NULL, 0)
- keyNew (KEY_NS_CASCADING)
- keyFromParts (KEY_NS_CASCADING, NULL)
- keyNew (KEY_NS_CASCADING, KEY_END)
- keyNew (KEY_NS_CASCADING, NULL, 0)
All of this assumes, that people know "/" and KEY_NS_CASCADING are the "correct" names for an empty/dummy Key. This matters, because this could be optimised. Using the old API keyNew ("/", KEY_END) would only allocate a new struct _Key, but no key names, values or metadata. Theoretically we could go even further and make keyNew ("/", KEY_END) return a static instance. This would of course require changing all code that modifies a key (keySetString, keySetName, etc.) to clone this static instance when needed, similar to what we do with KEY_FLAG_MMAP_*.

markus2330 commented 3 years ago

The idea behind keyEmpty() is that it creates the most basic Key possible, containing no data whatsoever.

But it contains the name of the root of the cascading namespace. This is imho not empty. In general we are prone to introduce too much terminology. I think we should drop the whole idea of "empty keys", we now ensure there is always a name and can be proud of this :wink:

Maybe we can call it keyNull(): it has a name but the data is null. Not really thrilled about it, though. Additionally, I do not really understand the timing: now you renamed all the occurrences to keyNew ("/", KEY_END) and you want to rename them again? For me, with the current API keyNew ("/", KEY_END) feels to be the correct way.

kodebach commented 3 years ago

"empty key" is not new terminology I want to introduce. keyEmpty() is just a name. I'd be fine with another name like keyNull() as well. (Although I don't quite understand how Null vs. Empty makes a difference).
The Key created by keyEmpty() is as empty as it can be. A Key cannot have no value and it cannot have no name. The closest you can come is keyNew ("/", KEY_END) (or maybe from a user POV keyNew ("/", KEY_BINARY, KEY_END) so that keyValue returns NULL).
Again since #3555 it is no longer possible to create a Key without a name. If I don't care about the name, the correct one to use (for future optimization purposes) is "/". To make that more explicit a function like keyEmpty() would be helpful.
I don't want to change occurrences of keyNew ("/", KEY_END) immediately. The switch to keyEmpty() would be made together with the implementation of this issue, which is about a completely new API for keyNew. So adding keyEmpty() wouldn't really add any work. (Sidenote: sed 's~keyNew ("/", KEY_END)~keyEmpty()~g' should work, so it really doesn't matter when we do it)

markus2330 commented 3 years ago

just a name = just a term; null keys is a term for keys that have a null pointer as value
nc
it is essential to have a good name if you want to introduce something to the public API.
That is one of the problem of such an alias: it gives you two options to do the same thing.

kodebach commented 3 years ago

"Null key" is an established term yes. But this actually a reason to not call the function keyNull(). The point is not creating a key without a value. The point is a simple solution for the situation: "I just want an instance of Key. I don't care what it contains." Maybe keyBare() or keyBlank()? ("bare" or "blank" because the resulting Key contains the bare minimum bytes, i.e. it is essentially blank)

it gives you two options to do the same thing.

Because it is an alias, I don't see it as too options. The idea is that keyEmpty() (or whatever the name will be) will always be defined as:

static inline keyEmpty() { return keyNew ("/", KEY_END); }

This line would just be added to kdb.h and thereby the whole line (including the definition of keyEmpty()) would become public API.

Yes, there are two options in the sense, that users can write keyEmpty() or keyNew("/", KEY_END). But the point is we guarantee that this will always result in the same behaviour, so it is purely a matter of preference and not a meaningful decision.

markus2330 commented 3 years ago

The statements "I just want an instance of Key. I don't care what it contains." and to provide the very strong guarantee "that [two statements] will always result in the same behaviour," are very contradicting goals. Together with issues in terminology, I think we should drop the idea to make such an alias.

kodebach commented 3 years ago

Where exactly do you see the contradiction?

"I just want an instance of Key. I don't care what it contains." implies a call to keyNew since that is the only way to create an instance of Key. There are no other implications since we don't care what the instance contains.

"that [keyEmpty() and keyNew("/", KEY_END)] will always result in the same behaviour," is a completely separate concern. First of all, this can only be guaranteed as long as keyNew("/", KEY_END) is valid. If the keyNew's signature changes, this guarantee changes too (e.g. replace keyNew("/", KEY_END) with keyNew(KEY_NS_CASCADING)).

The guarantee can simply be achieved by adding

static inline keyEmpty() { return keyNew ("/", KEY_END); }

to kdb.h and never changing this line. (again, unless keyNew's signature changes)

I really don't see what the huge problem is, but since you seem absolutely convinced that adding keyEmpty() is a bad idea, I guess we won't do it.

mpranj commented 3 years ago

Intuitively it sounds like an empty Key should look something like keyNew("", KEY_END) (so without the "/"). Because of this it would be good to have keyEmpty() to make it very clear how to get an empty key.

But re-thinking my own code.. are there any situations where you really need an empty key? Thinking about it I think I could rewrite everything to just create/initialize the Key at the right point and avoid empty Keys completely. Am I overlooking something?

markus2330 commented 3 years ago

But re-thinking my own code.. are there any situations where you really need an empty key?

I hope you never really need it, as it does not exist anymore. As every key now has a name, only the value can be "empty", we call these keys "null keys", though.

kodebach commented 3 years ago

an empty Key should look something like keyNew("", KEY_END)

That doesn't work since #3555, because "" is not a valid (escaped) key name anymore.

are there any situations where you really need an empty key?

The most common situation currently is when a Key is used for error reporting only (e.g. kdbOpen).

Once we remove escaped names and change keyNew (that's the whole reason why I started the discussion in this issue), however, the most common situation by far will be what is now solved by keyNew (keyName (k), KEY_END). This has to be replaced by keyCopy (keyNew ("/", KEY_END), k, KEY_NAME) (that's why I originally had the idea in #3606).

At least in the code that I wrote, things like keyNew (keyName (k), KEY_END) happen quite a bit.

Maybe another solution would be to allow keyCopy (NULL, k, flags) as and alias for keyCopy (keyNew ("/", KEY_END), k, flags).

markus2330 commented 3 years ago

Maybe another solution would be to allow keyCopy (NULL, k, flags) as and alias for keyCopy (keyNew ("/", KEY_END), k, flags).

I like this much more.

stale[bot] commented 2 years ago

I mark this issue stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping the issue by writing a message here or create a new issue with the remainder of this issue. Thank you for your contributions :sparkling_heart:

kodebach commented 2 years ago

@markus2330 is a change like this still in discussion? For some versions automatic refactoring may be possible.

Also, with the new idea of a libelektra-operations library, the best variant for a minimal core API might be a basic:

// returns a new Key with the name set to "/"
Key * keyCreate (void)

// returns a new KeySet with an initial allocation size of `alloc`
KeySet * ksCreate (size_t alloc);

It would of course mean that creating Keys and KeySets with just libelektra-core would need more code and couldn't be done inline. But it means that every alternative implementation can easily match this API (no variadic args needed). Additionally they can provide another API on top that is more idiomatic to the language.

For C this means providing a version of keyNew (either one with variadic arguments or maybe we macros) in libelektra-operations. As a first step keyNew would just be moved from core to operations. But after that we may want to think about a new API again to avoid escaped names.

For Rust it would probably be a builder-style interface, or something using macros.

As a bonus this also makes bindings for libelektra-core easier. Many languages struggle with C variadic arguments, but can easily bind to the function suggested above. In fact, most of our current bindings try to ignore the variadic arguments and just pass a keyname and then call setters for value and metadata.

lawli3t commented 2 years ago

It would be quite interesting, especially because VAList is a bit iffy in Rust, although I got it to work now I still have some issues with it (such as reading memory from stack which I shouldn't be reading). The simplest way, which I think also would work would be just removing the variadic arguments and only having name as a required parameter, all other properties of the Key should be manipulated by the respective functions (keySetValue(), ...).

kodebach commented 2 years ago

Just passing the name (and separately the namespace) was my idea at first too, since every Key must have a name. However, this means one of three things needs to be the case:

We use the escaped name: Not ideal. Every time a key is created, we need to do unescaping, even if the unescaped name would be known at compile-time.
We use the unescaped as a list of parts: Also not ideal. First of all, in C we need to pass the number of parts separately or hope that there is a NULL at the end. With variadic arguments, the compiler can enforce the sentinel NULL value, but we wanted to avoid var-args. More importantly, this now means that we need to assemble the unescaped name from the parts. This involves multiple memcpys and either multiple mallocs or multiple iterations of the parts list.
We pass the unescaped name as as single string with embedded zero bytes (e.g. "foo\0bar"): The best of a bunch of bad options. We still need a separate argument for the size of the string, but at least it would now be a single malloc and memcpy to store the actual key name.

So I just concluded, put the absolute bare minimum into libelektra-core and do something specific for each language.

Note: All of this assumes, that we only store the unescaped name.

PS. In Rust you could still do something like below on top of the basic libelektra-core interface. This is essentially option 3, but much nicer, because Rust knows the length of a slice.

trait Key {
    fn new(ns: Namespace, name: &[u8]) -> Self;
    fn set_value(&mut self, value: &[u8]) -> Self;
    fn set_meta(&mut self, name: &[u8], value: &[u8]) -> Self;
}

Note: I haven't done that much in Rust, so the code might not work (error handling is definitely missing). But the syntax should be correct enough to get the idea across.

lawli3t commented 2 years ago

In this case the name is also &[u8] ? I always assumed the name to be a string. Or is this something you want to change as well?

Yes, in Rust I would go for something like that. I really dont like having to pass a length parameter, I assume this is due to the name param no longer being a string? Are there any upsides I am missing in changing name to be a binary value as well?

kodebach commented 2 years ago

In this case the name is also &[u8] ? I always assumed the name to be a string. Or is this something you want to change as well?

This is a consequence of me wanting to avoid the escaped name as much as possible (since unescaping can be expensive). Since the unescaped name contains \0 bytes &[u8] is probably more correct.

I assume this is due to the name param no longer being a string? Are there any upsides I am missing in changing name to be a binary value as well?

Exactly, with the escaped name we can just pass a zero-terminated char *. But for the unescaped name this doesn't work. We need to know the length and C this needs another parameter.

The advantage of using the unescaped name directly is that in the ideal case (which would be possible with the Rust API), you just need to copy the name into the struct with a single memcpy.

For most escaped names the unescaped name isn't too expensive either (*), but when you have something like user:/foo/bar/../../#100/../foo\/bar/#200 instead of just user:/foo\/bar/#__200, then it gets expensive.

(*) the name still needs to be iterated multiple times and is copied character by character instead of as a single memcpy.

markus2330 commented 2 years ago

@markus2330 is a change like this still in discussion? For some versions automatic refactoring may be possible. We pass the unescaped name as as single string with embedded zero bytes (e.g. "foo\0bar"): The best of a bunch of bad options.

:+1:

We still need a separate argument for the size of the string, but at least it would now be a single malloc and memcpy to store the actual key name.

Not if we terminate with two \0?

So I just concluded, put the absolute bare minimum into libelektra-core and do something specific for each language.

:+1:

kodebach commented 2 years ago

Not if we terminate with two \0?

Empty key name parts are allowed and appear as \0\0 in the key name. In general a sequence of N \0 bytes will result in N-1 empty parts.

To restate the definition: A key name is an arbitrary sequence of bytes that fulfils exactly three conditions:

The first byte is a valid namespace byte.
The sequence is either 1 byte long, or the second byte is a \0.
The last byte is a \0. (This is automatically added by the implementation, so shouldn't be passed in)

Note: AFAIK it is not currently implemented like this. I think there is still an issue (didn't find it), for the fact that namespace roots have 3 bytes right now (namespace, and two zero bytes).

That is why I proposed the Rust API new(ns: Namespace, name: &[u8]). This directly shows that we need a namespace and an arbitrary sequence of bytes. The \0 bytes from 2 and 3 above, are always there and will be added by the implementation of new, there is no reason why they should be passed in.

markus2330 commented 2 years ago

Sorry, yes, two \0 is not enough but it needs to be a special magic sequence not allowed otherwise, i.e. @elektra@endor similar. The same problem exists for keySetName. Let us discuss this today.

kodebach commented 2 years ago

Discussion result: The end marker is bad for developer experience (annoying to type), creates bloat in the binary (you need it in every keyNew call), makes the source code harder to read and adds an unnecessary restriction. We will instead use a separate size argument in the C API. Other languages should use a String or array-like type with a length field, if available.

github-actions[bot] commented 1 year ago

I mark this stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping by writing a message here or create a new issue with the remainder of this issue. Thank you for your contributions :sparkling_heart:

kodebach commented 1 year ago

See #4715 and #4805

github-actions[bot] commented 8 months ago

I mark this stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping by writing a message here or create a new issue with the remainder of this issue. Thank you for your contributions :sparkling_heart:

github-actions[bot] commented 8 months ago

I closed this now because it has been inactive for more than one year. If I closed it by mistake, please do not hesitate to reopen it or create a new issue with the remainder of this issue. Thank you for your contributions :sparkling_heart:

ElektraInitiative / libelektra

keyNew overhaul #3529