ksExtract - Githubissues

kodebach commented 5 years ago

The snippet

KeySet * ks1 = ksDup(ks);
KeySet * interestingPart = ksCut (ks1, key);
ksDel (ks1);
// .... use interestingPart ....

or the alternative

KeySet * interestingPart = ksCut (ks, key);
// .... use interestingPart ....
ksAppend (ks, interestingPart);

are used quite frequently. We should add a counterpart to ksCut that does not modify the original KeySet. The ideal signature would be:

// ksCopy would go better with ksCut, but it is already in use
KeySet  * ksExtract (const KeySet * ks, const Key * parentKey);

The const KeySet * would be possible with an external iterator, but it might even be possible with the old system.

Originally posted by @kodebach in https://github.com/ElektraInitiative/libelektra/issues/2979#issuecomment-533130499

markus2330 commented 5 years ago

Thank you for creating this proposal!

Do you think ksExtract should be the function instead of ksDup? If it is only an idea for a new feature, we could also add it any later time (it would not be relevant for 1.0).

Btw. I wonder why you did not request for a KeySet * ksExtract (const KeySet * ks, const Key * parentKey, int (*toextract)(Key*, Key*));. When someone passes keyIsBelowOrSame you would get something like you requested above.

kodebach commented 5 years ago

Do you think ksExtract should be the function instead of ksDup?

They could be merged, but then we would should optimise for the ksDup case. I don't think that is a good idea. It just complicates the implementation and the API is clearer, if ksDup exists on its own. I wouldn't expect that ksExtract is what I need to duplicate a KeySet.

Btw. I wonder why you did not request for a KeySet * ksExtract (const KeySet * ks, const Key * parentKey, int (*toextract)(Key*, Key*));

I assume the function would take the keys below parentKey (including parentKey) and then additionally filter them with toextract (should use const Key * btw). I don't see why that would have to be separate function. You could easily iterate over the extracted keys yourself and just skip any keys you don't care about.

It also prevents an optimisation that could be applied to ksCut as well. (Note: I haven't tested this)

ksSearchInternal returns the index of the given key, or if isn't present the index of the first present key that is bigger than the given key (according to keyset order). ksCut already uses this to find the start of the cut.

The end of the cut, could be found via ksSearchInternal as well. We just need to search for the right key. An example:

Assume a keyset with the following keys:

user/test/abcd
user/test/cutpoint
user/test/cutpoint/a
user/test/cutpoint1
user/test/xyz

These keys have the unescaped names (I use ⓪ for the NULL character U+0000):

user⓪test⓪abcd⓪
user⓪test⓪cutpoint⓪
user⓪test⓪cutpoint⓪a⓪
user⓪test⓪cutpoint1⓪
user⓪test⓪xyz⓪

We called ksCut with the key user/test/cutpoint.
It calls ksSearchInternal to find the start of the cut. It searches the unescaped name user⓪test⓪cutpoint⓪.
The current iteration now starts a loop and checks every following key, until it finds user/test/cutpoint1. Instead we can use ksSearchInternal again.
- This time we search for the unescaped name user⓪test⓪cutpoint①⓪ (① represents U+0001). This is the first possible unescaped key name that is not below user/test/cutpoint.
- While this key will almost never be present in the keyset, that doesn't matter ksSearchInternal will just return the next key that is present.
- That means the returned key will always be the first key after our cut.

markus2330 commented 5 years ago

wouldn't expect that ksExtract is what I need to duplicate a KeySet.

Exactly, and ksExtract is basically only ksDup with filter.

Thank you for looking into the details. But I think ksExtract is only an new feature and does not really help for 1.0.

The optimization for ksCut is a very good idea, can you maybe split this up in a second issue? But for 1.0 we also only should do optimizations which are really important (have use cases that are too slow otherwise.)

kodebach commented 5 years ago

Exactly, and ksExtract is basically only ksDup with filter.

Not sure, what I want say here... ksExtract is technically only a limited ksDup. Sometimes you need exactly that and without ksExtract you get a lot of unnecessary overhead (either copying and deleting additional keys, or cutting and appending needlessly).

Replacing ksDup with ksExtract would be possible, like you said, but it would be a confusing naming scheme. If we only want a single function, we should change ksDup to accept a parent key (if it is NULL we use the old behaviour). With symbol versioning this shouldn't be hard to do.

But for 1.0 we also only should do optimizations which are really important (have use cases that are too slow otherwise.)

I'm not sure about "too slow otherwise", but ksCut is definitely used a lot, so any improvement could make a big difference.

markus2330 commented 5 years ago

If we can efficiently extract/cut arrays, we actually do not need to store the length of arrays, do we? We could reconsider doc/decisions/array.md

markus2330 commented 5 years ago

Another idea: what about a ksLookup option to find the element hierarchically after some key.

So something like ksLookup(ks, "user/test/abcd", KDB_O_HIERARCHY) would return user/test/xyz (in your example above)

kodebach commented 5 years ago

we actually do not need to store the length of arrays, do we?

Both the spec plugin and the high-level API rely on the array metakey. In general I also think that storing the size of an array separately is a good idea. It is basically the same reasoning that also explains, why most (if not all) modern programming languages store string sizes, instead of calculating them when needed like C.

What we should do instead, to make handling of arrays easier, is introduce things like kdb array add <arrayParentKey> <index> and kdb array del <arrayParentKey> <index>.

Another idea: what about a ksLookup option to find the element hierarchically after some key.

So something like ksLookup(ks, "user/test/abcd", KDB_O_HIERARCHY) would return user/test/xyz (in your example above)

Not sure what "hierarchically after" means. If you mean the next key after X (according to keyset order) that is not below X, then your example is wrong.

Nevertheless, I think ksLookup is already more than complex enough and we should not add anything else. Adding new options to ksLookup complicates the code. This introduces potential for errors and most likely also creates a performance overhead (not a big one, but ksLookup is one of the most called functions in Elektra, so every little bit counts).

There are also further question with this approach: How does it combine with other option? What happens with cascading keys?

IMO we should stick with KISS and introduce a new function, if we need new functionality. Also this kind of lookup would probably be more useful, if it works with iterators/cursors/indices (whatever we want to call it). Then it could be used in an iteration to skip subtrees. I can't actually think of another use case right now, especially not one, where I just want a Key *.

markus2330 commented 5 years ago

Both the spec plugin and the high-level API rely on the array metakey.

Good to know! This is completely undocumented in doc/METADATA.ini where array is still an unused and only proposed feature.

What we should do instead, to make handling of arrays easier, is introduce things like kdb array add and kdb array del .

Yes, I agree. But this is not essential for 1.0.

Not sure what "hierarchically after" means.

The key after the cut point (as you described above).

most likely also creates a performance overhead

ksLookup already has a switch (or cascading if) for the options. So adding another option should not create a performance impact.

There are also further question with this approach: How does it combine with other option? What happens with cascading keys?

This we need to consider anyway.

IMO we should stick with KISS and introduce a new function

Have you a suggestion for a name?

And there is some overhead for adding functions to a library. (Did you maybe already measured it?)

it works with iterators/cursors/indices (whatever we want to call it). Then it could be used in an iteration to skip subtrees

With the internal cursor ksLookup is the way to go as it moves the internal cursor.

kodebach commented 5 years ago

The key after the cut point (as you described above).

Then your example should have been one of:

ksLookupByName (ks, "user/test/abcd", KDB_O_HIERARCHY); // -> user/test/cutpoint
ksLookupByName (ks, "user/test/cutpoint", KDB_O_HIERARCHY); // -> user/test/cutpoint1
ksLookupByName (ks, "user/test/cutpoint/a", KDB_O_HIERARCHY); // -> user/test/cutpoint1
ksLookupByName (ks, "user/test/cutpoint1", KDB_O_HIERARCHY); // -> user/test/xyz

So adding another option should not create a performance impact.

Like I said, it would be a tiny performance impact, but ksLookup is called thousands of times in an application (about 20 000 times in a minimal LCDd setup) so even that could matter.

And there is some overhead for adding functions to a library.

In terms of performance?

With the internal cursor ksLookup is the way to go as it moves the internal cursor.

Don't we want to get rid of the internal cursor? (https://github.com/ElektraInitiative/libelektra/issues/2991#issuecomment-533862034)

Have you a suggestion for a name?

Depends on what the function will actually do and whether is based on Key * or cursor_t.

In general, I think ksLookup should be for looking up a key, i.e. I have a keyname (cascading or not) and want to find the actual Key * inside a keyset for this keyname. In other words the input keyname of ksLookup should always match its output keyname (apart from the namespace).

Another function should be created for different kinds of search operations. It might be useful not only to find the next key on the same level, but also the previous one or maybe the first key above (the immediate parent may not be in the keyset) or below.

markus2330 commented 5 years ago

In terms of performance?

Yes, also performance, as the symbols need to be looked up when loading the libs. KDE hat huge problems with this but they also have an order of magnitude more libs.

Don't we want to get rid of the internal cursor? (#2991 (comment))

Realistically, we will only be able to remove it from the public interface but not completely. Also the error messages need to be updated to contain the cause (@Piankero) because at the moment the cursor marks where the error happened.

In general, I think ksLookup should be for looking up a key

Yes, it would be a cleaner interface. But then we also should remove KDB_O_POP which is actually the only left-over. If we get rid of this, we could avoid the argument to ksLookup at all.

And if we do this only in the public interface (and let the private interface more or less the same), we could even do it for 1.0.

kodebach commented 5 years ago

Realistically, we will only be able to remove it from the public interface but not completely.

Why? Where do we actually need the internal cursor?

because at the moment the cursor marks where the error happened.

I don't think this is always the case. In certain situations, kdbGet executes more plugins after an error and many plugins don't set the cursor to the key that caused the error in the first place.

Yes, it would be a cleaner interface. But then we also should remove KDB_O_POP which is actually the only left-over. If we get rid of this, we could avoid the argument to ksLookup at all.

KDB_O_CREATE and KDB_O_DEL might also be used in some places, though the should be easy to replace.

markus2330 commented 5 years ago

Why? Where do we actually need the internal cursor?

Maybe nowhere (except for the old system to pinpoint which key caused the error) but there are 271 occurrences of ksNext in the folder "src". And in these 271 places it is not enough to replace a function call but you actually need to rewrite loops. So there are also plenty ways of how bugs can be introduced.

I don't think this is always the case. In certain situations, kdbGet executes more plugins after an error and many plugins don't set the cursor to the key that caused the error in the first place.

Yes, exactly.

KDB_O_CREATE and KDB_O_DEL might also be used in some places, though the should be easy to replace.

In only about 10 places, so this should be manually doable in one day of work.

kodebach commented 5 years ago

there are 271 occurrences of ksNext in the folder "src".

I never said it would be easy

markus2330 commented 5 years ago

I think all of this could be moved to 2.0.0. If there is something reasonable for 1.0.0, please tell me.

ElektraInitiative / libelektra

ksExtract #2992