Closed kodebach closed 5 years ago
Thank you for creating this proposal!
Do you think ksExtract should be the function instead of ksDup? If it is only an idea for a new feature, we could also add it any later time (it would not be relevant for 1.0).
Btw. I wonder why you did not request for a KeySet * ksExtract (const KeySet * ks, const Key * parentKey, int (*toextract)(Key*, Key*));
. When someone passes keyIsBelowOrSame
you would get something like you requested above.
Do you think ksExtract should be the function instead of ksDup?
They could be merged, but then we would should optimise for the ksDup
case. I don't think that is a good idea. It just complicates the implementation and the API is clearer, if ksDup
exists on its own. I wouldn't expect that ksExtract
is what I need to duplicate a KeySet.
Btw. I wonder why you did not request for a
KeySet * ksExtract (const KeySet * ks, const Key * parentKey, int (*toextract)(Key*, Key*));
I assume the function would take the keys below parentKey
(including parentKey
) and then additionally filter them with toextract
(should use const Key *
btw). I don't see why that would have to be separate function. You could easily iterate over the extracted keys yourself and just skip any keys you don't care about.
It also prevents an optimisation that could be applied to ksCut
as well. (Note: I haven't tested this)
ksSearchInternal
returns the index of the given key, or if isn't present the index of the first present key that is bigger than the given key (according to keyset order). ksCut
already uses this to find the start of the cut.
The end of the cut, could be found via ksSearchInternal
as well. We just need to search for the right key. An example:
user/test/abcd
user/test/cutpoint
user/test/cutpoint/a
user/test/cutpoint1
user/test/xyz
⓪
for the NULL character U+0000):
user⓪test⓪abcd⓪
user⓪test⓪cutpoint⓪
user⓪test⓪cutpoint⓪a⓪
user⓪test⓪cutpoint1⓪
user⓪test⓪xyz⓪
ksCut
with the key user/test/cutpoint
.ksSearchInternal
to find the start of the cut. It searches the unescaped name user⓪test⓪cutpoint⓪
.user/test/cutpoint1
. Instead we can use ksSearchInternal
again.
user⓪test⓪cutpoint①⓪
(①
represents U+0001). This is the first possible unescaped key name that is not below user/test/cutpoint
.ksSearchInternal
will just return the next key that is present.wouldn't expect that ksExtract is what I need to duplicate a KeySet.
Exactly, and ksExtract is basically only ksDup
with filter.
Thank you for looking into the details. But I think ksExtract is only an new feature and does not really help for 1.0.
The optimization for ksCut is a very good idea, can you maybe split this up in a second issue? But for 1.0 we also only should do optimizations which are really important (have use cases that are too slow otherwise.)
Exactly, and ksExtract is basically only ksDup with filter.
Not sure, what I want say here... ksExtract
is technically only a limited ksDup
. Sometimes you need exactly that and without ksExtract
you get a lot of unnecessary overhead (either copying and deleting additional keys, or cutting and appending needlessly).
Replacing ksDup
with ksExtract
would be possible, like you said, but it would be a confusing naming scheme. If we only want a single function, we should change ksDup
to accept a parent key (if it is NULL
we use the old behaviour). With symbol versioning this shouldn't be hard to do.
But for 1.0 we also only should do optimizations which are really important (have use cases that are too slow otherwise.)
I'm not sure about "too slow otherwise", but ksCut
is definitely used a lot, so any improvement could make a big difference.
If we can efficiently extract/cut arrays, we actually do not need to store the length of arrays, do we? We could reconsider doc/decisions/array.md
Another idea: what about a ksLookup
option to find the element hierarchically after some key.
So something like ksLookup(ks, "user/test/abcd", KDB_O_HIERARCHY)
would return
user/test/xyz
(in your example above)
we actually do not need to store the length of arrays, do we?
Both the spec
plugin and the high-level API rely on the array
metakey. In general I also think that storing the size of an array separately is a good idea. It is basically the same reasoning that also explains, why most (if not all) modern programming languages store string sizes, instead of calculating them when needed like C.
What we should do instead, to make handling of arrays easier, is introduce things like kdb array add <arrayParentKey> <index>
and kdb array del <arrayParentKey> <index>
.
Another idea: what about a
ksLookup
option to find the element hierarchically after some key.So something like
ksLookup(ks, "user/test/abcd", KDB_O_HIERARCHY)
would returnuser/test/xyz
(in your example above)
Not sure what "hierarchically after" means. If you mean the next key after X (according to keyset order) that is not below X, then your example is wrong.
Nevertheless, I think ksLookup
is already more than complex enough and we should not add anything else. Adding new options to ksLookup
complicates the code. This introduces potential for errors and most likely also creates a performance overhead (not a big one, but ksLookup
is one of the most called functions in Elektra, so every little bit counts).
There are also further question with this approach: How does it combine with other option? What happens with cascading keys?
IMO we should stick with KISS and introduce a new function, if we need new functionality. Also this kind of lookup would probably be more useful, if it works with iterators/cursors/indices (whatever we want to call it). Then it could be used in an iteration to skip subtrees. I can't actually think of another use case right now, especially not one, where I just want a Key *
.
Both the spec plugin and the high-level API rely on the array metakey.
Good to know! This is completely undocumented in doc/METADATA.ini where array is still an unused and only proposed feature.
What we should do instead, to make handling of arrays easier, is introduce things like kdb array add
and kdb array del .
Yes, I agree. But this is not essential for 1.0.
Not sure what "hierarchically after" means.
The key after the cut point (as you described above).
most likely also creates a performance overhead
ksLookup already has a switch (or cascading if) for the options. So adding another option should not create a performance impact.
There are also further question with this approach: How does it combine with other option? What happens with cascading keys?
This we need to consider anyway.
IMO we should stick with KISS and introduce a new function
Have you a suggestion for a name?
And there is some overhead for adding functions to a library. (Did you maybe already measured it?)
it works with iterators/cursors/indices (whatever we want to call it). Then it could be used in an iteration to skip subtrees
With the internal cursor ksLookup
is the way to go as it moves the internal cursor.
The key after the cut point (as you described above).
Then your example should have been one of:
ksLookupByName (ks, "user/test/abcd", KDB_O_HIERARCHY); // -> user/test/cutpoint
ksLookupByName (ks, "user/test/cutpoint", KDB_O_HIERARCHY); // -> user/test/cutpoint1
ksLookupByName (ks, "user/test/cutpoint/a", KDB_O_HIERARCHY); // -> user/test/cutpoint1
ksLookupByName (ks, "user/test/cutpoint1", KDB_O_HIERARCHY); // -> user/test/xyz
So adding another option should not create a performance impact.
Like I said, it would be a tiny performance impact, but ksLookup
is called thousands of times in an application (about 20 000 times in a minimal LCDd
setup) so even that could matter.
And there is some overhead for adding functions to a library.
In terms of performance?
With the internal cursor
ksLookup
is the way to go as it moves the internal cursor.
Don't we want to get rid of the internal cursor? (https://github.com/ElektraInitiative/libelektra/issues/2991#issuecomment-533862034)
Have you a suggestion for a name?
Depends on what the function will actually do and whether is based on Key *
or cursor_t
.
In general, I think ksLookup
should be for looking up a key, i.e. I have a keyname (cascading or not) and want to find the actual Key *
inside a keyset for this keyname. In other words the input keyname of ksLookup
should always match its output keyname (apart from the namespace).
Another function should be created for different kinds of search operations. It might be useful not only to find the next key on the same level, but also the previous one or maybe the first key above (the immediate parent may not be in the keyset) or below.
In terms of performance?
Yes, also performance, as the symbols need to be looked up when loading the libs. KDE hat huge problems with this but they also have an order of magnitude more libs.
Don't we want to get rid of the internal cursor? (#2991 (comment))
Realistically, we will only be able to remove it from the public interface but not completely. Also the error messages need to be updated to contain the cause (@Piankero) because at the moment the cursor marks where the error happened.
In general, I think ksLookup should be for looking up a key
Yes, it would be a cleaner interface. But then we also should remove KDB_O_POP which is actually the only left-over. If we get rid of this, we could avoid the argument to ksLookup
at all.
And if we do this only in the public interface (and let the private interface more or less the same), we could even do it for 1.0.
Realistically, we will only be able to remove it from the public interface but not completely.
Why? Where do we actually need the internal cursor?
because at the moment the cursor marks where the error happened.
I don't think this is always the case. In certain situations, kdbGet
executes more plugins after an error and many plugins don't set the cursor to the key that caused the error in the first place.
Yes, it would be a cleaner interface. But then we also should remove KDB_O_POP which is actually the only left-over. If we get rid of this, we could avoid the argument to
ksLookup
at all.
KDB_O_CREATE
and KDB_O_DEL
might also be used in some places, though the should be easy to replace.
Why? Where do we actually need the internal cursor?
Maybe nowhere (except for the old system to pinpoint which key caused the error) but there are 271 occurrences of ksNext in the folder "src". And in these 271 places it is not enough to replace a function call but you actually need to rewrite loops. So there are also plenty ways of how bugs can be introduced.
I don't think this is always the case. In certain situations, kdbGet executes more plugins after an error and many plugins don't set the cursor to the key that caused the error in the first place.
Yes, exactly.
KDB_O_CREATE and KDB_O_DEL might also be used in some places, though the should be easy to replace.
In only about 10 places, so this should be manually doable in one day of work.
there are 271 occurrences of ksNext in the folder "src".
I never said it would be easy
I think all of this could be moved to 2.0.0. If there is something reasonable for 1.0.0, please tell me.
The snippet
or the alternative
are used quite frequently. We should add a counterpart to
ksCut
that does not modify the original KeySet. The ideal signature would be:The
const KeySet *
would be possible with an external iterator, but it might even be possible with the old system.Originally posted by @kodebach in https://github.com/ElektraInitiative/libelektra/issues/2979#issuecomment-533130499