comments on highlevel API and LCDproc work

ElektraInitiative / libelektra

Elektra serves as a universal and secure framework to access configuration settings in a global, hierarchical key database.

https://www.libelektra.org

BSD 3-Clause "New" or "Revised" License

208 stars 123 forks source link

comments on highlevel API and LCDproc work #2772

Closed haraldg closed 4 years ago

haraldg commented 5 years ago

Hi,

I just have looked at a few samples of the LCDproc code you wrote and in extension the highlevel API for the first time. And with "looked" I mean just looked: Hadn't time yet to setup a container where I can safely build elektra master and try a few things for real or check the benchmarks we defined at the start.

The first thing to notice is, that the code is very verbose. While tokens like ELEKTRA_TAG_LCDEXEC_ADDRESS are easy to understand on first glance, I find that they make the code harder to read if you have many lines of this.

Next you seem to love opaque data types. I'm aware of the benefits for keeping the ABI stable while things change under the surface. However the downside is, that this makes the code again harder to read in huge quantities and also bloats the resulting binaries a lot. (Each call to an external getter typically takes at least 3 instructions and invalidates the contents of all registers. From a green programming POV that's rather unfortunate.) The latter problem can be "fixed" by making the getter inlineable, but of course at the cost of all the benefits. I'd reconsider which data types are actually worth being opaque.

I see that you are providing your own versions of basic data types like long. What problem are you trying to solve with that? IMO it mostly makes the code more difficult to read.

In particular I dislike, that you also used elektra data types in LCDproc data structures not directly related to configuration.

That error handles are malloc()ed data structures is rather surprising. Especially because I remember we had a discussion about a year ago, where I advocated treating malloc() as fail safe and you disagreed. Now you are using malloc in a context where you clearly have to treat it as fail safe. There are only a few functions, that actually can cause errors, so this is a rather unimportant topic, but I find the choice rather unorthodox. Most people seem to try to do error handling on the stack.

Maybe this is already answered in some document, but I didn't find it quickly: When elektraGet() returns a string (or other pointer to a data structure), what's the scope where the object is valid? My guess is until elektraClose() is called, but it's not obvious. Also what's a good policy for keeping the Elektra handle around with regards to resource usage (and other considerations)?

I see that you sticked pretty close to the original structure to the code. I guess for some of your goals this makes sense, but OTOH I wonder if you can really show off elektra while sticking to the structure dictated by a foreign configuration API. Maybe you can share your thoughts on the topic. Also maybe it is interesting for the thesis to compare for one driver the difference between a straight translation and an idiomatic elektra implementation, if there is any.

Sorry if this sounds mostly negativ. I suppose it's natural, that, when looking at something very quickly, mostly the negative things stick out. I will try to soon find the time to explore your code in more detail.

HTH, Harald

markus2330 commented 5 years ago

The specValidation option is somewhat independent of the other two. defaultsHandling (DH) and specLocation (SL) always have to be coordinated to some degree.

One reason more to not export this interface to the user.

The last combination specLocation=embed and defaultsHandling=speconly doesn't have as much of a use.

Another reason.

I do not know why you are so reluctant in making this interface easier to use. If it is because you already implemented everything, then you could simply expose only one option (let us say specificationStrategy) and from this option you derive the 3 other options.

After adding the different modes and some more testing and experimenting this week, I am now firmly of the opinion that this is not a good idea. Of course the feedback would be nice,

Feedback is not nice but essential for the success of the LCDproc elektrification.

but I don't see a way that this PR will be merged in the near future (=this summer).

This is a completely different matter that is not controlled by you.

The closer I get to finishing my work, the more problems in other parts of Elektra find.

Good, please create issues.

For example today I found this problem.

Please report this issue (even if you cannot reproduce it yet). For LCDproc this issue is irrelevant as we decided that we will not care about validation. Every attempt to set a boolean to something else than "1" or "0" is not supported. I do not think that LCDproc people will have problems understanding "1" or "0".

At this point, the only way forward I see is that I create a version of LCDproc that is usable enough to do benchmarks for my thesis.

No, the only way forward is to reduce to a set that works. We already decided do exclude validation and as next step we could exclude mmap. I hope it is not necessary as we have someone actively working on mmap. Let us see in #2806.

After that I think we need to at least wait for the new plugin system (needed anyway for a proper server implementation without workarounds).

The changes in the plugin system are only related to the maximum number of plugins. It will not magically fix bugs.

kodebach commented 5 years ago

One reason more to not export this interface to the user.

User as in user of the code-generator or user as in user of the application (e.g. LCDproc user)?

It will only be exported to the former not the latter. And not exporting it to the former means limiting the use cases of the code-generator needlessly.

then you could simply expose only one option (let us say specificationStrategy) and from this option you derive the 3 other options.

That is exactly what I don't want to do, because this is not very transparent. It is also not future proof. If we use the specification for new things, do we add more strategies? Do all the old strategies have the same behaviour or not? With separate options we don't have to think about any of this. It is very clear what the 3 options do. The developer has to decide what is best for their specific use case.

Good, please create issues.

I am sorry but that is not good. It is catastrophic. I created enough issues already to know, they won't be fixed any time soon, unless I do it myself or somebody is actively working on the broken part already.

Every attempt to set a boolean to something else than "1" or "0" is not supported.

That's another problem, because "not supported" actually means "you must not do it, even though it works" in this case. I can't say right now how or when exactly, but there are definitely cases where a kdb set -N user ... succeeds, even if the type plugin should fail.

I do not think that LCDproc people will have problems understanding "1" or "0".

No, but they certainly won't like that the version using an actual configuration framework can't do what the old minimalist handwritten parser could do. They also won't like that all numbers (even those that make no sense in decimal like USB Vendor IDs) have to be in decimal, because hexnumber is disabled since it would be one plugin too many.

Feedback is not nice but essential for the success of the LCDproc elektrification.

But the feedback doesn't help for the problems inside Elektra itself.

No, the only way forward is to reduce to a set that works. We already decided do exclude validation

It was decided that validation is not required by LCDproc people.

But removing validation and conversions is not a good idea. I know @haraldg said he doesn't care about that. But (assuming Elektra's validation actually works) having a spec actually greatly simplifies the code reading the configuration. For example by limiting the range of an integer in the spec you can eliminate the ifs checking the value after reading it.

If we give up on validation through Elektra, I have to put all the code back in.

Apart from that my main point was: Creating a PR in the LCDproc repo and using a mergable/merged version in my thesis, means involving even more people into this discussion. And that means even more delay until I have a version with which I can do benchmarks.

markus2330 commented 5 years ago

It will only be exported to the former not the latter. And not exporting it to the former means limiting the use cases of the code-generator needlessly.

It is definitely not needlessly: we found out that many (most?) combinations do not make sense. It is very much needed to restrict such configurations.

That is exactly what I don't want to do, because this is not very transparent.

It is very transparent if you document the strategies. And this will be needed anyway.

It is also not future proof. If we use the specification for new things, do we add more strategies?

It is unlikely that anyone will want a further strategy in the near future. Harald basically already wants all of them that make sense. It is much more likely, that some of the strategies will rarely needed and bring maintenance headaches.

Do all the old strategies have the same behaviour or not? With separate options we don't have to think about any of this. It is very clear what the 3 options do. The developer has to decide what is best for their specific use case.

So you suggest: let us push the responsibility to someone else?

I am completely happy with two strategies: the one you implemented up to now and another one to make Harald happy (minimize binary size).

I am not happy with the thinking "ohh, we are so flexible, so it must be the users fault when something is messed up".

I am sorry but that is not good. It is catastrophic. I created enough issues already to know, they won't be fixed any time soon, unless I do it myself or somebody is actively working on the broken part already.

Of course it is good that you find it. It will be fixed by the person who implements validation (nobody is working on this right now). Validation is a no-goal now. You can be assured it is a long term goal for me.

That's another problem, because "not supported" actually means "you must not do it, even though it works" in this case.

Of course, that is the status-quo of nearly all FLOSS apps out there. If you mess up the config it might not work, might crash, might format the hard-drive. Elektra is there to change that but one step at a time. We at least guarantee that we do not crash or format the hard-drive. We could give a better error message thus please report the bug.

No, but they certainly won't like that the version using an actual configuration framework can't do what the old minimalist handwritten parser could do. They also won't like that all numbers (even those that make no sense in decimal like USB Vendor IDs) have to be in decimal, because hexnumber is disabled since it would be one plugin too many.

This is your opinion, which I value very much. And you do your best that everything works very good in your opinion. But we are also interested in the LCDproc people's opinions. At the moment we did not ask them what they actually need, which is not a good situation.

But the feedback doesn't help for the problems inside Elektra itself.

Of course it helps: we'll see to it that Elektra evolves. But without feedback we might run into wrong directions. Especially the validation and transformation problems are very domain-specific.

But (assuming Elektra's validation actually works) having a spec actually greatly simplifies the code reading the configuration. For example by limiting the range of an integer in the spec you can eliminate the ifs checking the value after reading it.

Exactly!

If we give up on validation through Elektra, I have to put all the code back in.

No, we simply assume everything is validated for now so that you can finish your thesis. In later work we will of course fix the problems.

Apart from that my main point was: Creating a PR in the LCDproc repo and using a mergable/merged version in my thesis, means involving even more people into this discussion.

Ok, this is what it is all about: Do not worry about that. Your grade on your thesis will definitely not depend on whether the LCDproc people like it or not. I like your work except when I say I don't.

And that means even more delay until I have a version with which I can do benchmarks.

For your thesis you can pick any version you like. It will not be the latest version, this never works out so do not try to do that. Some automation for benchmarking might be nevertheless useful, sometimes it is necessary to repeat everything, even without changes in the code.

kodebach commented 5 years ago

we found out that many (most?) combinations do not make sense

1 out of 4 is neither many nor most.

It is very much needed to restrict such configurations.

Why? What is the benefit of disallowing certain configurations? If somebody has a use-case we didn't think of why prohibit it?

It is much more likely, that some of the strategies will rarely needed and bring maintenance headaches.

Exactly why strategies are a bad idea. Separate options are much easier to maintain, since each of them affects much less of the code. That is the basic principle of "separation of concerns".

and another one to make Harald happy

So if X comes along and wants a performance optimized version we add another strategy? And if Y wants some other version we add that as well?

ohh, we are so flexible, so it must be the users fault when something is messed up

Of course not. But your version is basically "if you do want we say it works; if that doesn't fit your use case that's not our fault". Also I can't see how anything could be messed up here at all. There are different prerequisites for the different combinations to work, but that is still the case if everything is controlled by one option and there are less possibilities. We already said that there is no magic solution that just works.

markus2330 commented 5 years ago

Why? What is the benefit of disallowing certain configurations? If somebody has a use-case we didn't think of why prohibit it?

https://cseweb.ucsd.edu/~tixu/papers/fse15.pdf

Exactly why strategies are a bad idea. Separate options are much easier to maintain, since each of them affects much less of the code. That is the basic principle of "separation of concerns".

https://en.wikipedia.org/wiki/Separation_of_concerns

Hint: the term is about code and not (configuration) data.

So if X comes along and wants a performance optimized version we add another strategy? And if Y wants some other version we add that as well?

This is how software development works. You add support for another use case you originally did not think of if somebody needs it. Then it makes sense to make it configurable so that for others their old setup still works (so in this case: adding another strategy). As you know software development is not: "let us add random stuff in the case someone might need it". Unfortunately, at the moment it is not widely known that the same is also true for configuration.

kodebach commented 5 years ago

https://cseweb.ucsd.edu/~tixu/papers/fse15.pdf

The paper talks about applications with hundreds of options. We are talking about the different between having one parameter with 2-3 values and two parameters with 2 values each.

I don't want to go into further discussion (IMO the paper supports my version at least as well as yours), so what strategies do you propose?

I think we at least need

Spec external, defaults embedded: allows starting without mounting
Spec external, defaults not embedded: minimizes binary size

Possibly also:

Spec embedded, defaults embedded: allows starting without mounting, everything is contained in one file -> version conflicts are harder

And at that point it doesn't really matter that there is a fourth setting, if it makes the documentation easier to understand. In the end the most important thing is that users of the code-generator understand the options.

Maybe this is just me, but if each option just controls a single thing (one for spec, one for defaults) that is easier than a single option controlling both.

the term is about code and not (configuration) data.

Yes and how the code-generator works is about code as well. It has nothing do with configuration or data.

This is how software development works. [...]

It is about the balance between fulfilling exactly the use cases you have right now and making a piece of software that is generally useful. Elektra wants to be useful in general not for one specific case (e.g. embedded system, or server systems). Therefore we should have some flexibility from the start. More importantly software should always be designed in a future proof way. Too many limitations and future changes may become very difficult (e.g. Elektra's limit on the number of plugins is, so deeply ingrained in Elektra that is hard to change now).

markus2330 commented 5 years ago

The paper talks about applications with hundreds of options.

In total Elektra has hundreds of options.

so what strategies do you propose?

I am okay with all strategies:

that we have a good use case for
are tested
are documented

In the end the most important thing is that users of the code-generator understand the options.

This is is not the only point, see above. E.g. see spec, list or csvstorage. They have only understandable options but still they are hardly usable as it is so easy to get these options wrong because most options are usually not needed and depend on each other in surprising ways.

Elektra's limit on the number of plugins is, so deeply ingrained in Elektra that is hard to change now

Yes, it was wrong to limit the number of plugins.

haraldg commented 5 years ago

Yes, enabling gopts in cmake and recompiling libelektra fixed the segfault. Thanks for your help. Wouldn't have figured that out on my own easily.

But removing validation and conversions is not a good idea. I know @haraldg said he doesn't care about that. But (assuming Elektra's validation actually works) having a spec actually greatly simplifies the code reading the configuration. For example by limiting the range of an integer in the spec you can eliminate the ifs checking the value after reading it.

Yes, the number of lines removed from LCDproc was one of the benchmarks we decided upon. However currently LCDproc is very inconsistent with how it checks or fails when there is a configuration error. We don't put much thought into it, ergo it is not important.

If we give up on validation through Elektra, I have to put all the code back in.

No, please don't. As Markus said: Just assume libelektra will be fixed in time. I can't merge anything before it does anyway.

Apart from that my main point was: Creating a PR in the LCDproc repo and using a mergable/merged version in my thesis, means involving even more people into this discussion.

I doubt people will be adding much to the discussion unless they have a really strong opinion on something. OTOH I believe it would help a lot for the eventual acceptance of elektra, if people feel involved a bit. And getting feedback of the kind of "this is never going to work for my use case" is something we really want to get early, if it exists.

However if you feel the current state of things is too embarrassing to show to the public, then this is your call to make. I respect that. But I still hope that the issues with number of plugins etc mostly affect LCDd and we can show at least some reasonably working client around soon.

It need not be a full PR and it even need not be in the LCDproc repo (though why not?). But having some branch of libelektra and some branch of your LCDproc fork, that are confirmed working together and won't be changed during your ongoing development work, would help interested parties a lot.

And that means even more delay until I have a version with which I can do benchmarks.

Out of curiosity: What benchmarks do you intend to do?

About this whole "which way to present code generator options to users" discussion: I think this is a very minor UI concern, that really isn't important. What ever you decide there will be much less anoying for the casual developer, then other things like Elektra inventing its own data types for example.

Personally I'd treat this like gcc treats optimization options: All the world knows that they can chose beween Og, Os, O2 and O3, but all the optimizations are still documented individually and can be selected on the command line if you really want to. This seems a good compromise between usability, precise specification of behaviour and interface stability - nobody expects all the optimization features to stay stable between gcc releases, but of course all the world relies on the meaning of Os and so on staying the same.

So maybe there really should be all the technical options, that @kodebach wants, but also a stategy option like @markus2330 believes is more usable and would actually be the stable interface.

markus2330 commented 5 years ago

Yes, enabling gopts in cmake and recompiling libelektra fixed the segfault. Thanks for your help. Wouldn't have figured that out on my own easily.

Good that this is fixed. Did you somehow compiled in an unclean state or is this something we need to fix? In any case, it should segfault but report that some library is not missing, shouldn't it?

I doubt people will be adding much to the discussion unless they have a really strong opinion on something.

Exactly. And config formats often is something, people have very strong opinions about.

OTOH I believe it would help a lot for the eventual acceptance of elektra, if people feel involved a bit.

I think so, too. This was also very clear from the previous email about Elektra to the mailing list.

About this whole "which way to present code generator options to users" discussion: I think this is a very minor UI concern, that really isn't important.

Of course every individual unnecessary option is not important but once you have hundreds it becomes a big problem.

What ever you decide there will be much less anoying for the casual developer, then other things like Elektra inventing its own data types for example.

From type theory point of view Elektra does not invent any new types. (In C struct makes new types but typedef is only an alias to existing types). Practically, typedefs can be annoying. If you want, we can rename all types in LCDproc to the C99 types. (Maybe in a later stage.) As on C99 platforms we hopefully always pick the C99 types (if not it would be a bug).

This seems a good compromise between usability

If you need the low-level options to play around with optimizations we can of course offer them. But a big warning: as I understand them they do not only tune performance but also sacrifice correctness or introduce different behavior, something most gcc optimization options don't do.

haraldg commented 5 years ago

markus2330 writes:

Good that this is fixed. Did you somehow compiled in an unclean state or is this something we need to fix?

I compiled elektra from a fresh git clone a few hours before @kodebach wrote his instructions. So instead of copying his command lines, I just did a plane cmake ...; make; make install sequence.

It seems the gopts isn't built by default yet. Whether this is intentional or an error is up to you.

Practically, typedefs can be annoying.

They make the code harder to read for casual users of elektra.

If you want, we can rename all types in LCDproc to the C99 types.

Yes, I'd prefer that non-standard typedefs leak into LCDproc code as little as possible.

In this particular case the ship probably has already sailed, but I believe using non-standard types in the Elektra API is a poor choice too: It just confuses people.

If you need the low-level options to play around with optimizations we can of course offer them. But a big warning: as I understand them they do not only tune performance but also sacrifice correctness or introduce different behavior, something most gcc optimization options don't do.

I certainly don't need them, but I might find them nice to play around with. However my main point was: The points raised by you and @kodebach both have their merits, so it might make sense to follow both: You get your stable interface and @kodebach gets his options with clearly defined behavior.

Yes, that code generator options will result in different behaviour (at least on error paths) is clear. gcc options are just an analogon, not completely equivalent.

markus2330 commented 5 years ago

It seems the gopts isn't built by default yet. Whether this is intentional or an error is up to you.

It will be included by default once it is not experimental anymore.

They make the code harder to read for casual users of elektra.

Yes, I agree. Is it okay for you to have something like int32_t x = elektraGetLong(....

using non-standard types in the Elektra API is a poor choice too: It just confuses people.

In the low-level API we do not have this problem, as the API user needs to do the conversion.

In the internal-notification API we actually have both: C types and an API for our typedefs.

If we change to C99 types, we should do this consistently in both internal-notification and highlevel. Would be quite an effort but it could be done without breaking compatibility (as typedefs are different names for the same and as such they must be compatible).

Another question is if the high-level API (and the internal-notification) is C-only or also for other languages. Iirc @kodebach also had in mind that the high-level API should be easy to write bindings for.

haraldg commented 5 years ago

Is it okay for you to have something like int32_t x = elektraGetLong(....

Sure. As long as it is clear on first glance what is going on, it's fine.

kodebach commented 5 years ago

Is it okay for you to have something like int32_t x = elektraGetLong(....

Sure. As long as it is clear on first glance what is going on, it's fine.

Switching from the kdb_*_t types to C99 types should just be a series of regex replaces.

If we change to C99 types, we should do this consistently in both internal-notification and highlevel. [...]

Under C99+ the change should be quite simple, but at least parts of Elektra work without full C99 as well. Thats why kdbtypes.h is so complicated in the first place.

If we require C99+ for all of Elektra, we should also switch to using C99 types IMO.

Another question is if the high-level API (and the internal-notification) is C-only or also for other languages. Iirc @kodebach also had in mind that the high-level API should be easy to write bindings for.

Bindings don't really play a role here. Other languages have different types anyways. If anything bindings would be a point in favor of using C99 types, because those might be automatically mapped to standard language types.

markus2330 commented 5 years ago

Thats why kdbtypes.h is so complicated in the first place.

Yes.

@kodebach Did you try if it actually works? (With a compiler that has no understanding of C99 like VisualC++?)

If it does not work, we should definitely get rid of all the complications there and simply use (only) C99 types.

Under C99+ the change should be quite simple, but at least parts of Elektra work without full C99 as well.

Yes, compiling Elektra needs C99 for a long time but compiling Software using Elektra (at the moment) might still work with pre-C99 (if all the #ifs actually work for your system).

From communication point of view it we definitely should start saying "long" is a signed 32 bit integer and stop pointing to the CORBA standard (it only scares people).

At least internally, "src/include/kdbtypes.h" makes sense anyway (for checking the availability and to define the printf specifier). But I see that applications do not want to use this file.

Btw. We already have an open issue about unifying the type system: #1092

markus2330 commented 4 years ago

Thank you for the discussion, please open new issues for further discussions.