ComPWA / tensorwaves

Python fitter package for multiple computational back-ends
https://tensorwaves.rtfd.io
Apache License 2.0
10 stars 4 forks source link

Make it easier to inspect source-code of a BackendFunction #323

Closed redeboer closed 2 years ago

redeboer commented 2 years ago

The lambdified function that is contained in LambdifiedFunction is 'hidden' by a dunder:

https://github.com/ComPWA/tensorwaves/blob/942f4b5b04f7e2c1d58b0ef09ba152e136e4db48/src/tensorwaves/model.py#L278

This makes it almost impossible to get the lambdified function or inspect the source code of that lambdified function. Note that simply making this attribute public does not entirely solve the problem: the lambdified function can be constructed by optimized_lambdified, meaning that it is still not possible to inspect the lambdified function even if you have access to this attribute.

See also #322

jonas-eschle commented 2 years ago

I would suggest to make this (and the others) simply normal private attributes. Name mangling is usually just needed if there is a potential for naming conflicts, which I think is not given here?

Or, out of curiosity, why was this even been put to leading dunder?

redeboer commented 2 years ago

Thanks for your remarks!

I would suggest to make this (and the others) simply normal private attributes. Name mangling is usually just needed if there is a potential for naming conflicts, which I think is not given here?

For sure, one thing that can easily be done is to use a single underscore instead of a dunder, also in other classes. That at least makes it easier to play around with these 'hidden' attributes.

Or, out of curiosity, why was this even been put to leading dunder?

The dunder krept in earlier on while we (@spflueger) were still getting familiar with Python (expertsystem initially) and still had this private/public idea of C++ in mind. We preferred dunder when attributes are supposed to be implementation only ('don't touch this') and single underscore if we only want to hide an attribute from the API, though it's hard to make a clear distinction.

I also vaguely remember that Pyright (which we started to use to detect dead code) reports unused classes/functions/variables differently depending on whether they are dunder or single underscore. For instance, dunder instances were reported if they weren't used in the same file/module, because it's (almost) impossible to call them elsewhere in the framework.


All in all, this issue probably concerns two distinct things:

  1. How about getting rid of all the leading dunders?
  2. Can a callable returned by optimized_lambdify be stored in such a way that the lambdified functions inside it can also be inspected?
spflueger commented 2 years ago

I would ask the other way around: what's wrong with using a dunder by default? I think its quite similar to c++ private and protected. There you have the same ideal, to use private unless you really need protected or public. So my logic would be the same here, unless you need access to the variable make it a dunder. So far I don't have any argument why using dunders by default is a bad thing.

jonas-eschle commented 2 years ago

I would ask the other way around: what's wrong with using a dunder by default? I think its quite similar to c++ private and protected.

Besides the Python convention, from a software design perspective: why private and not protected? If I would like to inherit from the class, should I not be able to access exactly these attributes? Like should not every LambdifiedFunction(in the OOP "is-a" sense) have this lambdified function accessible?

AFAIU, a single leading underscore is considered "private/protected", while a dunder is "just" to avoid collisions. So if you wanna do some real heavy magic or if the attribute is likely to collide with another attribute, that's a good use-case for dunder.

There you have the same ideal, to use private unless you really need protected or public.

Python goes here more the path of "we're all adults here". There is no need to be over-protective with things, but rather about "communicating an intention". If anyone touches a private (_) attribute, the convention is clear: that is not guaranteed to be stable and may break anytime. We all know that. Making something __ just makes sure that the attribute is not accessed by accident. Because all attributes can be accessed anyways. So using a dunder here conveis I think the wrong message:

From another developers perspective, I would find it rather confusing that an "obvious" attribute such as the lambdified function is "so protected". Is there a lot of magic going on that will break everything if touched? Same for the parameters: this is just the usual private attribute, saying: "please use the public accessor", so why is that mangled?

I would say: single underscore means private, implementation detail. May changes. Don't rely on it. Dunder means, don't touch because you cannot understand this attribute, like really (or it is likely to collide). Looking at codebases, this seems also like it's handled in general. But I agree that it leaves a bit of unease sometimes as to the missing strictness of what is what.

We had in the Python for HEP group at some point a long discussion about how to convey the message to a user that he is allowed to overwrite an "internal" method, which somewhat also came from this missing distinction of private/protected. The solution was that a) either state that an "unstable" method (leading underscore) is actually stable or use (a "yet" quite unknown "convention") _method_(...).

spflueger commented 2 years ago

Well I know that in all "non adult" programming languages, protected should only be used if absolutely needed, or if the designer knows what he is doing. The problem being that a protected variable is accessible by the whole inheritance tree, which is a nightmare for making sure some variable only changes through some specific way. So it just leads to more bad code. https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rh-protected

I know python is never truly safe and this mechanism was created to avoid name collisions, but I still would try to make it as hard as possible for a developer to screw something up. Basically this tells us that the current design might have a flaw and it should be reconsidered.

Now regarding this specific issue, I don't really understand what the use case of inspecting the lambdified function is. Looks more like for debugging purposes. @redeboer can you elaborate why this is needed, or what you ultimately want to do?

redeboer commented 2 years ago

Just some thoughts while reading through: so far, inheritance in TensorWaves was only used for defining (abstract) interfaces. This is done on purpose: no code sharing through inheritance, even if that seems convenient. In that sense, the dunder was never to avoid name conflicts (there are no shared attributes), but to hide implementation, discourage further inheritance, and signal that the derived classes only fulfil the contract that their interface base class prescribes.

Now regarding this specific issue, I don't really understand what the use case of inspecting the lambdified function is. Looks more like for debugging purposes. @redeboer can you elaborate why this is needed, or what you ultimately want to do?

Yes, you are correct that it is mostly for debugging. I just wanted to call inspect.getsource on the lambdified function inside. You then face two problems: (1) that the __lambdified_model is hard to access (leading to the discussion above ;)) and (2) that the __lambdified_model can be a function that calls (inaccessible) inline functions. But indeed, the general issue here is that it would be nice to have some mechanism that allows you to inspect the source code of the lambdified function (whether it's one function, or a function that calls other lambdified functions).

jonas-eschle commented 2 years ago

Sure, absolutely agree! And while I like the Python philosophy, it requires a lot of awareness and assumes that everyone understands the "rules", not sure that this requirement is always satisfied in scientific code... :P

...as hard as possible for a developer to screw something up.

I think this is where the Python philosophy (and the "adult" thing) differs from others: there is no need to make it hard. It is already clear that underscore should not be used: the difference between a "do not enter" sign and a locked door.

I agree, debugging is a prominent use-case (I also had this with the model, I remember, and accessing/hacking around internals was difficult). To add, it's also very useful for testing: tests can be even more specific and test the internals properly.

spflueger commented 2 years ago

But if one is interested in the lambdified expression it can be generated by model.lambdify(backend=backend) right?

redeboer commented 2 years ago

That's right, but in that case, there is the second problem: it returns a callable with inline functions. May be fixed through #322.