bastikr / boolean.py

Implements boolean algebra in one module.
BSD 2-Clause "Simplified" License
78 stars 34 forks source link

#33 Do not use __new__ anymore #41

Closed pombredanne closed 8 years ago

pombredanne commented 8 years ago

You cannot pass an expression or a string to the Expression constructor anymore. Instead you build an Expression (which is really akin to an algebra definition) and you can build expression with the configured operations, symbol, base elements, etc. Or invoke parse on it to parse from a string.

Creating an expression no longer invoke simplify() by default: it needs to be called explicitly, though some method still retain a simplify arg for convenience.

You can configure an Expression by providing alternative classes for AND,OR,NOT,TRUE,FALSE,Symbol and the tokenizer function. In particular you can subclass and customize the way expressions are printed or represented.This supports #36

parse, normalize, symbols (now build_symbols) are methods of the Expression object. parse now uses the symbol class of the configured expression.

Several attributes shielded by a property are now just simple attributes. TRUE and FALSE are no longer singletons: instead you can instantiate them as needed.

A new .pretty() method is available and return a pretty formatted representation of an Expression.

Some internal attributes or methods were renamed for clarity: -_cls_order is now sort_order.

Kronuz commented 8 years ago

NotImplemented should be thrown instead of returned... That also simplifies some things as to not need to check if the returned value is NotImplemented.

So Expression serves the purpose of BooleanAlgebra? I'd still like it better if it was separated, I think. Perhaps a BooleanAlgebra instance could be passed to Expression instead. There's still something about it I'm not entirely convinced for some reason.

Finally, the thing about TRUE not being a singleton... because you can't compare TRUE() is TRUE() and probably neither the result of a simplified expression against TRUE() or FALSE().

On something else... I guess with sub() you don't really need/want Symbol to have values. That was the other source of the problem I had when I tried to add another layer to change symbols values instead of using sub() to simplify my expressions. I'd vote for removing that.

pombredanne commented 8 years ago

@Kronuz you wrote:

NotImplemented should be thrown instead of returned... That also simplifies some things as to not need to check if the returned value is NotImplemented.

Agreed. I did not touch that part yet, I just carried it over.

pombredanne commented 8 years ago

@Kronuz you wrote:

So Expression serves the purpose of BooleanAlgebra? I'd still like it better if it was separated, I think. Perhaps a BooleanAlgebra instance could be passed to Expression instead. There's still something about it I'm not entirely convinced for some reason.

In that first pass yes, and I am not happy about that. We should instead inject what is needed in each class and only that. I have a simple solution in mind. Let me push that

pombredanne commented 8 years ago

@Kronuz you wrote:

Finally, the thing about TRUE not being a singleton... because you can't compare TRUE() is TRUE() and probably neither the result of a simplified expression against TRUE() or FALSE().

You can check for equality instead of identity for the same result. Alternatively, we could have a mode where TRUE and FALSE instances are unique and identical within the confines of an algebra. But would still not be global singletons.

pombredanne commented 8 years ago

@Kronuz you wrote:

On something else... I guess with sub() you don't really need/want Symbol to have values. That was the other source of the problem I had when I tried to add another layer to change symbols values instead of using sub() to simplify my expressions. I'd vote for removing that.

Can you elaborate a bit? I am not just I grok the point.

Kronuz commented 8 years ago

@pombredanne you wrote:

You can check for equality instead of identity for the same result. Alternatively, we could have a mode where TRUE and FALSE instances are unique and identical within the confines of an algebra. But would still not be global singletons.

About TRUE and FALSE singletons, I do believe they should be singletons, I agree regarding those not being global singletons, but BooleanAlgebra singletons, those could be class properties of BooleanAlgebra.

@pombredanne you wrote:

Can you elaborate a bit? I am not just I grok the point.

I saw a comment in the code, where it said something about values in the Symbol instance: # FIXME: this statement is weird: Symbols do nor have a value assigned?? If that is the case, I believe they shouldn't, that's all.

pombredanne commented 8 years ago

I saw a comment in the code, where it said something about values in the Symbol instance: # FIXME: this statement is weird: Symbols do nor have a value assigned? ? If that is the case, I believe they shouldn't, that's all.

OK, got it. I did add this comment because there is nothing anywhere else about the "value" of Symbol so I was a bit puzzled by the doc string. I assume @bastikr meant for these to possibly have a a TRUE or FALSE value assigned at some point? The way things work today, I think a Symbol is always considered as "TRUE"

pombredanne commented 8 years ago

@Kronuz you wrote:

About TRUE and FALSE singletons, I do believe they should be singletons, I agree regarding those not being global singletons, but BooleanAlgebra singletons, those could be class properties of BooleanAlgebra.

I do not like singletons in general, but if there is ever a use case for them this is it indeed. I will make these globals singletons again as this is the cleanest and simplest way. It never makes sense to subclass these in any case. So globals is best IMHO.

Kronuz commented 8 years ago

I think they should be allowed to be subclassed, that might be needed by someone at some point, but the instance could be created once the algebra is created and use those two instances from then on for everything.

Kronuz commented 8 years ago

Maybe we could pass the TRUE and FALSE classes to the algebra and instead of adding the class to the object, add an instance of those; in a way where instead of doing self.TRUE(), we'd only do self.TRUE, using the already instantiated single TRUE object.

Kronuz commented 8 years ago

But anyway, I still don't get how objects will share the Boolean algebra, one would need to pass the same classes as parameters to Symbol and And, Or, etc? I still think BooleanAlgebra class should exist, and at least during simplify(), an instance of it should somehow be passed to all objects as needed, so those share the singletons and other properties of it.

And instead of passing the classes to its __init__, I'd make them class properties the user would have to switch, had she wanted to change any of base classes.

Also, the tokenizer should be a method of the BooleanAlgebra as well, and if the user wants to change the parsing or the tokenizer, it could subclass BooleanAlgebra and overload the methods and properties as needed.

pombredanne commented 8 years ago

@Kronuz I do not think there is ever a use case to override the behaviour of TRUE or FALSE? And for sanity and for the same reasons True and False cannot be subclassed in Python I think it is cleanest to keep them this way.
Here is what Guido says on this topic https://mail.python.org/pipermail/python-dev/2002-March/020822.html

You can always implement the latter as a subtype of bool if you care enough and without breaking code.

I thought about this last night, and realized that you shouldn't be allowed to subclass bool at all! A subclass would only be useful when it has instances, but the mere existance of an instance of a subclass of bool would break the invariant that True and False are the only instances of bool! (An instance of a subclass of C is also an instance of C.) I think it's important not to provide a backdoor to create additional bool instances, so I think bool should not be subclassable.

pombredanne commented 8 years ago

@Kronuz your wrote:

But anyway, I still don't get how objects will share the Boolean algebra, one would need to pass the same classes as parameters to Symbol and And, Or, etc? I still think BooleanAlgebra class should exist, and at least during simplify(), an instance of it should somehow be passed to all objects as needed, so those share the singletons and other properties of it.

We are thinking alike: algebra is coming next. In the end passing around NOT AND OR types does not work though since each references the other. So there are two ways around this:

  1. currying the initializers in the algebra and enforcing that the correct types are used everywhere using their preconfigured NOT AND OR
  2. making these class attributes configured by the algebra: in this case it would not be possible to use multiple algebra definitions and this would not be thread safe, but that is quite OK IMHO and much simpler

And instead of passing the classes to its init, I'd make them class properties the user would have to switch, had she wanted to change any of base classes.

Which is what I suggested above in 2. We are on the same page.

Also, the tokenizer should be a method of the BooleanAlgebra as well, and if the user wants to change the parsing or the tokenizer, it could subclass BooleanAlgebra and overload the methods and properties as needed.

Excellent point.

pombredanne commented 8 years ago

The latest commit introduces the BooleanAlgebra class. There is still something fishy about the cycles and cross-references in the classes that is not right: Expressions still all self references every other subtypes.... I guess this is not avoidable at some level but this means that short of currying the types, this cannot be really correct, not fully subclassable.

Now is subclassing really needed for anything else by the BooleanAlgebra and the Symbol? Probably not except for stringification and representation which leads to the topic of #40 and #36 IMHO if we agree on a standard string and representation for NOT AND OR rather than using + and * subclassing may not be needed in most cases. And we could have a BoolenAlgebra method that can produce alternate representations and stringifications too.

pombredanne commented 8 years ago

So the original implementation could not be subclassed. This is marginally better and I am still not quite sure that AND/OR/NOT can subclass entirely correctly.

pombredanne commented 8 years ago

@Kronuz you wrote:

Maybe we could pass the TRUE and FALSE classes to the algebra and instead of adding the class to the object, add an instance of those; in a way where instead of doing self.TRUE(), we'd only do self.TRUE, using the already instantiated single TRUE object.

After all you might be right. Let me take a crack at it

pombredanne commented 8 years ago

@Kronuz @bastikr I would like to get your feedback at this stage. I am going to push a pre-release version to Pypi to ease testing and review

pombredanne commented 8 years ago

as discussed I pushed a .dev pre-release for testing here: https://pypi.python.org/pypi/boolean.py/2.0.dev1 Since this is .dev release it is not picked by pip by default.

pombredanne commented 8 years ago

I have been thinking about something to be safe when using multiple algebra in the same process some of which may be using some of the same types. Here is the problem: This new design allow subclassing but at the same time uses class attributes added at the time the algebra is constructed. This means that you could create a first algebra with a custom OR and another algebra with another custom OR: the attribute types of AND/NOT for OR on the first algebra would be overwritten with the type of OR in the second algebra. Which would lead to byzantine behaviour. As a solution and quite simply, the *_class provided in the BooleanAlgebra constructor could be dynamically subclassed in the initializer, using the same names as their base class. And the cyclic/cross-referencing attributes only added to these dynamic subclasses. As a result, multiple algebra could be created without clashing since the classes used in an algebra would always be new and privately created sub types. Essentially this would make the BooleanAlgebra a type factory. Some sort of meta meta class? but still allowing subclassing all elements and operations of the algebra for customization. I will take a crack at this and add additional tests.

pombredanne commented 8 years ago

The latest commit uses now a private subclass wrapper using the type() builtin. It works nicely So let's recap what we have gained here:

  1. flexible subclassing of all algebra elements and functions for customization, including when using multiple algebra in the same process.
  2. same for parsing and tokenization + structure error messages
  3. some side nice to have: cnf/dnf and a pretty tree repr for expressions.
  4. some simplification of the code and no new statements anymore
  5. TRUE and FALSE are algebra singletons, not globals.

What we lost:

  1. you cannot instantiate expressions directly. You need to create an algebra first and used the configured types
  2. no more support for anonymous symbols, 0 and 1 expressions: These can be created alright as regular symbols
  3. no implicit simplify. It must be called explicitly
pombredanne commented 8 years ago

@Kronuz you wrote:

NotImplemented should be thrown instead of returned... That also simplifies some things as to not need to check if the returned value is NotImplemented.

the way it is used in the intended Python way: NotImplemented is a singleton (one more) and designed for the purpose of expressing indetermination in rich comparisons. Now the code can surely be streamlined there, but @bastikr use this in the correct way.

pombredanne commented 8 years ago

@Kronuz @bastikr unless you have an objection, I will merge this in master over the week end and call this a 2.0.0

pombredanne commented 8 years ago

@Kronuz @bastikr I assume that there are no objection... So I will merge and release 2.0. I would have preferred a +1 from each of you though!

Kronuz commented 8 years ago

Ok, here are my suggestions:

Instead of using _TRUE and _FALSE class names, drop the underscore. It's not needed anyway, as it shouldn't be used directly.

Names for TRUE and FALSE instances within the algebra could be lowercase.

Name for symbol class in the algebra could be consistent with the class name Symbol, with an uppercase "S", after all, it's still a variable which contains a class.

Instead of having all those objects set up (TRUE, AND, symbol, etc.) in the Expression subclasses, it could instead just have a member algebra object, through which such would be accessible.

Kronuz commented 8 years ago

There could be a global default algebra singleton within the boolean module, which would be used everywhere if no other algebra is defined (i.e. Expression.algebra would be a class member instance of such singleton, and an instance member algebra would be set if another algebra is used)

Kronuz commented 8 years ago

Instead of passing classes to BooleanAlgebra during __init__(), define those class objects as class members of BooleanAlgebra

pombredanne commented 8 years ago

@Kronuz you wrote:

Instead of using _TRUE and _FALSE class names, drop the underscore. It's not needed anyway, as it shouldn't be used directly. Names for TRUE and FALSE instances within the algebra could be lowercase.

To me, this would become eventually a source of visual confusion with True and False and their representation. I kinda like it being all uppercase as it signals this is a special object, e.g a constant And using _TRUE for the type signals this is a special type. It also avoid making a confusion between the type and the constant so it makes things easier to read IMHO.

That said, AND etc class names could be renamed to And, Or, Not for clarity: these are not constants.

Name for symbol class in the algebra could be consistent with the class name Symbol, with an uppercase "S", after all, it's still a variable which contains a class.

Agreed.

Instead of having all those objects set up (TRUE, AND, symbol, etc.) in the Expression subclasses, it could instead just have a member algebra object, through which such would be accessible.

What is the benefit of introducing an extra level of nesting? as opposed to a simpler flat layout?

pombredanne commented 8 years ago

@Kronuz you wrote:

There could be a global default algebra singleton within the boolean module, which would be used everywhere if no other algebra is defined (i.e. Expression.algebra would be a class member instance of such singleton, and an instance member algebra would be set if another algebra is used)

This introduces a level of complexity for which purpose or benefits? Also this would likely be impractical as this would introduce potentially some infinite cycles to the model? Also if you do a global with class attributes I think you could no longer be able to subclass cleanly any more. Expression is not wrapped in a type when instantiating an algebra. Only the leaf subclasses of the inheritance hierarchy are wrapped.

pombredanne commented 8 years ago

@Kronuz you wrote:

Instead of passing classes to BooleanAlgebra during __init__(), define those class objects as class members of BooleanAlgebra

What would be the benefits of this?

pombredanne commented 8 years ago

Unrelated, it just dawned on me that using an order attribute for Functions is no longer needed now that each is instantiated with its own initializer.

pombredanne commented 8 years ago

all merged in master. I also pushed a v2.0 to Pypi.