cloudtools / troposphere

troposphere - Python library to create AWS CloudFormation descriptions
BSD 2-Clause "Simplified" License
4.92k stars 1.45k forks source link

TypeError: unhashable type: 'Parameter' #2196

Closed JohnPreston closed 8 months ago

JohnPreston commented 8 months ago

Hello,

This might be something that only affects me, but I have been relying for 3 years now and built my application to rely on the Parameters as keys to dict. In the 4.5.0, I am getting the error

TypeError: unhashable type: 'Parameter' and basically now, the entire thing is broken. Seems to me the change is with, but I am not entirely sure.

    def __hash__(self) -> int:
        return hash(self.to_json(indent=0))

What to do now?

EDIT:

Here is an example among many of how I am able to use it in dict: https://github.com/compose-x/ecs_composex/blob/main/ecs_composex/dynamodb/dynamodb_stack.py#L38

markpeek commented 8 months ago

@JohnPreston sorry to hear the latest release caused issues with your application. The likely PR is #2182 but would like to find out some more information.

Adding @ITProKyle for any insights based on the references PR.

markpeek commented 8 months ago

Looking at the above code, I believe this is a simple repro that works on 4.4.1 but causes the error on 4.5.0:

$ cat param_hash.py
from troposphere import Parameter

param = Parameter("param")
d = {
    param: "foo",
}
$ python param_hash.py
Traceback (most recent call last):
  File "/troposphere/param_hash.py", line 4, in <module>
    d = {
        ^
TypeError: unhashable type: 'Parameter'
ITProKyle commented 8 months ago

So, it's not caused by my additions of __hash__ because none of those are for ancestors of Parameter. It's actually caused by the addition of __eq__ without also adding __hash__.

This is going to be a little tricky. Starting with how python defines something as being hashable:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

Most of Python’s immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not; immutable containers (such as tuples and frozensets) are only hashable if their elements are hashable. Objects which are instances of user-defined classes are hashable by default. They all compare unequal (except with themselves), and their hash value is derived from their id().

So from the above, while instances of user-defined classes are hashable by default (because they arn't comparable) Parameters/anything based on BaseAWSObject really shouldn't be hashable because they are mutable.


If the above is disregarded entity, by adding __hash__ to BaseAWSObject like so:

class BaseAWSObject:

    def __hash__(self) -> int:
        return super().__hash__()

But, this goes against pythons definition for something being hashable (in addition to being mutable) in that "Hashable objects which compare equal must have the same hash value".

"""Example of resulting change."""
from troposphere import Parameter

param0 = Parameter("param0", Type="String")
param1 = Parameter("param1", Type="String")
param1.title = "param0"
some_dict = {param0: "foo", param1: "bar"}
print(some_dict)
# stdout: {<troposphere.Parameter object at 0x10514ba90>: 'foo', <troposphere.Parameter object at 0x10514bfd0>: 'bar'}

In the above example, param0 and param1 would compare equal but their hashes are unique. This results in 2 items being present in the dict.

Another option for implementing __hash__ would be:

class BaseAWSObject:

    def __hash__(self) -> int:
        return hash(json.dumps({"title": self.title, **self.to_dict()}, indent=0))

This would make objects that compare equal have the same hash but goes against "An object is hashable if it has a hash value which never changes during its lifetime".

"""Example of resulting change."""
from troposphere import Parameter

param0 = Parameter("param0", Type="String")
param1 = Parameter("param1", Type="String")
param1.title = "param0"
some_dict = {param0: "foo", param1: "bar"}
print(some_dict)
# stdout: {<troposphere.Parameter object at 0x103203ad0>: 'bar'}

In the above example, param0 and param1 now share the same hash because the two objects would compare as equal. This results in 1 item being present in the dict.


I'll leave it up to you, @markpeek, as to how you would like to proceed given the info.

markpeek commented 8 months ago

@ITProKyle thank you for the thorough writeup and alternatives. I agree there isn't a perfect solution. I'm leaning towards your last suggestion as the least disruptive. This could also be applied to the class Parameter instead of class BaseAWSObject to work around this issue. Not sure if that would be better or worse so open to thoughts from you and @JohnPreston.

class BaseAWSObject:

    def __hash__(self) -> int:
        return hash(json.dumps({"title": self.title, **self.to_dict()}, indent=0))
JohnPreston commented 8 months ago

@JohnPreston sorry to hear the latest release caused issues with your application. The likely PR is #2182 but would like to find out some more information.

That's okay! I think such a change might have warranted a 5.0, but it is how it is and there is always some unforeseen impact.

Python used is 3.9 minimum, mostly 3.10. Yes everything works great in 4.4.1

Re: thoughts -> at this point I am happy to try anything. I have in fact my own Parameter class, inherited from this one, which has a couple other attributes to do things like auto-add info to populate the CFN Metadata.

https://github.com/compose-x/ecs_composex/blob/main/ecs_composex/common/cfn_params.py#L18

So happy to override that method locally and give it a shot and see. If there is anything you'd like me to try specifically please let me know

markpeek commented 8 months ago

@JohnPreston I would say try the above hash method in your Parameter subclass and report back. And then let me know if it should remain in your code or I could put it into troposphere.

JohnPreston commented 8 months ago

I have just tried that. The issue is, now, on other types. I have lots of objects that I do not have all the properties sorted out for until the very last moment because I treat them like objects.

So the change on the __hash__ seems to make it okay for Parameter but now I have things like

    Then I render the docker-compose to composex to validate      # tests/features/steps/common.py:48 0.194s                                                                                                         
      Traceback (most recent call last):                                                                                                                                                                             
        File "/home/john/.cache/pypoetry/virtualenvs/ecs-composex-6XqxN9Zj-py3.10/lib/python3.10/site-packages/behave/model.py", line 1329, in run                                                                   
          match.run(runner.context)                                                                                                                                                                                  
        File "/home/john/.cache/pypoetry/virtualenvs/ecs-composex-6XqxN9Zj-py3.10/lib/python3.10/site-packages/behave/matchers.py", line 98, in run                                                                  
          self.func(context, *args, **kwargs)                                                                                                                                                                        
        File "/home/john/dev/ecs_composex/tests/features/steps/common.py", line 50, in step_impl                                                                                                                     
          context.root_stack = generate_full_template(context.settings)                                                                                                                                              
        File "/home/john/dev/ecs_composex/ecs_composex/ecs_composex.py", line 272, in generate_full_template                                                                                                         
          family.init_network_settings(settings, vpc_stack)                                                                                                                                                          
        File "/home/john/dev/ecs_composex/ecs_composex/ecs/ecs_family/__init__.py", line 434, in init_network_settings                                                                                               
          self.stack.set_vpc_parameters_from_vpc_stack(vpc_stack, settings)                                                                                                                                          
        File "/home/john/dev/ecs_composex/ecs_composex/common/stacks/__init__.py", line 291, in set_vpc_parameters_from_vpc_stack                                                                                    
          and self.parent_stack != settings.root_stack                                                                                                                                                               
        File "/home/john/.cache/pypoetry/virtualenvs/ecs-composex-6XqxN9Zj-py3.10/lib/python3.10/site-packages/troposphere/__init__.py", line 426, in __ne__                                                         
          return not self == other                                                                                                                                                                                   
        File "/home/john/.cache/pypoetry/virtualenvs/ecs-composex-6XqxN9Zj-py3.10/lib/python3.10/site-packages/troposphere/__init__.py", line 420, in __eq__                                                         
          return self.title == other.title and self.to_json() == other.to_json()                                                                                                                                     
        File "/home/john/.cache/pypoetry/virtualenvs/ecs-composex-6XqxN9Zj-py3.10/lib/python3.10/site-packages/troposphere/__init__.py", line 356, in to_json                                                        
          return json.dumps(self.to_dict(), indent=indent, sort_keys=sort_keys)                                                                                                                                      
        File "/home/john/.cache/pypoetry/virtualenvs/ecs-composex-6XqxN9Zj-py3.10/lib/python3.10/site-packages/troposphere/__init__.py", line 340, in to_dict                                                        
          self._validate_props()                                                                          
        File "/home/john/.cache/pypoetry/virtualenvs/ecs-composex-6XqxN9Zj-py3.10/lib/python3.10/site-packages/troposphere/__init__.py", line 416, in _validate_props                                                
          raise ValueError(msg)                                                                           
      ValueError: Resource TemplateURL required in type AWS::CloudFormation::Stack (title: Test)  

Where now the __eq__ is not working either.

markpeek commented 8 months ago

@JohnPreston this PR likely addresses the __eq__ issue: PR #2197

JohnPreston commented 8 months ago

@JohnPreston this PR likely addresses the eq issue: PR #2197

I saw that one, let me use that branch to see how it goes :)

ITProKyle commented 8 months ago

I was using the changes in that PR to produce the results above. I didn't try without those changes.

ITProKyle commented 8 months ago

However, the changes in that PR likely won't have any effect on your error since isinstance(other, self.__class__) is passing which is getting you here (https://github.com/cloudtools/troposphere/blob/4.5.0/troposphere/__init__.py#L419-L420) with your comparison. #2197 addresses comparisons that can't be handled directly.

What are you trying to achieve with your comparison? Prior to 4.5.0, that would only check if they are the exact same object via id (e.g. id(self.parent_stack) != id(settings.root_stack)) rather than comparing the resulting CloudFormation the objects represent.

markpeek commented 8 months ago

Looking a bit closer at that traceback, it's not really a __eq__ specific issue. The implementation creates the dict and then encodes into json. In going through that path, it also does validation on the object which is what you're seeing. It is missing required properties. Perhaps the __eq__ path needs a way to turn off that validation step.

JohnPreston commented 8 months ago

Indeed, I came down to that same conclusion @markpeek Yes, @ITProKyle , the ID here is that this allows me to ensure these aren't the same object in memory (so more by address than actually comparing). Which is what I wanted in this case.

ITProKyle commented 8 months ago

No matter what changes are made here like disabling validation when encoding to json would give you that ability now that comparison is performed on the CloudFormation the objects represent. The only way to get there here would be to revert the feature which actually bring troposphere in parity with awacs which I found to be the case after going over there to replicate my changes.

However, you can perform the same comparison on your end by passing the objects into id() which explicitly performs that comparison.

JohnPreston commented 8 months ago

@ITProKyle I just did that and indeed all my tests pass. Yet to test the actual templates but the change of __hash__ and using the id() to compare, which I suppose is clearer and syntactically better (also that makes me happy to be a little bit more C like with this^^) is good.

So given the https://github.com/cloudtools/troposphere/pull/2197 changes, what do we think about the change for __hash__ ?

EDIT: Spoke too quick, not all my tests pass, but I will dig into why.

markpeek commented 8 months ago

I made the __hash__ change and disabling validation into this PR #2200 if you want to try that branch out.

markpeek commented 8 months ago

Fixed in Release 4.5.1

Thank you @ITProKyle @ndparker @JohnPreston for collaborating on getting this issue resolved quickly.