beda-software / fhir-py

FHIR Client for python
MIT License
174 stars 32 forks source link

Hashable resources #89

Open BSVogler opened 2 years ago

BSVogler commented 2 years ago

In order to use resources in a set they should implement the __hash__() method. A resource is unique by the id + history version id, so putting this together would be the fastest way.

ruscoder commented 1 year ago

The current implementation of __eq__ just checks resources by their reference e.g. Patient/id1. __hash__ should work consistently with __eq__, otherwise if might lead to mistakes.

BSVogler commented 1 year ago

If you iterate on versions then it can be that Patient/id1/_history/1 != Patient/id1/_history/2. So I think the eq need a little bit more complexity by taking the history into account.

How about using __hash__() inside the __eq__ implementation?

BSVogler commented 1 year ago

eq implementation discussion here as well https://github.com/beda-software/fhir-py/issues/51

ruscoder commented 1 year ago

I agree, it makes sense.

m0rl commented 1 year ago

The problem is, should we use versionId for a mutable resource __hash__ updating it would violate hash invariant (as the versionId would change and so would the hash but the containing data structure wouldn't be aware of the change and that might result in all sorts of unexpected behaviour).

ruscoder commented 1 year ago

@BSVogler could you please give an example of a use case where having unversioned __hash__ might be an issue?

mkizesov commented 1 year ago

It seems logical to add versionId to __eq__, but I have a concern because not all FHIR servers implement versionId. Sometimes they don't return it at all, sometimes it 1 every time. So for example the same code that worked with a server which not returns versionId can break after the server upgrade. What do you think @BSVogler ? We can discuss implementing some helper methods in resources to explicitly use it for comparing them.

Regarding __hash__, @m0rl raised a valid concern, from documentation: If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

ir4y commented 1 year ago

Let's implement __hash__ as resourceType + id. It will be immutable and consistent. I don't see any sense to have two resources with the same id but different versionId in the set.

ir4y commented 1 year ago

@BSVogler The library architecture allows to redefine the resource class https://github.com/beda-software/fhir-py/blob/a14a36f93b0462fc046c460a3a617a93daef0487/fhirpy/lib.py#L94 https://github.com/beda-software/fhir-py/blob/a14a36f93b0462fc046c460a3a617a93daef0487/fhirpy/lib.py#L79 So you will be able to implement desired behavior when resourceType, id, and versionId are used for equality.
We will add an example to the docs.

BSVogler commented 1 year ago

Overwriting the class would be good although I would prefer to have it easier to configure. The _history is part of the FHIR standard. That begs the question whether fhirpy is intended to work against a full capabilities server or a limited implementation by default?

When the server supports history just using resourceType + id will cause a lot of confusion. You can update a resource and it can be completely different and then using the equality operator will say that they are equal. (that is also what ruscoder said). The hash implementation could also use a real hash of the json. That would be even more strict.