canonical / data-platform-libs

A collection of charm libraries curated by the Data Platform Team
https://charmhub.io/data-platform-libs
Apache License 2.0
9 stars 7 forks source link

[DESIGN] Separation of Concerns (Data Abstraction vs. Event Handlers) #142

Closed juditnovak closed 4 months ago

juditnovak commented 4 months ago

Fully compatible with Charm Patters Design: https://github.com/canonical/zookeeper-operator/pull/121

(Backwards compatibility is confirmed by pipelines here.)

This PR implements the structure below. Allowing both for an internal separation of concerns (Data Abstraction vs. Event Handling), both for a smooth usage with Data Interfaces Charm Patterns while keeping the old interface fully intact.

Data Interfaces version Charm Structure Patterns (3)

deusebio commented 4 months ago

Overall, I really like the new structure with separation of concerns. I have some points worth raising and addressing to just nail this implementation properly. Given the large refactoring, I have been reviewing the entire file, not the diff. I find it easier to provide a general comment with references of the code commit therein, instead of inline comments of the diffs. Hence the long PR comment below.

  1. DataProvides and DataRequires still have some references to events here and here. I don't see why having this method is necessary. This method feels more appropriate for EventHandlers classes, but actually I checked the code and we could probably just live without it, and replace the calls self.relation_data._diff(event) with diff(event, self.relation_data.local_unit) or diff(event, self.relation_data.local_app), depending on the case.

  2. It does not seem to me that DataRelation classes need charm but only model, as it is clear from here. I don't think we need to be backwards compatible on DataRelation (since this is actually a new abstraction of this PR), therefore I would just provide a constructor with model. Given these classes need to abstract away relation data, I suppose that model is good enough, rather than the entire charm. We anyway instantiate these classes in the final classes (e.g. here where we could just put charm.model instead of charm). I would try to be a bit more restrictive and not pass always charm around, but only when really needed (e.g. handlers where we need to registrer observers), given that these two classes (Data and EventHandlers) will be used in different places.

  3. I would honestly have a general class that provides general functions for Provider and Requirer side, with no assumption on the databag payload/model underneath. As it is right now, DataRequires and DataProvides makes some assumption around some of the fields (e.g. here, here and other methods right after, here and here
    3). This creates a somewhat dirty structure for some of the child classes that do not share the same data-model (e.g. Kafka, OpenSearch). Moreover, if we had such a class, we could use EXACTLY the same Provider and Requirer parents for S3, which right now does not support secrets and/or a bit more consistent UX, leveraging on the same functionalities. Eventually, even other teams could import from such Provider/Requirer classes if they were more general. My suggestion would be to provide these information only in the final class (e.g. DatabaseRequiresData, KafkaRequiresData, etc) according to their specific data-model.

  4. Similarly to 3, I would also not make assumption on DataPeerData, here, these specifications should actually be done by the charms authors depending on the content of their peer-relation databag, and not in the charm lib, I believe.

Naming convention

This is largely personal, but given the these names are going to stick (and the discussion we just had about naming), I would like to propose a bit of simplification and structure on the naming of classes. Overall, I kinda find names like DataPeerData or DataRequiresData a bit confusing. Also DataRequires seems a bit inconsistent with its child DataRequiresData. It seems to me more natural to build on names as we go, in a structured way (notation below: NewName (OldName) ) :

Inheritance tree for Data classes: Data (DataRelation) > ProvidesData (DataProvides) and RequiresData (DataRequires) > <Product>RequiresData (DatabaseRequiresData, KafkaRequiresData) and <Product>Data (PeerData) (when we put Requires and Provides together).

Inheritance tree for EventHandlers classes: EventHandlers (DataRelationEventHandlers) > ProvidesEventHandlers (DataProvidesEventHandlers) and RequiresEventHandlers (DataRequiresEventHandlers) > <Product>RequiresEventHandlers (DatabaseRequiresEventHandlers, KafkaRequiresEventHandlers, ...)

I would have a niptick that we should probably use Requirer in place of Requires which would seem more appropriate from an English standpoint (e.g. RequirerData instead of RequiresData). We could probably do this while still retaining the final class naming, e.g. class DatabaseRequires(DatabaseRequirerData, DatabaseRequirerEventHandlers), but I can see that this may hurt someone else. Anyhow, it is not a bit point. I would prefer to go with better English, but I don't mind too much.

juditnovak commented 4 months ago

@deusebio Responses one-by-one prioritized by simplicity

(2) Data classes are only taking Model now. Very good call, this had to be cleaned up.

(1) _diff() is a tricky one -- as it is kind of a link between data and the way data is being used. It takes BOTH the event, AND the "databag-in-question" for the Data class in question. Instead of the independent function, we could move it to DataRelation -- where the corresponding self.databag_in_question could be determined or re-defined in descendants. However, in this case it would be the DataPeer classes inheriting the whole stuff pointlessly.

To me the fact that a Data class has a method with an Event parameter in it is not necessarily a problem. After all these Data classes exist to be used in a context with events.

Alternatively we could re-define the signature of the diff() function to rather take a DataRelation instance as input instead of Union[App, Unit]. That way the event handler could call diff(event, self.data). And I'm adding a new field on DataRelation such as self._data_component (as sadly self.component is NOT the same). How about this solution?

(3) This is an idea that worth consideration for sure. However I would evaluate and address it in a separate PR. First of all, Opensearch IS sharing the model (even if (currently?) Kafka may not). While your point is to have fully independent parent classes, perhaps for more generic future usage, as for now parents are also to define sensible default (that could be overriden in heirs, like SECRET_FIELDS if needed). To me this is an architectual question, that may worth considering. Yet I would do that separately, thus iteratively CI/CD-style improving -- instead of putting ALL our ideas into this PR.

(4) It is the default that a number of charms seem to have defined. (If not it's harmless to have). The charms ARE defining their own secrets using the additional_secrets_fields (for now -- that one may get a better name in the future). Removing the operator-password default may make sense, I'm thinking of it for a while -- however I'd keep that change independent from internal architectural changs. ALSO that's a STRONG behavioral change that has to be well communicated in advance.

Naming: The main rationale of the naming was to avoid overlap with Juju's RelationData. Thus I swapped the name to DataRelation -- and followed that structural logic consistently. I'm not against renaming the internal classes such as

deusebio commented 4 months ago

Cool! Thanks! I'd say that this sounds a good compromise. Summarizing, I would be happy with:

(2) => Agreed! Thanks!

(1) =>

Alternatively we could re-define the signature of the diff() function to rather take a DataRelation instance as input instead of Union[App, Unit]. That way the event handler could call diff(event, self.data). And I'm adding a new field on DataRelationsuch asself._data_component(as sadlyself.component` is NOT the same). How about this solution?

I'm happy with this. I prefer this as IMHO it flows better from a depedency perspective. EventHandlers needs Data, but not the other way around. Hence I would prefer to NOT have a method requiring this. Having a diff as you suggest is indeed better to me! Thanks!

(3) and (4) fine this to be addressed in a separate PR. I'm happy to do this while doing the spark history server charm refactoring, where I would like to improve the s3 classes. Having an abstraction as outlined in (3) is required for that.

=> Naming convention: Happy with that. Let me just confirm with Mykola and Paolo whether to use Requirer or Requires

juditnovak commented 4 months ago

@deusebio

I would prefer (3) to be discussed on the conceptual level before implementing any code.

(4) I'm adding on this PR

juditnovak commented 4 months ago

There may be one risk of removing _diff() though. Despite that it's not supposed to be used, technically it still may be used by charms.

I took a quick look -- doesn't seem to be the case at a quick, non-comprehensive glance. (Whoever may use _diff(), seem to define it for themselves, instead of using the existing data-platform-libs one.)

kaboca@pegasus:~$ grep -r "def _diff(" repos/ --exclude-dir=.tox | grep -v data-platform-libs | grep -v data_platform_libs
repos/mongo/mongodb-k8s-operator/lib/charms/mongodb/v0/mongodb_provider.py:    def _diff(self, event: RelationChangedEvent) -> Diff:
repos/mongo/mongodb-operator/lib/charms/mongodb/v1/mongodb_provider.py:    def _diff(self, event: RelationChangedEvent) -> Diff:
repos/spark/charmed-spark-rock/squashfs-root/lib/python3.8/site-packages/sphinx/testing/comparer.py:    def _diff(self, lhs: pathlib.Path, rhs: pathlib.Path) -> List[str]:

kaboca@pegasus:~$ grep -r "\._diff(" repos/ --exclude-dir=.tox | grep -v data-platform-libs | grep -v data_platform_libs
repos/mongo/mongodb-k8s-operator/lib/charms/mongodb/v0/mongodb_provider.py:                self._diff(event)
repos/mongo/mongodb-operator/lib/charms/mongodb/v1/mongodb_provider.py:                self._diff(event)
repos/spark/charmed-spark-rock/squashfs-root/lib/python3.8/site-packages/sphinx/testing/comparer.py:        return self._diff(
repos/spark/charmed-spark-rock/squashfs-root/lib/python3.8/site-packages/sphinx/testing/comparer.py:        return self._diff(

@delgod Do we feel safe about removing it?

juditnovak commented 4 months ago

@deusebio @welpaolo @delgod

The following changes have been added, each of them in a separate commit

Please let me know if you are happy with them or not. Separate commits thus easy to remove if objections, and merge without.

deusebio commented 4 months ago

There may be one risk of removing _diff() though.

Just to be safe and backward compatible (although we should put some deprecation message on _diff if it is not needed, and people in general should not use private methods actually...), one could also

  1. Keep the same signature for diff
  2. Create a _diff method in EventHandlers with this:
class EventHandlers:
    def _diff(self, event):
        return diff(event, self.relation_data.data_component)

Wouldn't this respect the same API for both global diff and EventHandlers._diff?

deusebio commented 4 months ago

Let me just confirm with Mykola and Paolo whether to use Requirer or Requires

I checked and other charming teams use Requirer/Provider terminology (see e.g. see here and here coming from Observability and Identity), so we should also follow the same standard

@delgod @welpaolo

welpaolo commented 4 months ago

Hi @juditnovak, I am fine with the plan you proposed.

I will keep an eye on the removal of diff() function to ensure that all the tests works :)

@deusebio In terms of naming let's follow what others teams are doing and be compliant with the rest of the repositories.

delgod commented 4 months ago
juditnovak commented 4 months ago

There may be one risk of removing _diff() though.

Just to be safe and backward compatible (although we should put some deprecation message on _diff if it is not needed, and people in general should not use private methods actually...), one could also

  1. Keep the same signature for diff
  2. Create a _diff method in EventHandlers with this:
class EventHandlers:
    def _diff(self, event):
        return diff(event, self.relation_data.data_component)

Wouldn't this respect the same API for both global diff and EventHandlers._diff?

Implemented.

juditnovak commented 4 months ago

Ecosystem naming: implemented