acryldata / datahub-actions

DataHub Actions is a framework for responding to changes to your DataHub Metadata Graph in real time.
42 stars 47 forks source link

feat(api): implement from_json method for datahub_actions.event.event_envelope.EventEnvelope #23

Closed Masterchen09 closed 2 years ago

Masterchen09 commented 2 years ago

For some local testings I have implemented the from_json method of the EventEnvelope. As the method is using the event registry for getting the event class the EventEnvelope had to be moved from the event.py into a separate file (the event_registry.py imports the event.py, which would then have imported the event_registry.py, which would not have worked 😡).

github-actions[bot] commented 2 years ago

Unit Test Results (build & test)

59 tests  +3   59 :heavy_check_mark: +3   2s :stopwatch: -1s βŸβ€„1 suites Β±0β€‚β€ƒβ€ƒβŸβ€„0 :zzz: Β±0  βŸβ€„1 files   Β±0β€‚β€ƒβ€ƒβŸβ€„0 :x: Β±0 

Results for commit c0415da4. ± Comparison against base commit 67ccf1ec.

:recycle: This comment has been updated with latest results.

Masterchen09 commented 2 years ago

I had some issues with the value field of the GenericAspect in the MetadataChangeLogEvent, therefore I have removed the tuple parameter. The value field of the GenericAspect is a bytes field, but the JSON representation of the field is a string. After some debugging I have figured out that the avrogen package cannot handle string-encoded bytes field (correctly) when the tuples parameter is true (the path I followed was the following: from_obj calls with_tuple_union -> self.fastavro is True when tuples is true -> from_obj also calls from_json_object which calls _generic_from_json -> _generic_from_json calls _union_from_json which calls validates -> validates accepts strings for bytes fields only if self.fastavro is false).

I am not sure if this might be an issue of the avrogen package, but it worked with the removed tuples parameter, therefore I was happy at that point. 😁

For the EntityChangeEvent we might not need to change the tuples parameter, because there should not be any bytes field in there or at least at the moment?

I have also added some tests for the from_json method of the two event classes and the event envelope, I think these are helpful to verify that the methods are working correctly.

The parameter deserialization now also works for the EnitityChangeEvent and is using the hack from the Kafka source (not nice, but it works...?). I have also noticed that the from_obj method of the events does not return an Event instance, instead an instance of the parent class is returned, therefore the from_class method is used to construct a valid Event instance (originally I only wanted to quickly test something...therefore I did not noticed that the instance was not an Event instance).

To have the hack with the parameters only at one place I have also updated the Kafka event source to use the from_json method of the EntityChangeEvent (the Kafka event source no longer needs it's own logic for the deserialization of the message).

jjoyce0510 commented 2 years ago

Amazing! Looking over this again now