cybnity / foundation

Open source cyber-serenity platform that helps security teams design and manage companies' ISMS, and allows them to create resilient digital systems with efficient responses to cyber-threats.
Apache License 2.0

As integration technology, I should use a serialization format for exchanges between analytics/applications over Kafka #248

Open olivierlemee opened 3 months ago

olivierlemee commented 3 months ago
        - [ ] Evaluate and decide on the Kafka serialization technology to use for the facts and data exchanged on the DIS:
           - [ ] Apache Avro serializer/deserializer (https://avro.apache.org/): generation of Avro schema files from Java classes via IDL files (migrating POJOs to IDL makes it possible to manage the generated Avro schema used for POJO mapping and for Java class auto-generation): https://www.instaclustr.com/blog/exploring-karapace-part-2/
              - AVRO ADVANTAGES
                 - Dynamic typing: unlike Protobuf, Avro does not require code generation, which enables more flexibility and easier integration with dynamic languages like Python or Ruby (see the GenericRecord sketch after this list).
                 - Self-describing messages: serialized Avro data embeds schema information, making it possible to decode the data even when the reader has no access to the original schema. As the linked example shows, an Avro message must always be prefixed with some information identifying the schema used to encode it, or the decoder will either fail or produce invalid data. Declaring default values in the schema is very important, since it allows a field to be removed later.
                    - Protobuf: despite Avro's slightly smaller encoded data size, the ability to update Protobuf message definitions in a compatible way without having to prefix the encoded data with a schema identifier makes Protobuf a better choice for data transmission where object versions shall be managed automatically and dynamically by the deserializer. Avro is easier to debug than Protobuf, whose wire format is less human-readable. BUT: Avro has slower serialization/deserialization performance due to its dynamic typing and embedded schema information.
                 - Verbosity of schema definition in JSON
                 - JAVA
                    - IDL file (domain object specification) > Avro schema file (versioned schema, auto-generated Java class); see the IDL sketch after this list
                    - Avro schema file usable by producers/consumers for POJO mapping (e.g. exchange of serialized data over Kafka, Redis, or a filesystem)
                    - Backward-compatibility test under Maven of a schema against newly generated classes (see the plugin configuration sketch after this list): https://docs.confluent.io/platform/current/schema-registry/develop/maven-plugin.html#schema-registry-test-compatibility
                    - Producer example with the schema version automatically registered (see the producer/consumer sketch after this list): https://github.com/confluentinc/examples/blob/7.4.1-post/clients/avro/src/main/java/io/confluent/examples/clients/basicavro/ProducerExample.java
                    - Consumer example with the schema version automatically read: https://github.com/confluentinc/examples/blob/7.4.1-post/clients/avro/src/main/java/io/confluent/examples/clients/basicavro/ConsumerExample.java
                 - NodeJS
                    - JS encode/decode with the [avro-js module](https://www.npmjs.com/package/avro-js): https://blog.basyskom.com/2021/what-is-apache-avro-compared-to-protobuf
                 - Karapace (Kafka REST proxy and schema registry in a Docker instance)
                    - Karapace schema registry (https://www.instaclustr.com/blog/exploring-karapace-part-3/) supporting Avro, JSON Schema, and Protobuf, with a REST interface for schema management (see the REST sketch after this list)
                    - GitHub project: https://github.com/Aiven-Open/karapace
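
As an illustration of the dynamic typing and self-describing points above, here is a minimal Java sketch (the `Fact` schema, its fields, and the file name are assumptions, not project code): a record is built from a schema parsed at runtime, without any generated class, and the Avro container file embeds the writer schema so a reader can decode it without knowing the original schema.

```java
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

import java.io.File;
import java.io.IOException;

public class AvroDynamicTypingExample {

    // Hypothetical schema; "note" has a default value so it can be removed later.
    private static final String SCHEMA_JSON = "{"
            + "\"type\":\"record\",\"name\":\"Fact\",\"namespace\":\"org.cybnity.sample\","
            + "\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"},"
            + "{\"name\":\"note\",\"type\":\"string\",\"default\":\"\"}"
            + "]}";

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        // Build a record dynamically, without any generated class.
        GenericRecord fact = new GenericData.Record(schema);
        fact.put("id", "fact-1");
        fact.put("note", "created");

        // Write a container file; the writer schema is embedded in the file header.
        File file = new File("facts.avro");
        try (DataFileWriter<GenericRecord> writer =
                     new DataFileWriter<>(new GenericDatumWriter<>(schema))) {
            writer.create(schema, file);
            writer.append(fact);
        }

        // Read it back without providing the original schema: the reader
        // discovers it from the embedded header (self-describing data).
        try (DataFileReader<GenericRecord> reader =
                     new DataFileReader<>(file, new GenericDatumReader<>())) {
            for (GenericRecord record : reader) {
                System.out.println(record);
            }
        }
    }
}
```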
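A sketch of the IDL > Avro schema > generated Java class pipeline referenced in the JAVA item; the protocol and record names are hypothetical and the avro-maven-plugin version is only indicative:

```idl
// src/main/avro/fact.avdl — hypothetical domain object specification
@namespace("org.cybnity.sample")
protocol FactProtocol {
  record Fact {
    string id;
    string note = ""; // default value allows removing the field later
  }
}
```

```xml
<!-- avro-maven-plugin picks up .avdl files from src/main/avro by default
     and generates the Java classes (e.g. Fact) during generate-sources -->
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.11.3</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>idl-protocol</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```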
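For the backward-compatibility test under Maven, the Confluent documentation linked above describes the kafka-schema-registry-maven-plugin; a configuration sketch (registry URL, subject name, and schema path are assumptions) could look like:

```xml
<plugin>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-schema-registry-maven-plugin</artifactId>
  <version>7.4.1</version>
  <configuration>
    <schemaRegistryUrls>
      <param>http://localhost:8081</param>
    </schemaRegistryUrls>
    <subjects>
      <!-- subject name mapped to the local schema file to check -->
      <facts-value>src/main/avro/Fact.avsc</facts-value>
    </subjects>
  </configuration>
</plugin>
```

Running `mvn schema-registry:test-compatibility` would then fail the build if the local schema is incompatible with the latest version registered under the subject.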
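In the spirit of the linked Confluent producer/consumer examples, a condensed Java sketch (the topic name, registry URL, and the `Fact` class generated from the IDL sketch above are assumptions): the serializer registers/looks up the schema in the registry and prefixes each message with the schema id, which is what gives the automatic, dynamic version management discussed above.

```java
import io.confluent.kafka.serializers.KafkaAvroDeserializer;
import io.confluent.kafka.serializers.KafkaAvroSerializer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class FactExchangeExample {

    public static void main(String[] args) {
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", StringSerializer.class);
        producerProps.put("value.serializer", KafkaAvroSerializer.class);
        // Schema is registered and its id prefixed to each message.
        producerProps.put("schema.registry.url", "http://localhost:8081");

        // Fact is the (hypothetical) class generated from the IDL sketch above.
        try (KafkaProducer<String, Fact> producer = new KafkaProducer<>(producerProps)) {
            Fact fact = Fact.newBuilder().setId("fact-1").setNote("created").build();
            producer.send(new ProducerRecord<>("facts", fact.getId().toString(), fact));
        }

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "facts-reader");
        consumerProps.put("key.deserializer", StringDeserializer.class);
        consumerProps.put("value.deserializer", KafkaAvroDeserializer.class);
        consumerProps.put("schema.registry.url", "http://localhost:8081");
        // Deserialize into the generated class instead of GenericRecord.
        consumerProps.put("specific.avro.reader", "true");

        try (KafkaConsumer<String, Fact> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("facts"));
            ConsumerRecords<String, Fact> records = consumer.poll(Duration.ofSeconds(5));
            records.forEach(r -> System.out.println(r.value()));
        }
    }
}
```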
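Since Karapace exposes the Confluent-compatible Schema Registry REST API, schema management can be scripted over HTTP; a sketch (the subject name, host, and default port 8081 are assumptions):

```sh
# Register a new schema version under the subject "facts-value"
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{\"type\": \"record\", \"name\": \"Fact\", \"fields\": [{\"name\": \"id\", \"type\": \"string\"}]}"}' \
  http://localhost:8081/subjects/facts-value/versions

# List the registered versions, then fetch the latest one
curl http://localhost:8081/subjects/facts-value/versions
curl http://localhost:8081/subjects/facts-value/versions/latest
```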