gchq / Gaffer

A large-scale entity and relation database supporting aggregation of properties
Apache License 2.0

Investigate improving Kryo serialisation for classes from DataSketches #837

Closed gaffer01 closed 6 years ago

gaffer01 commented 7 years ago

It should be possible to register the DataSketches classes that we already have serialisers for in the Kryo Registrator, by wrapping each existing serialiser in a Kryo serialiser. This should improve the performance of Spark jobs that use these classes. It may not be simple, though: some of the classes, e.g. DoublesUnion, are abstract, which seems to prevent Kryo from finding the correct serialiser. The concrete classes that implement them (e.g. HeapDoublesUnion) are not visible outside the DataSketches library, so serialisers cannot be registered for them directly.
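To make the abstract-class problem concrete, here is a minimal plain-Java sketch of the lookup behaviour described above. It does not use Kryo or DataSketches: `AbstractUnion`, `HeapUnion`, `register`, `findExact` and `findAssignable` are all hypothetical stand-ins, and the registry is only a model of a class-keyed serialiser table, not Kryo's actual implementation.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class SerialiserLookupSketch {

    // Hypothetical stand-in for a public abstract library type such as DoublesUnion.
    public abstract static class AbstractUnion {
        public abstract double value();
    }

    // Hypothetical stand-in for a hidden implementation such as HeapDoublesUnion.
    // It is private, so callers cannot name it when registering a serialiser.
    private static final class HeapUnion extends AbstractUnion {
        private final double v;
        HeapUnion(double v) { this.v = v; }
        @Override public double value() { return v; }
    }

    // The factory is the only public way to obtain an instance.
    public static AbstractUnion newUnion(double v) {
        return new HeapUnion(v);
    }

    // Model of a class-keyed serialiser table (an assumption, not Kryo's code).
    private final Map<Class<?>, Function<Object, byte[]>> registry = new LinkedHashMap<>();

    // Wraps an existing typed serialiser so it can be stored against a class.
    public <T> void register(Class<T> type, Function<? super T, byte[]> existing) {
        registry.put(type, obj -> existing.apply(type.cast(obj)));
    }

    // Exact-class lookup: only succeeds if the runtime class itself was registered.
    public Function<Object, byte[]> findExact(Class<?> runtimeType) {
        return registry.get(runtimeType);
    }

    // Assignability lookup: accepts any registered supertype of the runtime class.
    public Function<Object, byte[]> findAssignable(Class<?> runtimeType) {
        for (Map.Entry<Class<?>, Function<Object, byte[]>> e : registry.entrySet()) {
            if (e.getKey().isAssignableFrom(runtimeType)) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SerialiserLookupSketch lookup = new SerialiserLookupSketch();
        // Register against the abstract type, the only one visible to us.
        lookup.register(AbstractUnion.class,
                u -> ByteBuffer.allocate(Double.BYTES).putDouble(u.value()).array());

        Class<?> runtimeType = newUnion(0.5).getClass();
        System.out.println("exact match found: " + (lookup.findExact(runtimeType) != null));
        System.out.println("assignable match found: " + (lookup.findAssignable(runtimeType) != null));
    }
}
```

In this model the exact lookup returns nothing for the hidden subclass, while the assignability lookup finds the serialiser registered against the abstract type. Kryo's `addDefaultSerializer` is, as far as I know, matched by assignability rather than by exact class, so it may be worth checking whether that route works for these classes; that has not been verified here.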

p013570 commented 6 years ago

Merged into develop.