Open dselman opened 4 months ago
For LZ compression, how is the byte stream converted into a string in your example? I think you'd want to consider two things.
That said, if you're considering LZ-type compression at all, consider storing the result natively as binary if the storage system can handle it, rather than encoding it into a string, then encoding the string into JSON, and then encoding the JSON into UTF-8. Storing as binary wastes none of the bits. Cosmos DB supports binary attachments, but the feature is deprecated and Microsoft recommends moving to Azure Blob Storage instead, which has the downside of needing to talk to two services. If you're going to store a lot of binary data, you might consider a database other than Cosmos DB.
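To put a rough number on the string-encoding cost, here is a minimal sketch that assumes Base64 is the string encoding (the issue does not say which one is used) and a hypothetical `data` field in the JSON wrapper. Random bytes stand in for compressed output.

```python
import base64
import json
import os


def stored_sizes(payload: bytes) -> dict:
    """Compare storing bytes natively with storing them as Base64 inside UTF-8 JSON."""
    # Base64 emits 4 ASCII characters for every 3 input bytes.
    text = base64.b64encode(payload).decode("ascii")
    # "data" is a hypothetical field name used only for this illustration.
    document = json.dumps({"data": text}).encode("utf-8")
    return {"binary": len(payload), "base64_in_json": len(document)}


sizes = stored_sizes(os.urandom(3000))  # random bytes stand in for compressed output
print(sizes, f"overhead: {sizes['base64_in_json'] / sizes['binary'] - 1:.1%}")
```

With Base64 the stored document is about a third larger than the raw bytes. Other string encodings will give different figures.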
For the class map:
Feature Request 🛍️
Support compression of serialised Concerto objects.
Use Case
ASTs and serialised objects in general are verbose. They compress well due to repeated JSON properties, like $class.
Possible Solution
Provide compress/decompress functions within Concerto core or util.
Context
Detailed Description
Two approaches, which may be complementary, have been explored.
Class Map
This specifically targets the $class properties within the JSON objects produced by the Serializer. The JSON tree is visited to build a Map of all $class values in the JSON. $class entries that start with the same prefix as the root $class are shortened by removing the common prefix.
This map is used to replace the $class properties with indexes into the map, resulting in a JSON object that looks like:
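A minimal sketch of the Class Map idea, written in Python for illustration and not taken from the Concerto codebase. The wrapper keys (`prefix`, `classMap`, `body`), the function names, and the sample `$class` values are all hypothetical. It also assumes the common prefix is the root `$class` up to its last dot, and marks shortened entries with a leading dot so they can be expanded again unambiguously.

```python
def class_map_encode(root: dict) -> dict:
    """Replace every $class string with an index into a shared class map."""
    # Assumption: the common prefix is the root $class without its final segment.
    prefix = root["$class"].rsplit(".", 1)[0]
    class_map: list = []

    def visit(node):
        if isinstance(node, list):
            return [visit(item) for item in node]
        if isinstance(node, dict):
            out = {}
            for key, value in node.items():
                if key == "$class" and isinstance(value, str):
                    # Keep the leading dot as a marker that the prefix was removed.
                    short = value[len(prefix):] if value.startswith(prefix + ".") else value
                    if short not in class_map:
                        class_map.append(short)
                    out[key] = class_map.index(short)
                else:
                    out[key] = visit(value)
            return out
        return node

    return {"prefix": prefix, "classMap": class_map, "body": visit(root)}


def class_map_decode(encoded: dict) -> dict:
    """Inverse of class_map_encode: restore the full $class strings."""
    prefix, class_map = encoded["prefix"], encoded["classMap"]

    def visit(node):
        if isinstance(node, list):
            return [visit(item) for item in node]
        if isinstance(node, dict):
            out = {}
            for key, value in node.items():
                if key == "$class" and isinstance(value, int):
                    name = class_map[value]
                    out[key] = prefix + name if name.startswith(".") else name
                else:
                    out[key] = visit(value)
            return out
        return node

    return visit(encoded["body"])


# Hypothetical sample object; the class names are made up for this example.
sample = {
    "$class": "org.example@1.0.0.Order",
    "items": [
        {"$class": "org.example@1.0.0.Item", "sku": "A1"},
        {"$class": "org.example@1.0.0.Item", "sku": "B2"},
    ],
    "shipTo": {"$class": "org.other@2.0.0.Address", "city": "Paris"},
}
encoded = class_map_encode(sample)
```

Here each repeated `$class` string is stored once in `classMap` and every occurrence in `body` becomes a small integer, so the saving grows with the number of typed nodes in the tree.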
LZ Compression
LZ compression is applied to the JSON object (either the source object as-is, or the object after the Class Map has been built), resulting in a JSON object that looks like:
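A minimal sketch of this step, with loud assumptions: it uses Python's `zlib` (DEFLATE, an LZ77-family algorithm) as a stand-in for whichever LZ library the proposal actually uses, Base64 as the byte-to-string conversion, and hypothetical wrapper keys `encoding` and `data`.

```python
import base64
import json
import zlib


def lz_compress(obj) -> dict:
    """Serialise obj to compact JSON, compress it, and wrap the result in a JSON-safe dict."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    packed = zlib.compress(raw, 9)  # 9 = highest compression level
    # Base64 keeps the bytes valid inside a JSON string, at a size cost of about one third.
    return {"encoding": "deflate+base64", "data": base64.b64encode(packed).decode("ascii")}


def lz_decompress(wrapper: dict):
    """Inverse of lz_compress."""
    packed = base64.b64decode(wrapper["data"])
    return json.loads(zlib.decompress(packed).decode("utf-8"))


# Hypothetical document with many repeated $class values, as in a serialised AST.
doc = {
    "$class": "org.example@1.0.0.Order",
    "items": [{"$class": "org.example@1.0.0.Item", "sku": f"S{i}"} for i in range(200)],
}
wrapper = lz_compress(doc)
print(len(json.dumps(doc)), "->", len(json.dumps(wrapper)))
```

The ratio this prints depends on the input and on the library, so it should not be read as reproducing the figures reported in this issue.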
Results
Class Map: approximately 1.6x compression
Class Map + LZ: approximately 12x compression
Just LZ: approximately 10x compression