HeapsIO / hxbit

Haxe Binary serialization and network synchronization library

Sparse Data serialization support. #25

Closed nanjizal closed 6 years ago

nanjizal commented 6 years ago

Is there any support for sparse data serialization, where fields are only stored in the binary if their state differs from the default value? 'Sparse' may not be the correct term; it is normally used in relation to matrices (and arrays) that are mostly zeros, where calculations are limited to the populated areas and storage only records the populated parts. See the section on 'Storing a sparse matrix': https://en.wikipedia.org/wiki/Sparse_matrix

It is a slight variation on that meaning: I am interested in reducing file sizes by only saving fields that are actually used, i.e. those that differ from null or their default value.

class User implements hxbit.Serializable {
    ...
    // field is set to its default on reconstruction,
    // and its value is only serialized if it differs from the default
    @:sp public var sparseDataField0 : Array<Int> = [];
    @:sp public var sparseDataField1 : Null<Int> = null;
}

So with a game state, if a field is still at its default you would not serialize that field, since the default can be inferred from its absence. My use case is storing triangles generated from SVG or similar: if their opacity is 1, the transformation matrix is the identity matrix, and texturePropertiesID is null, I don't want that data serialized, because it can be deduced when rebuilding. It's always going to be a trade-off between file size and deserialization complexity, but I think an option to 'sparse serialize' (my term for the concept) would be useful.
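For the triangle case, roughly what I have in mind is something like this (a hypothetical sketch, with made-up field names and defaults; @:s is hxbit's usual metadata for serialized fields). At the moment every instance carries these mostly-default values:

class Triangle implements hxbit.Serializable {
    @:s public var ax : Float;
    @:s public var ay : Float;
    @:s public var bx : Float;
    @:s public var by : Float;
    @:s public var cx : Float;
    @:s public var cy : Float;
    // almost always 1.0, so ideally omitted when at its default
    @:s public var alpha : Float = 1.0;
    // almost always null, so ideally omitted when at its default
    @:s public var texturePropertiesID : Null<Int> = null;
    public function new() {}
}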

Can typedefs be serialized?

nanjizal commented 6 years ago

The alternative is somewhat messy: it requires defining a class for each possible serialization and then filtering instances into new versions depending on what you want saved, which quickly gets unwieldy if you have an Array of these items. Serializing the default values could double or triple the file size for some data, so for vector data this would not be ideal.

nanjizal commented 6 years ago

I guess customSerialize might be the right approach, but I still think "Sparse Serialization" would be a good general feature.
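If I read the docs right, hxbit's custom serialization hooks look roughly like this (just the skeleton; whether the sparse idea can be carried through them I am not yet sure):

class Triangle implements hxbit.Serializable {
    function customSerialize( ctx : hxbit.Serializer ) {
        // write any extra data into ctx here
    }
    function customUnserialize( ctx : hxbit.Serializer ) {
        // read the same data back, in the same order
    }
}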

nanjizal commented 6 years ago

I have done a small test and I am getting incorrect Float values on Neko. The fractions 0.7 and 0.2 don't seem to be encoded as I would expect; the repo is linked below.

hxbitTriangleTest$ haxe compile.hxml
Main.hx:43: 
Triangles:
triangle
  ax: 10 , ay: 10
, bx: 100 , by: 10
, cx: 50 , cy: 100
 alpha: 0.5
triangle
  ax: 100 , ay: 10
, bx: 200 , by: 10
, cx: 150 , cy: 100
 alpha: 0.699999988079071
triangle
  ax: 176.755447387695 , ay: 316.880920410156
, bx: 143.413269042969 , by: 336.617492675781
, cx: 173.399642944336 , cy: 246.126953125
 alpha: 0.200000002980232
triangle
  ax: 61.5087623596191 , ay: 58.7212829589844
, bx: 104.688774108887 , by: 370.880279541016
, cx: 123.877647399902 , cy: 170.246566772461
 alpha: 0.200000002980232

But maybe that's just how I am setting the Float. I probably won't be targeting Neko, so I'm not too worried.

I experimented with sparse serialization using custom serialization, but I am really not sure how to implement it, or whether it is even feasible.

I added some custom serialization here that seems to work alongside standard serialization: https://github.com/nanjizal/hxbitTriangleTest/blob/master/src/Triangle.hx#L29

But if I try to not send alpha for all triangles it breaks, which is probably expected; I am unsure how to work around this, or whether this kind of approach is even possible. https://github.com/nanjizal/hxbitTriangleTest/blob/master/src/Main.hx#L24
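One thing I might try, inside the Triangle class, is to always write a small marker so the unserializer knows whether an alpha value follows for each triangle (a rough sketch only; the Serializer method names addBool/getBool and addFloat/getFloat are guesses on my part):

    function customSerialize( ctx : hxbit.Serializer ) {
        if( alpha != 1.0 ) {
            ctx.addBool(true);   // marker: an alpha value follows
            ctx.addFloat(alpha);
        } else {
            ctx.addBool(false);  // marker: no alpha stored, default applies
        }
    }

    function customUnserialize( ctx : hxbit.Serializer ) {
        // read the marker first, then either the stored value or the default
        alpha = ctx.getBool() ? ctx.getFloat() : 1.0;
    }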

ncannasse commented 6 years ago

That's perfectly normal, since by default floats are stored in single-precision format.
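For illustration, the same effect can be reproduced with the standard haxe.io.Bytes API (a minimal sketch): 0.7 has no exact 32-bit representation, so a single-precision round trip gives roughly 0.699999988079071, while a double-precision round trip preserves 0.7.

class FloatPrecision {
    static function main() {
        var single = haxe.io.Bytes.alloc(4);
        single.setFloat(0, 0.7);      // stored as a 32-bit float
        trace(single.getFloat(0));    // ~0.699999988079071

        var double = haxe.io.Bytes.alloc(8);
        double.setDouble(0, 0.7);     // stored as a 64-bit double
        trace(double.getDouble(0));   // 0.7
    }
}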

nanjizal commented 6 years ago

I was a bit disappointed with your reply; it only addressed the precision format, which at the time went over my head. Anyway, for completeness I have implemented a test that provides sparse data population. It requires an additional property, but that is an acceptable requirement, and the test seems to work fine.
I am unsure of the file-size impact, though; if you think it's worth me looking at implementing something with macros, you would need to let me know.

https://github.com/nanjizal/hxbitTriangleTest/blob/master/src/Triangle.hx#L54