go-bond / bond


validate row schema #128

Open poonai opened 7 months ago

pkieltyka commented 7 months ago

btw ... I'm just catching up .. the idea of this PR is to query the table metadata for a row's schema, and then compare it with the app's..? this is a cool idea.

And, the issue is we use pluggable serialization formats, as a result, we won't know the exact schema..?

I wonder... with bond tables, don't we pass the serializer a table is using..? if so, then wouldn't we just include the serializer in the metadata, and compare the schema against that...?

another idea, we could add struct tags like bond:"field-name" to each field, and then use mapstructure to grab the schema, hash it, and then compare...?

I don't think we need to rely on capnproto for this stuff..? as our schemas can vary..? but lmk what you guys are thinking

marino39 commented 7 months ago

@pkieltyka Yeah, serializing to map[string]any would work for names. We still need to think about field types and how they get serialized e.g. prototyp.Hash could change its serialization/deserialization making it incompatible.
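A minimal sketch of the struct-tag idea, extended to cover field types as well as names. It is not bond's implementation: the `bond` tag is only the one proposed in this thread, `Row`, `RowV2` and `schemaHash` are hypothetical names, and it uses the standard `reflect` package rather than mapstructure. Note that hashing the Go type name would not catch a type such as prototyp.Hash changing its encoding while keeping its name, so that case would still need something extra (for example an explicit version).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// Row is a hypothetical row type. The `bond` tag is the tag proposed in the
// thread, not an existing bond feature.
type Row struct {
	ID      uint64 `bond:"id"`
	Expired bool   `bond:"expired"`
	Note    string // untagged: falls back to the Go field name
}

// RowV2 changes the type of one field, so its hash should differ from Row's.
type RowV2 struct {
	ID      uint64 `bond:"id"`
	Expired int64  `bond:"expired"`
	Note    string
}

// schemaHash returns a hex SHA-256 over the sorted "name:type" pairs of the
// exported fields of a struct (or pointer to struct). It returns "" for
// anything else.
func schemaHash(v any) string {
	t := reflect.TypeOf(v)
	if t == nil {
		return ""
	}
	for t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return ""
	}
	entries := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue
		}
		name := f.Tag.Get("bond")
		if name == "" {
			name = f.Name
		}
		entries = append(entries, name+":"+f.Type.String())
	}
	// Sort so that reordering fields does not change the hash.
	sort.Strings(entries)
	sum := sha256.Sum256([]byte(strings.Join(entries, "\n")))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println("Row:  ", schemaHash(Row{}))
	fmt.Println("RowV2:", schemaHash(RowV2{}))
}
```

The hash could be stored in the table metadata when the table is created and compared against the app's row type on open. Whether field order should count as part of the schema depends on the serializer, so the sort here is a design choice, not a requirement.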

@poonai and I discussed capnproto in relation to filtering rows without deserializing them. I have been working on a library that can read msgpack without making any allocations, and @poonai happened to know a library that works the same way. Apparently one of the protobuf creators, after leaving Google, came up with a new serialization library that works along the same lines. We thought it might also be useful here. However, it's probably not relevant to this PR.

pkieltyka commented 7 months ago

hey guys -- yea I've heard of capnproto from way back.. it's just a bit weird for us to have so many serialization formats we're using.. I know we have a pluggable serialization system which is nice, and then I guess we're potentially deciding to use one of them for this row schema validation?

or are you mentioning something else, where we have a value from a row to compute its schema without deserializing the data..? if so, when is it useful for us to get the schema before we deserialize the entire value..?

poonai commented 6 months ago

We were discussing it for a different use case, not specific to schema. The idea is to filter rows without deserializing them.

eg:

query := Table.Query().Filter(cond.Func(func(row *Row) bool {
    // Keep only rows that have not expired.
    return !row.Expired
}))

Most of our use cases involve filters, and we incur the additional cost of deserialization even for the irrelevant records that the filter then discards.

@marino39 and I have been discussing the idea of filtering rows without deserializing them for a long time.
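To make the idea concrete, here is a rough sketch of reading one boolean field straight out of an encoded row without decoding the whole value. It assumes the row is a CBOR map with text-string keys and definite lengths; `readHead`, `skip` and `peekBool` are hypothetical helpers written for this example, not part of bond or of any CBOR library.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// readHead decodes the head of the CBOR item at data[pos] and returns its
// major type, its argument, and the offset just past the head.
// Indefinite-length items are not supported in this sketch.
func readHead(data []byte, pos int) (byte, uint64, int, bool) {
	if pos < 0 || pos >= len(data) {
		return 0, 0, 0, false
	}
	major, info := data[pos]>>5, data[pos]&0x1f
	pos++
	if info < 24 {
		return major, uint64(info), pos, true
	}
	if info > 27 {
		return 0, 0, 0, false
	}
	n := 1 << (info - 24) // 1, 2, 4 or 8 argument bytes follow
	if len(data)-pos < n {
		return 0, 0, 0, false
	}
	var arg uint64
	switch n {
	case 1:
		arg = uint64(data[pos])
	case 2:
		arg = uint64(binary.BigEndian.Uint16(data[pos:]))
	case 4:
		arg = uint64(binary.BigEndian.Uint32(data[pos:]))
	default:
		arg = binary.BigEndian.Uint64(data[pos:])
	}
	return major, arg, pos + n, true
}

// skip returns the offset just past the complete item starting at pos.
func skip(data []byte, pos int) (int, bool) {
	major, arg, next, ok := readHead(data, pos)
	if !ok {
		return 0, false
	}
	switch major {
	case 2, 3: // byte or text string: arg is the payload length
		if arg > uint64(len(data)-next) {
			return 0, false
		}
		return next + int(arg), true
	case 4, 5, 6: // array, map, tag: skip the nested items
		items := arg
		switch {
		case major == 5 && arg > uint64(len(data)):
			return 0, false
		case major == 5:
			items = 2 * arg // each map entry is a key plus a value
		case major == 6:
			items = 1 // a tag wraps exactly one item
		}
		for i := uint64(0); i < items; i++ {
			if next, ok = skip(data, next); !ok {
				return 0, false
			}
		}
	}
	return next, true // integers, simple values, floats: head only
}

// peekBool looks up key in a top-level CBOR map and reports its boolean value.
// found is false if the key is missing, not a bool, or the data is malformed.
func peekBool(data []byte, key string) (value, found bool) {
	major, n, pos, ok := readHead(data, 0)
	if !ok || major != 5 {
		return false, false
	}
	for i := uint64(0); i < n; i++ {
		kMajor, kLen, kStart, ok := readHead(data, pos)
		if !ok {
			return false, false
		}
		if kMajor == 3 && kLen <= uint64(len(data)-kStart) {
			kEnd := kStart + int(kLen)
			if string(data[kStart:kEnd]) == key {
				if kEnd < len(data) && (data[kEnd] == 0xf4 || data[kEnd] == 0xf5) {
					return data[kEnd] == 0xf5, true // 0xf5 is true, 0xf4 is false
				}
				return false, false
			}
		}
		// Not the key we want: skip the key, then its value.
		if pos, ok = skip(data, pos); !ok {
			return false, false
		}
		if pos, ok = skip(data, pos); !ok {
			return false, false
		}
	}
	return false, false
}

func main() {
	// Hand-encoded CBOR for {"ID": 7, "Expired": true}.
	row := []byte{0xa2, 0x62, 'I', 'D', 0x07, 0x67, 'E', 'x', 'p', 'i', 'r', 'e', 'd', 0xf5}
	expired, found := peekBool(row, "Expired")
	fmt.Println(expired, found)
}
```

A filter built this way would only fully decode the rows that pass the check. The cost is that the filter is tied to one wire format, which is part of why the choice of serializer matters here.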

marino39 commented 6 months ago

I think we will need to settle for one and remove pluggable serializers. Otherwise, it might be really hard to figure out this schema for all of them.

Since we use cbor already, it will probably be the one. @poonai lmk if you have it all figured out.