gluster / glusterd2

[DEPRECATED] Glusterd2 is the distributed management framework to be used for GlusterFS.
GNU General Public License v2.0
167 stars 82 forks source link

Avoid marshalling and marshalling twice #795

Open prashanthpai opened 6 years ago

prashanthpai commented 6 years ago

JSON isn't a very optimal marshalling protocol.

As we happen to use JSON to store in etcd and also use JSON to respond to client requests, there might be an opportunity to return the data we get from etcd to the client, as is, for certain operations.

This prevents get-unmarshal-marshal-reply cycle.

However, very often we have different structures for storing in etcd and to response back to client. For example volume.Volinfo vs api.Volinfo

raghavendra-talur commented 6 years ago

@prashanthpai could you please explain a little more?

In my understanding this is very much necessary if we have to abstract the data from clients. There will almost always be more data in the etcd store thank what is in API.

prashanthpai commented 6 years ago

There will almost always be more data in the etcd store thank what is in API.

That's correct for most of the API responses. Our internal structs can contain fields that we do not want to return to the clients. Hence, today, we create duplicate structs of same object such as Volinfo - one for internal use and another to return to clients. Alternatively, we can use JSON stuct tag json:"-" on fields of internal structures to ensure that those internal fields are not returned to clients.

If we continue to duplicate structures, we might as well use something more efficient to marshall for etcd storage as etcd allows values to be arbitrary bytes.

In my understanding this is very much necessary if we have to abstract the data from clients.

That said, all this would be a low-priority implementation and optimisation detail without breaking abstraction.

phlogistonjohn commented 6 years ago

That's correct for most of the API responses. Our internal structs can contain fields that we do not want to return to the clients. Hence, today, we create duplicate structs of same object such as Volinfo - one for internal use and another to return to clients. Alternatively, we can use JSON stuct tag json:"-" on fields of internal structures to ensure that those internal fields are not returned to clients.

Sure, that's doable. FWIW, what Heketi does is nest the public facing structures within the private ones like:

type Foo struct {
  // I am sent to clients
  Thing string
  Stuff int
}
type Bar struct {
  Info Foo
  MyOwnStuff string
}

However, this does have the downside that every time you need to make a change to the API it forces a change to the db. This may not always be want is wanted esp. if there is a more convenient internal representation. I think there are pros and cons for both approaches.

If we continue to duplicate structures, we might as well use something more efficient to marshall for etcd storage as etcd allows values to be arbitrary bytes.

Perhaps. I also think there are different pros and cons to a binary representation vs a textual one beyond just speed. One advantage of json is that dumping the db for the purposes of debugging is simpler.

Also, my current favorite binary serialization, CBOR, is not on that list. ;-)

In my understanding this is very much necessary if we have to abstract the data from clients.

That said, all this would be a low-priority implementation and optimisation detail without breaking abstraction.

Agreed. FWIW I filed issue #877 partially inspired by this issue because it I think its worthwhile discussing how we can version the data to support upgrades for data format changes like switching from json to cbor or gob or whatever in the future.