[Open] crast opened this issue 9 years ago
Hi @crast,
First of all, I'd be willing to accept PRs of this kind that stabilize the API from the user's perspective. A couple of notes regarding your points:
One more note: you mentioned semantic versioning for gopkg.in in #47, and I think it's totally fine to go with this approach. However, the current state of things deserves to be a version 0 with possible API breakages. Once we agree on all the changes described in this issue and #47, we are good to release v1 and make sure we never make a breaking change without bumping the library version again.
The last thing: I'm going to merge #49 right now; it's a 100% good change. Any additional PRs/issues from you are more than welcome.
Thanks!
Thanks for your reply and for merging the PRs.
For point 1: yes, I'd keep NewBinaryEncoder and NewBinaryDecoder exported so people can keep using them; they'd just now return an interface value instead of a concrete type.
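Something like this, as a minimal sketch (the method set shown is a placeholder, not the library's full real interface):

```go
package avro

import "bytes"

// Encoder is what NewBinaryEncoder would return instead of the concrete
// *BinaryEncoder. The method set here is a stand-in, not the real one.
type Encoder interface {
	WriteLong(v int64)
	WriteBytes(v []byte)
}

// The concrete type becomes an implementation detail.
type binaryEncoder struct{ buf *bytes.Buffer }

func (e *binaryEncoder) WriteLong(v int64)   {} // zigzag varint encoding elided
func (e *binaryEncoder) WriteBytes(v []byte) {} // length prefix + bytes elided

// NewBinaryEncoder keeps its exported name, so existing call sites still
// compile and run; only the declared return type changes.
func NewBinaryEncoder(buf *bytes.Buffer) Encoder {
	return &binaryEncoder{buf: buf}
}
```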
For point 3: the chain of logic that had me arrive at this suggestion goes something like this:

- Most users never deal with RecordSchema et al; they parse a schema into a Schema interface, then pass that interface into a decoder/encoder. So they don't usually care about what's inside; only power users do, and they could find all that in its own subpackage.
- The datum readers/writers type-assert down to the *RecordSchema concrete type, and similar for a few others like *RecursiveSchema and such.
- So if you pass in your own implementation of the Schema interface and expect it to decode, it will probably panic somewhere along the chain.
- The encoding/json encoder follows this kind of pattern, building a new encoder every time it sees a new type, and caching it.
- But values of RecordSchema are mutable, that is, you can add/remove/modify fields inside one at any time. This makes trivially caching a schema harder: you'd have to walk through and hash all the internals every time, which might be more expensive than the savings we get from building the execution plan.
- So provide a FreezeSchema(s Schema) Schema (or maybe CompileSchema) which returns a special frozen schema; if you use it, you get the advantage of that cache for very fast encode/decode. (A sketch follows this list.)

To your final note: yeah, I think 0.x having a series of API breaks before 1.0 makes sense.
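To sketch the freezing idea (everything here, including the pared-down Schema stand-in, the frozenSchema type, and the fingerprinting, is hypothetical, not existing go-avro API):

```go
package avro

import "crypto/sha256"

// Schema is a stand-in for the library's Schema interface; the real one
// has more methods, but a canonical String() form is all we need here.
type Schema interface {
	String() string
}

// frozenSchema wraps a Schema the caller promises not to mutate, so its
// canonical form can be hashed once instead of on every encode.
type frozenSchema struct {
	Schema                        // the wrapped, now logically immutable schema
	fingerprint [sha256.Size]byte // computed once at freeze time
}

// FreezeSchema computes the fingerprint eagerly. Encoders/decoders could
// type-assert for *frozenSchema and reuse a cached execution plan instead
// of re-walking the schema tree for every record.
func FreezeSchema(s Schema) Schema {
	return &frozenSchema{
		Schema:      s,
		fingerprint: sha256.Sum256([]byte(s.String())),
	}
}
```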
For 2): not a custom schema per se, just a type that happens to provide the facilities of RecordSchema plus some added functionality. This could, for example, solve the situation for the guy who wanted to store additional properties in the schema as an interface{}.
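Roughly this kind of wrapper (a sketch with a pared-down RecordSchema stand-in; the wrapper name is made up):

```go
package avro

// RecordSchema is a stand-in for go-avro's record schema type; only the
// embedding below matters for this sketch.
type RecordSchema struct {
	Name   string
	Fields []string // simplified for the sketch
}

// PropertiesSchema is the kind of wrapper I mean: it behaves like a
// *RecordSchema via embedding, plus arbitrary user-defined properties.
// Note the catch from point 3: the datum writers' type asserts to
// *RecordSchema would not see through this wrapper today.
type PropertiesSchema struct {
	*RecordSchema
	Props map[string]interface{} // e.g. doc links, validation hints
}
```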
For 3): I think it's valuable to have the schema types still be mutable for code generation, schema building/conversion, and other scenarios. For example, someone might take a list of JSON entities, or records in a CSV file, and convert them to avro without knowing the schema before making it. But this is probably not the most common use case.
In many common use cases, people have a schema they know, and they want to encode/decode with it, thousands or millions of times, fast. An immutable wrapper factory would make it possible to cache those, at least for those who want the speedup, and you wouldn't be forced to use it.
I'm only conjecturing that such an execution-plan-based approach would speed things up; it remains to be seen, and it would need a trial run. If it's only a small speedup, it's naturally not worth it.
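Continuing the FreezeSchema sketch above, the cache could look roughly like this, modeled on encoding/json's per-type encoder cache (planFor, plan, and compile are hypothetical names, not existing API):

```go
package avro

import "sync"

// plan is a pre-computed sequence of encode steps for one schema, built
// by walking the schema tree once rather than on every record.
type plan struct {
	steps []func(datum interface{}, buf []byte) []byte
}

// planCache maps a frozen schema to its compiled plan, the same shape as
// encoding/json's cache of per-type encoders.
var planCache sync.Map // map[*frozenSchema]*plan

// planFor returns the cached plan, compiling it on first use. Concurrent
// first calls may both compile, but only one result is kept.
func planFor(fs *frozenSchema) *plan {
	if p, ok := planCache.Load(fs); ok {
		return p.(*plan)
	}
	p, _ := planCache.LoadOrStore(fs, compile(fs))
	return p.(*plan)
}

// compile would walk the schema tree and emit one step per field; elided.
func compile(fs *frozenSchema) *plan { return &plan{} }
```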
OK, the second case is much clearer to me now, but this description definitely involves making datum readers/writers independent of schema implementations, i.e. a schema should know how to encode/decode itself properly. Otherwise it seems like a cool feature to me.
Regarding the third case, I agree that the most common use case is encoding/decoding lots of records as fast as possible. I think your proposed approach is definitely worth trying.
I'm not sure I can devote time to help implement this right now, because I have lots of other things to do, but if you can propose some kind of roadmap to achieve it, I can definitely participate.
Thanks!
Since you're talking about having master be an API break in #47, I would like to collect a few changes to make this library a bit more idiomatic in its Go usage:

- NewBinaryDecoder should return a Decoder interface, not the concrete type.
- NewBinaryEncoder likewise returns an Encoder interface.
- Remove Tell() from the Encoder interface. From what I can tell, it's not used at all, and it constrains writing your own Encoder should you desire.
- Have NewBinaryEncoder subsequently take an io.Writer at construction (see the sketch at the end of this comment). This is actually not a breaking change for user code, because *bytes.Buffer satisfies io.Writer, but it allows people to pass in other writers: for example, a network socket to encode avro directly onto a network connection, or a file, or the like.
- Move the concrete schema types (FixedSchema, EnumSchema, IntSchema and so on) into their own subpackage. This reduces visual clutter and tab-completion confusion. It should also be noted that even though there are pieces which switch on type codes using schema.Type(), the datum writers do type asserts to get the concrete types of many of the schema types, so it's not like someone can simply implement the Schema interface with their own type (or even embed the type) and use it as a replacement as it stands.

Nearly all of these changes, while technically breaking, shouldn't break the vast majority of user code, because most users aren't manipulating schema types or embedding the BinaryEncoder type; they just want to encode and decode from/to avro.
I'm happy to submit PRs for any and all of these changes if you approve of them.
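For the io.Writer point, roughly this (just a sketch; the method set and names are placeholders, not the library's current API):

```go
package main

import (
	"bytes"
	"io"
	"net"
)

// Encoder stands in for the proposed interface return type.
type Encoder interface {
	WriteLong(v int64)
}

type binaryEncoder struct{ w io.Writer }

func (e *binaryEncoder) WriteLong(v int64) {} // zigzag varint elided

// NewBinaryEncoder takes any io.Writer at construction.
func NewBinaryEncoder(w io.Writer) Encoder {
	return &binaryEncoder{w: w}
}

func main() {
	// Existing call sites keep compiling: *bytes.Buffer is an io.Writer.
	NewBinaryEncoder(new(bytes.Buffer))

	// New possibility: encode avro straight onto a network connection.
	if conn, err := net.Dial("tcp", "example.com:1234"); err == nil {
		defer conn.Close()
		NewBinaryEncoder(conn)
	}
}
```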