gabledata / recap

Work with your web service, database, and streaming schemas in a single format.
https://recap.build
MIT License
334 stars 24 forks source link

Allow StringType and BytesType to have undefined byte lengths #413

Closed criccomini closed 10 months ago

criccomini commented 10 months ago

Prior to this commit, Recap StringType and BytesType both required bytes_ to be set. If unset, a default of 65536 was used. This configuration allowed developers to skip the bytes field in the concrete syntax tree (CST), but still required bytes_ to be set in the abstract syntax tree (AST). @adrianisk and I converged on this design in https://github.com/recap-build/recap/discussions/284.

Recently, @mjperrone pointed out that requiring bytes_ in the AST is still awkward for JSON. In the absence of an undefined byte_ option, you have to set something for JSON--either a magic number or INT/LONG_MAX. Thus far, we defaulted to LONG_MAX. But a defined LONG_MAX and an undefined length are actually two separate states. Allowing a converter to specify an undefined string/byte length permits other converters to decide how to deal with the undefined byte length.

I made the change and it fit quite naturally, so I think we're on the right track with this. JSON and Hive metastore both benefited; both support undefined lengths.

I've left the other converters alone for now. This means they'll continue to barf if you go from JSON/HMS to Recap to Avro/Proto. I think that's fine for now. We can revisit this later if we want to.

criccomini commented 10 months ago

@adrianisk @mjperrone @cpard @gwukelic @alexdemeo PTAL

Also, the spec change: https://github.com/recap-build/recap-website/pull/13