Prior to this commit, Recap StringType and BytesType both required bytes_ to be set. If unset, a default of 65536 was used. This configuration allowed developers to skip the bytes field in the concrete syntax tree (CST), but still required bytes_ to be set in the abstract syntax tree (AST). @adrianisk and I converged on this design in
https://github.com/recap-build/recap/discussions/284.
Recently, @mjperrone pointed out that requiring bytes_ in the AST is still awkward for JSON. In the absence of an undefined byte_ option, you have to set something for JSON--either a magic number or INT/LONG_MAX. Thus far, we defaulted to LONG_MAX. But a defined LONG_MAX and an undefined length are actually two separate states. Allowing a converter to specify an undefined string/byte length permits other converters to decide how to deal with the undefined byte length.
I made the change and it fit quite naturally, so I think we're on the right track with this. JSON and Hive metastore both benefited; both support undefined lengths.
I've left the other converters alone for now. This means they'll continue to barf if you go from JSON/HMS to Recap to Avro/Proto. I think that's fine for now. We can revisit this later if we want to.
Prior to this commit, Recap StringType and BytesType both required
bytes_
to be set. If unset, a default of 65536 was used. This configuration allowed developers to skip thebytes
field in the concrete syntax tree (CST), but still requiredbytes_
to be set in the abstract syntax tree (AST). @adrianisk and I converged on this design in https://github.com/recap-build/recap/discussions/284.Recently, @mjperrone pointed out that requiring
bytes_
in the AST is still awkward for JSON. In the absence of an undefined byte_ option, you have to set something for JSON--either a magic number or INT/LONG_MAX. Thus far, we defaulted to LONG_MAX. But a defined LONG_MAX and an undefined length are actually two separate states. Allowing a converter to specify an undefined string/byte length permits other converters to decide how to deal with the undefined byte length.I made the change and it fit quite naturally, so I think we're on the right track with this. JSON and Hive metastore both benefited; both support undefined lengths.
I've left the other converters alone for now. This means they'll continue to barf if you go from JSON/HMS to Recap to Avro/Proto. I think that's fine for now. We can revisit this later if we want to.