brimdata / super

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

Document TZNG divergence in ZNG spec #662

Closed philrz closed 4 years ago

philrz commented 4 years ago

Currently the ZNG spec says:

The ZNG text format is a human-readable form that follows directly from the ZNG binary format.

However, there are a couple of known areas of divergence where binary ZNG concepts aren't represented in TZNG. So far, @mccanne has acknowledged two:

  1. "Type codes": in binary ZNG the type codes are implied, but implied type codes would be really hard to write by hand in TZNG, for example when writing a test.
  2. "End-of-stream markers": there's not really an easy way to represent these in TZNG.

Over time ZNG is evolving into the definitive format, with TZNG as a helper format that is not quite one-to-one with it. This issue therefore serves as a reminder to add a section to the ZNG spec that discloses these areas of divergence.

mccanne commented 4 years ago

Actually, I think we could include end-of-stream easily by making it optional at EOF: by default you wouldn't need it there, so it doesn't clutter all our TZNG examples.
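To make the "optional at EOF" idea concrete, here is a minimal Python sketch of a line reader that ends a stream at an explicit marker but also treats end of input as an implicit end-of-stream. The marker text `<eos>` and the function name `split_streams` are hypothetical placeholders chosen for illustration; they are not TZNG syntax and not part of zq.

```python
from typing import Iterable, List

# Hypothetical marker text for illustration only; NOT actual TZNG syntax.
EOS_MARKER = "<eos>"


def split_streams(lines: Iterable[str], marker: str = EOS_MARKER) -> List[List[str]]:
    """Group text lines into streams, closing a stream at each explicit marker.

    A final stream that reaches end of input without a marker is still
    emitted, so the marker is optional at EOF.
    """
    streams: List[List[str]] = []
    current: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == marker:
            # Explicit end-of-stream: close the current stream.
            streams.append(current)
            current = []
        elif line:
            # Any other non-blank line belongs to the current stream.
            current.append(line)
    if current:
        # Implicit end-of-stream at EOF: no trailing marker required.
        streams.append(current)
    return streams
```

Under these assumptions, input that ends with the marker and input that simply ends produce the same result, so examples that never mention the marker keep working unchanged.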

philrz commented 4 years ago

I think we can close this one. Revisiting the topics listed in the description:

  1. The current ZNG spec seems pretty clear that a TZNG Type Binding, with its specified integer "type tag", is not the same thing as the type ID described in Typedefs for binary ZNG.
  2. TZNG end-of-stream markers are documented in the beta notification part of the ZNG spec as not yet implemented in zq. We've also got #1364 open where we might choose to implement it in zq.

There may be other things we need to add to the beta notification over time if we choose to leave them out of zq (e.g. the decimal128 type, as discussed in #1422), but per the original spirit of this issue, we have a spot in the docs to disclose these things. We just have to keep it accurate.