brimdata / zed

A novel data lake based on super-structured data
https://zed.brimdata.io/

Deprecate multi-from syntax #5143

Open philrz opened 2 weeks ago

philrz commented 2 weeks ago

tl;dr

We intend to deprecate the multi-from syntax. For example, users who previously wrote:

from (
  pool sample.zng
  pool prs.json
  get https://ifconfig.co/
)

would now write something like:

fork (
  => from sample.zng
  => from prs.json
  => get https://ifconfig.co/
)
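For users with many saved programs, the rewrite above is mechanical enough to script. The following is only a rough Python sketch, not a Zed tool: the function name `multi_from_to_fork` and the keyword mapping are hypothetical and derived solely from the two examples in this issue. It handles only `pool` and `get` legs without a `=> <leg>` suffix and rejects anything else, and the fork form it emits is the tentative one shown above.

```python
import re

# Hypothetical mapping from a multi-from leg keyword to its standalone form,
# inferred only from the example in this issue (pool -> from, get -> get).
_LEG_REWRITES = {"pool": "from", "get": "get"}


def multi_from_to_fork(program: str) -> str:
    """Rewrite a simple multi-from block into the fork form shown in the issue."""
    lines = [ln.strip() for ln in program.strip().splitlines()]
    # Only accept a block that opens with "from (" and closes with ")".
    if len(lines) < 2 or not re.fullmatch(r"from\s*\(", lines[0]) or lines[-1] != ")":
        raise ValueError("not a simple multi-from block")
    out = ["fork ("]
    for leg in lines[1:-1]:
        if not leg:
            continue  # skip blank lines inside the block
        keyword, _, rest = leg.partition(" ")
        # Legs with their own "=> <leg>" pipeline, or other keywords such as
        # "file" or "pass", are out of scope for this sketch.
        if keyword not in _LEG_REWRITES or "=>" in rest:
            raise ValueError(f"unsupported leg: {leg!r}")
        out.append(f"  => {_LEG_REWRITES[keyword]} {rest.strip()}")
    out.append(")")
    return "\n".join(out)
```

Anything the sketch rejects would need to be converted by hand.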

To give users time to transition, we intend to surface a warning in the Zed tools when a user runs a program that uses the multi-from syntax.

Details

At the time this issue is being opened, Zed is at commit 7c18667.

We've recognized that we could more easily maintain momentum with our recent improvements in error handling if we dropped support for multi-from, i.e., the lower part of the Synopsis section of the from docs:

from (
   pool <pool>[@<commitish>] [ => <leg> ]
   pool <pattern>
   file <path> [format <format>] [ => <leg> ]
   get <uri> [format <format>] [ => <leg> ]
   pass
   ...
)

Although multi-from is not specific to join, it was once the approach shown most prominently in the docs for lining up the two join inputs. The main join doc no longer shows multi-from, but the join tutorial still references it as the "alternate syntax". We're aware of a community zync user org that still makes heavy use of multi-from with join, so their successful transition to the preferred syntax will be important.

We've reached consensus on a multi-part plan:

  1. The relevant sections of the Zed docs will be updated to emphasize that the multi-from syntax is being deprecated. The syntax we advise using instead will be shown along with a suggestion that the reader ping us on community Slack if they need assistance modifying their programs.

  2. The Zed tools will be enhanced to surface a deprecation warning if a user executes a program that uses multi-from, including a hyperlink to the docs that provide the explanation and guidance.

  3. After we're confident the community zync user org has transitioned and we've given the rest of the community adequate time to react, the multi-from syntax will be dropped from the language and treated as an error.
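To make step 2 concrete, here is a minimal Python sketch of the warning behavior. It is an illustration only, not how the Zed tools are or will be implemented: a real implementation would presumably detect multi-from in the parser rather than with a regular expression, which can misfire on matching text inside string literals or comments. The names `uses_multi_from`, `check_program`, and `DOCS_URL` are hypothetical, and the URL is a placeholder for whichever docs page ends up carrying the guidance.

```python
import re
import warnings

# Assumption: a multi-from block is the keyword "from" followed by "(".
MULTI_FROM = re.compile(r"\bfrom\s*\(")

# Placeholder link target; the actual guidance page is not yet decided.
DOCS_URL = "https://zed.brimdata.io/"


def uses_multi_from(program: str) -> bool:
    """Return True if the program text appears to contain a multi-from block."""
    return MULTI_FROM.search(program) is not None


def check_program(program: str) -> bool:
    """Warn (without failing) when a program uses multi-from.

    Returns True if a warning was issued, False otherwise.
    """
    if uses_multi_from(program):
        # FutureWarning is Python's category for deprecations aimed at end users.
        warnings.warn(
            "multi-from syntax is deprecated; see " + DOCS_URL,
            FutureWarning,
            stacklevel=2,
        )
        return True
    return False
```

Step 3 of the plan would correspond to raising an error at the same point instead of warning.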

Before moving forward, we may need to think more about the syntax we'll recommend. While the fork variant shown at the top of this issue is functional, @mccanne expressed some concern about fork as a verb here. It makes sense when used to split dataflow mid-program, but it may seem an awkward way to kick off a program, as users often do today with multi-from. That is, is it intuitive to fork nothing?