Right now, our table syntax has a notion of named columns (and the names are required to always be distinct), and rows are entirely anonymous. But for some data examples, e.g. connectivity in a graph, it might make sense for the rows to be labeled as well. This would allow us to treat the contents of the table more symmetrically and more like a matrix, and might be amenable to further manipulation as matrices.
I'm wondering if we should riff off the syntax for spy: / spy "name": and check: / check "name":, and do something like row "name": to label a row.
I'm pretty sure this is grammatically unambiguous, but it is a breaking change: we would have to make row be a keyword, and not just row: (currently, row can be used as a normal identifier). Is this worth it?
We might also want to symmetrize the names a bit more carefully. Currently, the column names in table syntax are required to be NAME identifier tokens, but in add-column they're allowed to have spaces/not be identifiers. So we might want to loosen the grammar to
table-expr: TABLE table-headers COLON table-rows END
table-headers: [(table-header COMMA)* table-header]
table-header: NAME [COLONCOLON ann]
!!!!!! new !!!!!
table-header: STRING [COLONCOLON ann]
table-rows: [table-row* table-row]
table-row: ROWCOLON table-items
!!!!!!! new !!!!!!!!!
table-row: ROW (STRING | NAME) COLON table-items
Then Row values can have an optional name in them (accessible with .get-name() -> Option<String>), and we'd add a new constructor [raw-named-row(name): ...] to match [raw-row: ...] and a new method table.named-row(name, c1, c2, c3) to match table.row(c1, c2, c3). In the implementation, a table value would store an array of row names just as it does an array of column names.
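To make the data model above a bit more concrete, here is a rough sketch in Python (not Pyret, since none of this exists yet). Everything in it is an assumption for illustration: the class names, using None to stand in for Option's none, and having add_row return a fresh table rather than mutate.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass(frozen=True)
class Row:
    """A row of cells with an optional name; None models an anonymous row."""
    cells: tuple
    name: Optional[str] = None

    def get_name(self) -> Optional[str]:
        # Stand-in for the proposed .get-name() -> Option<String>.
        return self.name


@dataclass
class Table:
    column_names: List[str]
    # Parallel to column_names: one (possibly missing) name per stored row.
    row_names: List[Optional[str]] = field(default_factory=list)
    data: List[tuple] = field(default_factory=list)

    def _make_row(self, name: Optional[str], cells: tuple) -> Row:
        if len(cells) != len(self.column_names):
            raise ValueError("row width does not match the number of columns")
        return Row(tuple(cells), name)

    def row(self, *cells: Any) -> Row:
        # Models table.row(c1, c2, c3): an anonymous row.
        return self._make_row(None, cells)

    def named_row(self, name: str, *cells: Any) -> Row:
        # Models the proposed table.named-row(name, c1, c2, c3).
        return self._make_row(name, cells)

    def add_row(self, row: Row) -> "Table":
        # Returns a new table; the row's name goes into the row-name array.
        return Table(self.column_names,
                     self.row_names + [row.name],
                     self.data + [row.cells])

    def row_n(self, index: int) -> Row:
        # Rebuilds the Row, name included, from the two parallel arrays.
        return Row(self.data[index], self.row_names[index])
```

In this model, a named row survives a round trip through add_row and row_n with its name intact, which is the behavior the row-name array is meant to buy us.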
Our load-table syntax could add a new clause row-names NAME, so that loading a spreadsheet can generate named rows as well as named columns.
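One way to read the row-names clause is that it picks out a column of the source whose values become row labels instead of data. A small Python model of that reading follows; the function name, the CSV input, and the return shape are all hypothetical, and the real load-table machinery may want something different.

```python
import csv
import io


def load_table(csv_text, row_names=None):
    """Parse CSV text with a header line.

    Returns (column_names, row_names, data). If row_names names a column,
    that column is pulled out of the data and used to label the rows;
    otherwise every row is anonymous (None).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    if row_names is None:
        return header, [None] * len(rows), rows
    idx = header.index(row_names)  # raises ValueError if the column is missing
    columns = header[:idx] + header[idx + 1:]
    names = [r[idx] for r in rows]
    data = [r[:idx] + r[idx + 1:] for r in rows]
    return columns, names, data
```

For example, loading "city,a,b" data with row_names="city" would leave columns a and b and label each row by its city.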
If we had a hypothetical matrix library, then we could lift matrix operations to table operations, and generate tables with the appropriate row and column names (and e.g. only tolerate T1 x T2 if T1.column-names match T2.row-names, and if all the values are numeric). If row names are all present and all unique, we could also add a table-transpose operation that exchanges rows for columns, and still have a meaningfully-shaped table.
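Here is a sketch of what the lifted operations might check, again modeled in Python rather than Pyret. The NamedTable type and both function names are made up for illustration, and "match" is assumed to mean the same names in the same order.

```python
from dataclasses import dataclass
from numbers import Number
from typing import List, Optional


@dataclass
class NamedTable:
    row_names: List[Optional[str]]
    column_names: List[str]
    data: List[list]


def multiply(t1: NamedTable, t2: NamedTable) -> NamedTable:
    # Only tolerate T1 x T2 if T1's column names line up with T2's row names.
    if t1.column_names != t2.row_names:
        raise ValueError("T1's column names must match T2's row names")
    for t in (t1, t2):
        if not all(isinstance(v, Number) for row in t.data for v in row):
            raise ValueError("all values must be numeric")
    inner = len(t2.row_names)
    data = [[sum(t1.data[i][k] * t2.data[k][j] for k in range(inner))
             for j in range(len(t2.column_names))]
            for i in range(len(t1.row_names))]
    # The product keeps T1's row names and T2's column names.
    return NamedTable(list(t1.row_names), list(t2.column_names), data)


def transpose(t: NamedTable) -> NamedTable:
    # Row names become column names, so they must all exist and be distinct.
    if any(n is None for n in t.row_names) or len(set(t.row_names)) != len(t.row_names):
        raise ValueError("transpose needs every row to have a unique name")
    data = [[row[j] for row in t.data] for j in range(len(t.column_names))]
    return NamedTable(list(t.column_names), list(t.row_names), data)
```

The uniqueness check in transpose is there because column names are already required to be distinct, so a table with duplicate or missing row names would have no valid transpose.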
The symmetry isn't perfect, and leads to a bunch of design questions. table.column(name) returns a List<Col> of values but not a name, as does table.column-n(index). We have table.row-n(index) returning a Row. If Rows store their own names, why don't columns? Should we add a table.row(name) method to match? Should it return an anonymous Row like an anonymous column? (If so, then table.add-row(some-named-row).row(some-named-row.name) would not equal some-named-row...which seems odd.) Should table.add-row(Row) enforce row name-uniqueness? Do we want to tolerate a mixture of anonymous and named rows?
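To have something concrete to argue about, here is one possible set of answers, modeled in Python. It is only one point in the design space, not a recommendation: names must be unique among named rows, anonymous rows are tolerated alongside them, and lookup by name returns the row with its name attached. The tuple representation and function names are hypothetical.

```python
from typing import List, Optional, Tuple

# (name, cells); a name of None means the row is anonymous.
Row = Tuple[Optional[str], tuple]


def add_row(rows: List[Row], row: Row) -> List[Row]:
    name, _cells = row
    # Enforce uniqueness only among named rows; anonymous rows never clash.
    if name is not None and any(existing == name for existing, _ in rows):
        raise ValueError(f"duplicate row name: {name!r}")
    return rows + [row]


def row_by_name(rows: List[Row], name: str) -> Row:
    # Returns the row with its name, so lookup after add_row round-trips.
    for row in rows:
        if row[0] == name:
            return row
    raise KeyError(name)
```

Under these choices the odd inequality goes away: adding a named row and then looking it up by name gives back an equal row.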
@shriram, @jpolitz, thoughts?