brownplt / pyret-lang

The Pyret language.

Should we add a notion of "named rows"? #1746

Open blerner opened 1 month ago

blerner commented 1 month ago

Right now, our table syntax has named columns (whose names must always be distinct), while rows are entirely anonymous. But for some data, e.g. connectivity in a graph, it might make sense to label the rows as well. That would let us treat the contents of a table more symmetrically, more like a matrix, and might make tables amenable to further manipulation as matrices.

I'm wondering if we should riff off the syntax for `spy:` / `spy "name":` and `check:` / `check "name":`, and do something like this:

table: c1, c2, c3:
  row: r1c1, r1c2, r1c3
  row: r2c1, r2c2, r2c3
end

table: c1, c2, c3:
  row "r1": r1c1, r1c2, r1c3
  row "r2": r2c1, r2c2, r2c3
end

I'm pretty sure this is grammatically unambiguous, but it is a breaking change: we would have to make `row` a keyword, rather than just `row:` (currently, `row` can be used as an ordinary identifier). Is this worth it?

We might also want to symmetrize the names a bit more carefully. Currently, column names in table syntax must be NAME (identifier) tokens, but `add-column` allows names that contain spaces or are otherwise not identifiers. So we might want to loosen the grammar to:

table-expr: TABLE table-headers COLON table-rows END
table-headers: [(table-header COMMA)* table-header]
table-header: NAME [COLONCOLON ann]
table-header: STRING [COLONCOLON ann]                (new)

table-rows: [table-row* table-row]
table-row: ROWCOLON table-items
table-row: ROW (STRING | NAME) COLON table-items     (new)
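To sanity-check the claim that the new `table-row` rule is unambiguous once `row` is a keyword, here is a small Python sketch of a lexer and recursive-descent parser for just the two `table-row` productions. It is not Pyret's real lexer or parser: the token names mirror the grammar, table items are simplified to single NAME or STRING tokens, `tokenize`, `parse_row`, and `parse_rows` are hypothetical helpers, and (following the `row "r1":` example syntax) it expects a `:` after the row name.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical token set for the proposed rules (not Pyret's real lexer).
# Alternatives are tried in order, so `row:` wins over the bare `row` keyword.
TOKEN_RE = re.compile(r'''
    (?P<ROWCOLON>row:)                       # existing anonymous-row opener
  | (?P<ROW>row(?![A-Za-z0-9_-]))            # proposed: `row` as a keyword
  | (?P<STRING>"[^"]*")
  | (?P<NAME>[A-Za-z_][A-Za-z0-9_-]*)
  | (?P<COLON>:)
  | (?P<COMMA>,)
  | (?P<WS>\s+)
''', re.VERBOSE)

Token = Tuple[str, str]  # (kind, text)

def tokenize(src: str) -> List[Token]:
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":          # drop whitespace
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

@dataclass
class ParsedRow:
    name: Optional[str]   # None for an anonymous `row:`
    items: List[str]

def expect_item(tokens: List[Token], i: int) -> str:
    # Items are simplified to a single NAME or STRING token.
    if i >= len(tokens) or tokens[i][0] not in ("NAME", "STRING"):
        got = tokens[i][1] if i < len(tokens) else "end of input"
        raise SyntaxError(f"expected a table item, got {got!r}")
    return tokens[i][1]

def parse_row(tokens: List[Token], i: int) -> Tuple[ParsedRow, int]:
    kind = tokens[i][0]
    if kind == "ROWCOLON":                 # table-row: ROWCOLON table-items
        name, i = None, i + 1
    elif kind == "ROW":                    # table-row: ROW (STRING | NAME) COLON table-items
        if (i + 2 >= len(tokens)
                or tokens[i + 1][0] not in ("STRING", "NAME")
                or tokens[i + 2][0] != "COLON"):
            raise SyntaxError("expected `row <name>:`")
        name, i = tokens[i + 1][1].strip('"'), i + 3
    else:
        raise SyntaxError(f"expected a row, got {tokens[i][1]!r}")
    items = [expect_item(tokens, i)]
    i += 1
    while i < len(tokens) and tokens[i][0] == "COMMA":   # comma-separated items
        items.append(expect_item(tokens, i + 1))
        i += 2
    return ParsedRow(name, items), i

def parse_rows(src: str) -> List[ParsedRow]:
    tokens, rows, i = tokenize(src), [], 0
    while i < len(tokens):
        row, i = parse_row(tokens, i)
        rows.append(row)
    return rows
```

In this sketch, identifiers such as `rowan` or `row-count` still lex as NAME, while `parse_rows("row: row, x")` raises a `SyntaxError`: that is the breaking change, since a bare `row` can no longer be used as an ordinary item.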

Then `Row` values could carry an optional name (accessible via `.get-name() -> Option<String>`), and we'd add a new constructor `[raw-named-row(name): ...]` to match `[raw-row: ...]`, and a new method `table.named-row(name, c1, c2, c3)` to match `table.row(c1, c2, c3)`. In the implementation, a table value would store an array of row names, just as it stores an array of column names.
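A rough Python model of this data layout may help pin down the proposal. It is a sketch under assumptions, not Pyret's runtime: `None` stands in for `none`, the method and function names are Python spellings of the proposed `get-name`, `raw-named-row`, and `named-row`, and `row_n` and `add_row` only approximate the existing Pyret operations.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

@dataclass(frozen=True)
class Row:
    column_names: Tuple[str, ...]
    values: Tuple[Any, ...]
    name: Optional[str] = None      # None plays the role of Pyret's `none`

    def get_name(self) -> Optional[str]:
        # Models the proposed `.get-name() -> Option<String>`.
        return self.name

def raw_row(*pairs: Tuple[str, Any]) -> Row:
    # Models `[raw-row: ...]`: an anonymous row built from (column, value) pairs.
    return Row(tuple(c for c, _ in pairs), tuple(v for _, v in pairs))

def raw_named_row(name: str, *pairs: Tuple[str, Any]) -> Row:
    # Models the proposed `[raw-named-row(name): ...]`.
    r = raw_row(*pairs)
    return Row(r.column_names, r.values, name)

@dataclass(frozen=True)
class Table:
    column_names: Tuple[str, ...]
    rows: Tuple[Tuple[Any, ...], ...] = ()
    row_names: Tuple[Optional[str], ...] = ()   # parallel to `rows`, like `column_names`

    def row(self, *values: Any) -> Row:                   # existing table.row(c1, c2, c3)
        return self._make_row(values, None)

    def named_row(self, name: str, *values: Any) -> Row:  # proposed table.named-row(name, ...)
        return self._make_row(values, name)

    def _make_row(self, values: Tuple[Any, ...], name: Optional[str]) -> Row:
        if len(values) != len(self.column_names):
            raise ValueError("wrong number of values for this table's columns")
        return Row(self.column_names, tuple(values), name)

    def add_row(self, r: Row) -> "Table":
        # Returns a new table. Row-name uniqueness is deliberately not checked
        # here; that is still an open question.
        if r.column_names != self.column_names:
            raise ValueError("row does not match this table's columns")
        return Table(self.column_names, self.rows + (r.values,), self.row_names + (r.name,))

    def row_n(self, i: int) -> Row:
        return Row(self.column_names, self.rows[i], self.row_names[i])
```

Storing `row_names` as a tuple parallel to `rows` keeps anonymous and named rows in one table, with anonymous rows simply recorded as `None`.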

Our `load-table` syntax could add a new clause, `row-names NAME`, so that loading a spreadsheet can produce named rows as well as named columns.
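One possible reading of the `row-names NAME` clause, sketched in Python: NAME picks a source column whose cells become the row names, and that column is dropped from the data columns. This interpretation is an assumption, as is the helper name `load_rows_with_names`; no sanitizers are applied, so every cell stays a string.

```python
import csv
import io
from typing import List, Optional, Tuple

def load_rows_with_names(
    csv_text: str, row_names_column: Optional[str] = None
) -> Tuple[List[str], List[Optional[str]], List[List[str]]]:
    """Hypothetical model of `load-table` with a `row-names NAME` clause.

    Assumption: NAME selects a source column whose cells become the row names,
    and that column is removed from the data columns. Returns
    (column_names, row_names, rows).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    data = list(reader)
    if row_names_column is None:              # no clause: every row is anonymous
        return header, [None] * len(data), data
    k = header.index(row_names_column)        # ValueError if the column is missing
    columns = header[:k] + header[k + 1:]
    names = [r[k] for r in data]
    rows = [r[:k] + r[k + 1:] for r in data]
    return columns, names, rows
```

For a graph's adjacency spreadsheet with header `node,a,b`, using `node` as the row-name column yields row names `a`, `b` that line up with the column names `a`, `b`, which is exactly the matrix-like shape motivating the proposal.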

If we had a matrix library, we could lift matrix operations to table operations and generate tables with the appropriate row and column names (e.g., only allow `T1 x T2` if `T1.column-names` match `T2.row-names` and all the values are numeric). If row names are all present and all unique, we could also add a `table-transpose` operation that exchanges rows for columns and still yields a meaningfully shaped table.
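The two checks described here can be made concrete with a small Python sketch over a hypothetical `NamedTable` record. It assumes "match" means the two name sequences are equal in the same order, and that the product keeps `T1`'s row names and `T2`'s column names; `table_times` and `table_transpose` are illustrative names, not an existing library.

```python
from dataclasses import dataclass
from numbers import Number
from typing import Any, Optional, Tuple

@dataclass(frozen=True)
class NamedTable:
    column_names: Tuple[str, ...]
    row_names: Tuple[Optional[str], ...]
    rows: Tuple[Tuple[Any, ...], ...]

def _all_numeric(t: NamedTable) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return all(isinstance(v, Number) and not isinstance(v, bool) for r in t.rows for v in r)

def table_times(t1: NamedTable, t2: NamedTable) -> NamedTable:
    """T1 x T2: allowed only if T1's column names equal T2's row names (same
    order) and every value in both tables is numeric."""
    if t1.column_names != t2.row_names:
        raise ValueError("T1's column names must match T2's row names")
    if not (_all_numeric(t1) and _all_numeric(t2)):
        raise TypeError("matrix product needs all-numeric tables")
    rows = tuple(
        tuple(sum(a * t2.rows[k][j] for k, a in enumerate(r)) for j in range(len(t2.column_names)))
        for r in t1.rows
    )
    # The result keeps T1's row names and T2's column names.
    return NamedTable(t2.column_names, t1.row_names, rows)

def table_transpose(t: NamedTable) -> NamedTable:
    """Exchange rows for columns; requires every row to be named, with distinct names."""
    if any(n is None for n in t.row_names) or len(set(t.row_names)) != len(t.row_names):
        raise ValueError("transpose needs all rows named, with unique names")
    rows = tuple(tuple(r[j] for r in t.rows) for j in range(len(t.column_names)))
    return NamedTable(tuple(t.row_names), t.column_names, rows)
```

The uniqueness check in `table_transpose` matters because the old row names become column names, and column names are already required to be distinct.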

The symmetry isn't perfect, and it raises a number of design questions. `table.column(name)` returns the column's values as a `List<Col>`, but not its name, as does `table.column-n(index)`. Meanwhile, `table.row-n(index)` returns a `Row`. If rows store their own names, why don't columns? Should we add a `table.row(name)` method to match? Should it return an anonymous `Row`, just as columns come back anonymous? (If so, then `table.add-row(some-named-row).row(some-named-row.name)` would not equal `some-named-row`, which seems odd.) Should `table.add-row(Row)` enforce row-name uniqueness? Do we want to tolerate a mixture of anonymous and named rows?
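To make the round-trip question concrete, here is a Python sketch of one possible set of answers: `add_row` rejects duplicate names among named rows but allows any number of anonymous rows, and a lookup method `row_by_name` (a hypothetical stand-in for the proposed `table.row(name)`) can either keep or drop the name. These policies are assumptions for illustration, not a decision.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional, Tuple

@dataclass(frozen=True)
class Row:
    values: Tuple[Any, ...]
    name: Optional[str] = None

@dataclass(frozen=True)
class Table:
    rows: Tuple[Row, ...] = ()

    def add_row(self, r: Row) -> "Table":
        # One possible policy: named rows must have unique names, while any
        # number of anonymous rows may be mixed in.
        if r.name is not None and any(x.name == r.name for x in self.rows):
            raise ValueError(f"duplicate row name {r.name!r}")
        return Table(self.rows + (r,))

    def row_by_name(self, name: str, keep_name: bool = True) -> Row:
        # Stand-in for the proposed table.row(name). With keep_name=False the
        # result is anonymous, mirroring how columns come back without names.
        for r in self.rows:
            if r.name == name:
                return r if keep_name else replace(r, name=None)
        raise KeyError(name)

if __name__ == "__main__":
    named = Row((1, 2), "r1")
    t = Table().add_row(named).add_row(Row((3, 4))).add_row(Row((5, 6)))
    print(t.row_by_name("r1") == named)                   # True: the round trip holds
    print(t.row_by_name("r1", keep_name=False) == named)  # False: the name was dropped
```

Keeping the name makes the round trip an identity; dropping it matches how columns behave today but produces exactly the inequality noted above.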

@shriram, @jpolitz, thoughts?