breck7 / scrollsdk

Build on top of Scroll.
https://sdk.scroll.pub

Why multiple grammars? #114

Closed tgirod closed 3 years ago

tgirod commented 3 years ago

Hey. Looking at Tree Notation, I'm wondering: why implement multiple grammars (prefix, infix, postfix, etc.)?

Wouldn't it add a lot of complexity? If I'm reading a bit of TN code, I would have to know which kind of grammar it uses before I can reason about it. Wouldn't that defeat the purpose of TN? Or am I missing the purpose?

breck7 commented 3 years ago

Great questions.

> Hey. Looking at Tree Notation, I'm wondering: why implement multiple grammars (prefix, infix, postfix, etc.)?

It wasn't always this way. These evolved.

The [anyfix demo](https://jtree.treenotation.org/designer/#standard%20poop) came about after talking to nurses and doctors about how they took their medical notes. We learned that almost all of them develop their own EHR dialect over time, and they often put the tokens in different orders: "weight 50", "50lbs", et cetera. We realized that it is dead simple to parse the correct type of a word once you know the node context, so forcing a strict order is unnecessary.
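To make the "type from node context" idea concrete, here is a minimal sketch, not JTree's actual implementation. It assumes a hypothetical `weight` node whose allowed cell types are each recognized by a regular expression; because every word is typed by what it looks like rather than where it sits, word order stops mattering. The names `WEIGHT_CELL_TYPES` and `type_cells` are illustrative only.

```python
import re

# Hypothetical cell types allowed inside a "weight" node. Each type is
# recognized by a full-match regular expression, so position is irrelevant.
WEIGHT_CELL_TYPES = {
    "keyword": re.compile(r"weight"),
    "number": re.compile(r"\d+(\.\d+)?"),
    "numberWithUnit": re.compile(r"\d+(\.\d+)?(lbs|kg)"),
    "unit": re.compile(r"lbs|kg"),
}


def type_cells(line, cell_types):
    """Assign a type to every word in a line, regardless of word order."""
    typed = []
    for word in line.split(" "):  # Tree Notation separates cells with one space
        matches = [name for name, pattern in cell_types.items()
                   if pattern.fullmatch(word)]
        if len(matches) != 1:
            # Zero matches is an error; several matches is an ambiguity.
            raise ValueError(f"cannot type {word!r}: matches {matches}")
        typed.append((word, matches[0]))
    return typed


print(type_cells("weight 50", WEIGHT_CELL_TYPES))
print(type_cells("50 lbs weight", WEIGHT_CELL_TYPES))
```

Both "weight 50" and "50 lbs weight" type cleanly, and "50lbs" is recognized as a single `numberWithUnit` cell, which is the kind of flexibility the note-taking dialects called for.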

The [postfix demo](https://jtree.treenotation.org/designer/#standard%20chuck), IIRC, came about because a lot of people responded "some of this reminds me of Forth", so building a Forth-like tree language seemed like a fun exercise. Although a postfix language hasn't been the top priority, building one made it apparent that using a postfix as a type indicator in camelCase words is very ergonomic, and that pattern is now used in [Grammar, the grammar language](https://jtree.treenotation.org/designer/#standard%20grammar).
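A rough sketch of what "postfix as a type indicator" means: the last camelCase segment of an identifier says what kind of thing it names. The suffixes below (`Node` and `Cell`) are an assumption for illustration; check the Grammar language's own documentation for its exact conventions. `SUFFIX_KINDS` and `kind_of` are hypothetical names.

```python
# Hypothetical suffix table: the trailing camelCase word of an identifier
# encodes what kind of definition it is (typing by postfix).
SUFFIX_KINDS = {"Node": "node type", "Cell": "cell type"}


def kind_of(identifier):
    """Return the kind encoded by an identifier's suffix, or None."""
    for suffix, kind in SUFFIX_KINDS.items():
        # Require a non-empty prefix so a bare "Node" is not a definition.
        if identifier.endswith(suffix) and len(identifier) > len(suffix):
            return kind
    return None


print(kind_of("weightNode"))  # node type
print(kind_of("intCell"))     # cell type
```

The appeal is that a reader (or a parser) can tell what `weightNode` is without looking anything up, while the name still reads naturally.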

> Wouldn't it add a lot of complexity?

It adds no complexity at the Tree Notation level. The concept of a Tree Notation doc as isomorphic to a 2-D spreadsheet does not change.
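One way to see the spreadsheet isomorphism: treat newlines as row delimiters and single spaces as cell delimiters. A document then maps to a ragged grid and back with no loss, whatever grammar strategy the language on top uses. This is a minimal sketch assuming one space per indentation level; `to_grid` and `from_grid` are hypothetical helper names.

```python
def to_grid(source):
    """Map a Tree Notation document to a ragged 2-D grid of cells.

    Each line is a row; splitting on single spaces gives the columns.
    Leading indentation shows up as empty cells, which is what places a
    child node one column to the right of its parent.
    """
    return [line.split(" ") for line in source.split("\n")]


def from_grid(grid):
    """Inverse of to_grid: join cells with spaces and rows with newlines."""
    return "\n".join(" ".join(row) for row in grid)


doc = "patient\n weight 50\n height 170"
print(to_grid(doc))
# [['patient'], ['', 'weight', '50'], ['', 'height', '170']]
```

Since `split` and `join` with the same delimiter are exact inverses, `from_grid(to_grid(doc)) == doc` for any input, which is why prefix, postfix, or anyfix choices live entirely above this layer.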

It makes things less complex for the users of tree langs, because it allows you to build ergonomic languages that fit their domains better and solve problems using the same number of tokens or fewer. Some langs would be more verbose and clunky without these capabilities.

It does require a little extra effort up front from the Tree Language designer (i.e., deciding which strategies the language should support, making sure there are no ambiguities when implementing an anyfix lang, etc.).
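The ambiguity check mentioned above can be approximated cheaply: in an anyfix language, word order carries no type information, so any word that matches two cell types in the same node context is a problem. This sketch tests sample words against hypothetical cell-type patterns; `find_ambiguities` and the example types are illustrative, not part of any real tool.

```python
import re


def find_ambiguities(cell_types, sample_words):
    """Return each sample word that matches more than one cell type.

    cell_types maps a type name to a regex that must match the whole word.
    """
    ambiguous = {}
    for word in sample_words:
        matches = [name for name, pattern in cell_types.items()
                   if re.fullmatch(pattern, word)]
        if len(matches) > 1:
            ambiguous[word] = matches
    return ambiguous


# Hypothetical node context: "1999" could be a plain number or a year.
types = {"number": r"\d+", "year": r"\d{4}", "unit": r"lbs|kg"}
print(find_ambiguities(types, ["50", "1999", "kg"]))
# {'1999': ['number', 'year']}
```

Checking samples only catches the collisions you think to test; a designer could go further by reasoning about overlap between the patterns themselves, but even this simple pass flags the common cases.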

> I would have to know the kind of grammar it is using before I can reason about it.

In practice, from a user's point of view, this doesn't seem to be a huge barrier. People seem to pick up on the grammar patterns quite quickly.

When writing parsing tools, though, you are 100% correct: to determine the types correctly, you need to be aware of the various parsing strategies.

Honestly, it seems like there are probably a few big ideas here that we haven't discovered yet, and they could be great opportunities for people to write papers about. A somewhat pressing current problem is that while it is easy to build a Tree Notation library in a different language, once people can parse the trees/grid, there is no clear path or spec for a cross-language implementation of something like [Grammar](https://jtree.treenotation.org/designer/#standard%20grammar).

Another way to think of it: these new strategies, and the general idea that people can invent wholly new positional types of langs, make the Tree Notation idea as a whole more "antifragile". In other words, the more people play with it and push it in unexpected ways, the more new useful patterns emerge and the stronger the whole thing becomes.

> Wouldn't it defeat the purpose of TN? Or am I missing the purpose?

The main purpose is to prove that these 2- and 3-dimensional languages are not only simpler, but can be as powerful as (if not more so, once there's an ecosystem) all existing 1-D BNF languages. A secondary purpose is to make JTree and Tree Notation as useful as possible, but the important thing is the idea, and I don't want to claim that I am the one who can implement the idea and build the most useful tools with it (though I'm trying my hardest!).

tl;dr: for users in the field, it seems not. For developers, yes, and we are missing a higher-level "spec" for tree language grammars that someone should write a paper about ;)

tgirod commented 3 years ago

OK, if I get this right, TN could allow anyone to organically grow a language dedicated to their field of activity, and progressively extract structure from it so it can be parsed and used for various computations?

tgirod commented 3 years ago

Pulling my two messages together: inferring structure from a language would be much harder when the type of grammar is not fixed, don't you think?

Maybe a non-prefix grammar would feel more natural for describing a certain field, but if the price to pay is that you need someone with technical expertise to build the grammar, it feels like a step backward in terms of user autonomy and empowerment.

Just throwing out random ideas and enjoying the conversation, anyway :)

breck7 commented 3 years ago

So far, in practice, I haven't found it too difficult to build parsers with all sorts of crazy new rules. Although human language users have diverse grammar requests, I've found them all, once refined, to be pretty simple, despite being unorthodox. I think these grammars will actually be simpler, because generally what you do is build a tight DSL that doesn't include the kitchen sink. So you generally don't end up with massive general-purpose programming languages, but instead work with a very restricted grammar.

And I think the creativity we will see in grammars will be huge compared to what we have now. All mainstream programming languages work by linearly taking a 1-D sequence of tokens, transforming it into an AST, and going from there. This is not how humans do it.

Think of a human reading a newspaper. They might quickly scan left to right, top down, just looking at the images. Then they might do another quick scan and look at the titles. They don't go in order; they frequently hop around and use lots of "ad hoc" parsing rules to process the page efficiently. They also parse lazily, only parsing a small 2-D section of the page if they are interested in it.

This is how Tree Languages can work. The ones in the designer are actually the more traditional types. The anyfix language hints that there are some new things going on here, but "we ain't seen nuthin yet". Imagine languages where you have not just 1 "read head", but 2 or even 10, parsing a program in parallel. Imagine not just prefix, postfix, and infix, but "columnfix", where the column number helps determine how a cell is parsed. For something a little different that also hints at the 2-D capabilities, check out https://www.youtube.com/watch?v=vn2aJA5ANUc
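"Columnfix" is only an idea in the thread, so here is one possible reading of it, sketched under that assumption: the absolute column a cell occupies in the grid (indentation cells included) selects its parser, rather than the first word of the line. `COLUMN_PARSERS` and `parse_row` are hypothetical names.

```python
# Hypothetical columnfix rule: a cell's absolute column in the grid,
# counting indentation cells, decides how that cell is parsed.
COLUMN_PARSERS = {
    1: str,    # column 1: an item name
    2: int,    # column 2: a quantity
    3: float,  # column 3: a unit price
}


def parse_row(line):
    """Parse one line, choosing each cell's parser by its column number."""
    parsed = {}
    for column, cell in enumerate(line.split(" ")):
        if cell == "":
            continue  # empty cells come from indentation; skip them
        parser = COLUMN_PARSERS.get(column, str)  # unknown columns stay text
        parsed[column] = parser(cell)
    return parsed


print(parse_row(" apples 3 0.5"))  # {1: 'apples', 2: 3, 3: 0.5}
```

Unlike prefix parsing, indenting a line shifts its words into different columns and therefore changes how they are read, which is what makes this a genuinely 2-D rule.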

tgirod commented 3 years ago

Good points! That last video example is quite impressive! So in a way, TN could be used to produce a structured text editor from a grammar file?

breck7 commented 3 years ago

> TN could be used to produce a structured text editor from a grammar file?

Yes!

Hopefully, if you write a Grammar file, you'd get support in traditional IDEs (like Sublime, VS Code, etc.) as well as in a new wave of structured editors.

Any structured editors you like that might be good candidates for a prototype?

tgirod commented 3 years ago

Unfortunately I don't know of any structured text editors.

I remember seeing a proof of concept for Haskell ages ago. The editor was leveraging the language's type system so that any time you wrote the name of a function, it would display corresponding parameter "cells" for you to fill in, which would create new cells etc. I found it very appealing at the time, especially for a language like Haskell.

breck7 commented 3 years ago

Great discussion. Will close this ticket for the archives.