dhowe / RiTaV1

RiTa: the generative language toolkit
http://rednoise.org/rita
GNU General Public License v3.0

Suggested New RiGrammar Feature #551

Open srt19170 opened 5 years ago

srt19170 commented 5 years ago

Most generative grammar tools have the capability to save a generated value and then reuse it later. For example, you might want to generate the name of a story's hero and then use that name throughout the rest of the story. Tracery does this through "actions":

Often you want to save information. Tracery allows you to call actions, which are bracketed statements that appear before symbols in a tag #[someAction]someSymbol#. In their basic form, they create some new rules and push them onto a symbol, creating that symbol if it didn't exist, or hiding its previous value if it did.

You can experiment with Tracery's actions here.

RiTa doesn't currently have the capability to reuse generated values, but I have added it to my personal port of RiTaJS and propose adding it to the master branch.

The approach I have implemented is to indicate "One Time Rules" by prefixing the nonterminal on the LHS of a rule with a $. For example:

$pet => cat | dog | wombat
<start> => I just bought a $pet.  There's no better pet than a $pet!

produces

I just bought a wombat.  There's no better pet than a wombat!

I call these "One Time Rules" because they are only evaluated once; from then on the stored value is used whenever they are referenced. (I welcome any suggestions for a better name! :-)

One Time Rule values can also be used in callbacks, and will be seamlessly inserted as strings into the JavaScript code, e.g.,

function mega(str) { return "Mega-"+str;}

$pet => cat | dog | wombat
<start> => I just bought a $pet.  It's a `mega($pet)`!

is turned into

mega('wombat')

for execution and produces

I just bought a wombat.  It's a Mega-wombat!
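
A minimal sketch of how that substitution step could work, not the actual code from my port. The helper names `substituteOtrs` and `runCallback` are hypothetical, and the sketch assumes the stored values are kept in a plain object keyed by the full rule name (including the `$`):

```javascript
// Hypothetical helpers for illustration only; not part of RiTa's API.

// Replace each One Time Rule name in the callback source with a quoted string literal.
function substituteOtrs(code, otrs) {
  let out = code;
  for (const [name, value] of Object.entries(otrs)) {
    // Escape backslashes and single quotes so the value stays a valid string literal.
    const literal =
      "'" + String(value).replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
    // split/join performs a plain (non-regex) replacement of every occurrence.
    out = out.split(name).join(literal);
  }
  return out;
}

// Evaluate the substituted source with the given callbacks in scope.
function runCallback(code, otrs, callbacks) {
  const source = substituteOtrs(code, otrs);
  const names = Object.keys(callbacks);
  const fn = new Function(...names, "return (" + source + ");");
  return fn(...names.map((n) => callbacks[n]));
}

function mega(str) { return "Mega-" + str; }

console.log(substituteOtrs("mega($pet)", { "$pet": "wombat" })); // mega('wombat')
console.log(runCallback("mega($pet)", { "$pet": "wombat" }, { mega })); // Mega-wombat
```

Because the replacement is plain string matching, this sketch has the same substring limitation described further down.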

One Time Rule values can also be used in rule weights, e.g.,

$dogweight => 100
$pet => cat | dog [$dogweight] | wombat
<start> => I just bought a $pet.

My current implementation works by (recursively) evaluating all the One Time Rules the first time expandFrom() is called and saving them in a dictionary (a JavaScript object). After that, whenever a nonterminal is expanded, the engine checks to see if it has a value in the dictionary, and uses that if present.
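
Roughly, the caching idea looks like the sketch below. This is a simplified stand-alone illustration, not the RiTa engine: the `TinyGrammar` class, its token pattern, and the injectable `random` parameter are assumptions made for the example. Only the names `otrs`, `otrsPrefix`, and `expandFrom` mirror what is described in this post.

```javascript
// Simplified stand-alone sketch; TinyGrammar is hypothetical and not RiTa's engine.
class TinyGrammar {
  constructor(rules, random = Math.random) {
    this.rules = rules;       // e.g. { "$pet": ["cat", "dog"], "<start>": ["..."] }
    this.random = random;     // injectable so results can be made repeatable
    this.otrs = {};           // stored One Time Rule values
    this.otrsPrefix = "$";
    this.ready = false;       // becomes true after the first expandFrom()
  }

  pick(options) {
    return options[Math.floor(this.random() * options.length)];
  }

  expand(text) {
    // Assumed token shapes: $name or <name>.
    return text.replace(/\$\w+|<\w+>/g, (name) => {
      if (name in this.otrs) return this.otrs[name];   // reuse the stored value
      if (!(name in this.rules)) return name;          // not a rule: leave as is
      const value = this.expand(this.pick(this.rules[name]));
      if (name.startsWith(this.otrsPrefix)) this.otrs[name] = value;
      return value;
    });
  }

  expandFrom(name) {
    if (!this.ready) {
      // First call: evaluate every One Time Rule that has no stored value yet.
      for (const rule of Object.keys(this.rules)) {
        if (rule.startsWith(this.otrsPrefix)) this.expand(rule);
      }
      this.ready = true;
    }
    return this.expand(name);
  }
}

const grammar = new TinyGrammar({
  "$pet": ["cat", "dog", "wombat"],
  "<start>": ["I just bought a $pet.  There's no better pet than a $pet!"],
});
console.log(grammar.expandFrom("<start>")); // both $pet uses get the same value
```

Placing a value in `grammar.otrs` before the first `expandFrom()` call makes the sketch skip the random choice for that name.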

The dictionary is accessible at Grammar.otrs. Prior to the initial call to expandFrom(), the user can manually place values into this dictionary. This allows the user to override the One Time Rules and provide a custom value. The user can actually provide a value for any nonterminal (including ones that do not start with $), enabling the user to manually turn any rule into a One Time Rule with a particular value.

I have also provided Grammar.otrsDisabled which can be set to true to turn off One Time Rules completely. I've also provided Grammar.otrsPrefix which is initially $ but which can be changed to another character if $ is not acceptable for some reason.

There are a few shortcomings with my implementation. There's no attempt made to order the execution of the One Time Rules, so using OTRs within OTRs may not work as expected. (In particular, they will not be executed in the order in which they were defined.) The detection of OTRs in rules and callbacks uses simple matching, so using an OTR with a name that is a substring of another OTR (e.g., $flower and $flowerColor) will lead to problems.

An alternative approach to providing this capability would be to turn any nonterminal into a One Time Rule by prepending a $ to the use of the nonterminal on the RHS of a rule, e.g.,

<pet> => cat | dog | wombat
<start> => I bought a $<pet>.  I'm happy with my $<pet>!  But maybe I should have bought a <pet>?

potentially producing

I bought a wombat.  I'm happy with my wombat!  But maybe I should have bought a dog?

Here the first use of $ creates a value that is reused by any subsequent use, but is still available for use as a normal rule. However, the current architecture of the rules engine makes this difficult to implement. Rewriting the engine to make expandFrom() recursive would make this possible (and simplify the engine code) but there might be concerns about performance.
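
To make the alternative concrete, here is a rough sketch of a recursive expander in which `$<name>` stores and reuses a value while a bare `<name>` expands fresh each time. The function `expandWithMemo` and its `random` parameter are hypothetical and written only for this illustration; it is not how the current engine works.

```javascript
// Hypothetical recursive expander for illustration; not RiTa's engine.
function expandWithMemo(rules, start, random = Math.random) {
  const memo = {}; // values captured by $<name> uses
  const pick = (options) => options[Math.floor(random() * options.length)];

  const expand = (text) =>
    text.replace(/(\$?)(<\w+>)/g, (match, sticky, name) => {
      if (!(name in rules)) return match;             // unknown name: leave as is
      if (sticky && name in memo) return memo[name];  // $<name>: reuse stored value
      const value = expand(pick(rules[name]));        // otherwise expand recursively
      if (sticky) memo[name] = value;                 // first $<name> stores the value
      return value;
    });

  return expand(start);
}

const rules = {
  "<pet>": ["cat", "dog", "wombat"],
  "<start>": [
    "I bought a $<pet>.  I'm happy with my $<pet>!  But maybe I should have bought a <pet>?",
  ],
};
console.log(expandWithMemo(rules, "<start>"));
```

The two `$<pet>` uses always agree, while the final `<pet>` is chosen independently and may or may not match them.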

dhowe commented 5 years ago

Thanks for the excellent write-up of this issue. I am aware of it. I recently implemented a form of one-time-rules for my Dialogic language. In fact, I'm generally happier with the implementation of grammars within Dialogic scripting (used in the Tendar game) vs RiTa, which should indeed be rewritten recursively, possibly even with an actual parser. Take a look at the Generative Elements section on the Dialogic page...

Does this syntax (similar to Tracery) make sense to you? It avoids the problem with substrings that you mention above (an issue that makes me hesitant about the syntax you propose). If so, perhaps it makes sense to use this in RiTa as well? Some possible steps going forward:

  1. Add OTR rules to RiTa using Dialogic syntax
  2. Rewrite RiTa grammars fully to properly support recursion (and OTRs)
  3. Rewrite RiTa grammars fully to properly support recursion (using a Parser framework, something like this)
  4. Implement Dialogic syntax/engine for RiTa
  5. Port Dialogic scripting lang to JS (definitely want to do this)
  6. Reimplement Dialogic scripting lang with Parser (possibly SuperPower)
  7. Work on other generative language-related projects

As I may have mentioned, I do have some funding to support some of these tasks, so if you were interested in working on (one or more) of the larger ones, and were interested in a more formal arrangement (including some $$), ping me and we can discuss.

srt19170 commented 5 years ago

Is RiGrammar being extensively used? I'm not sure a rewrite or switch to some new syntax is warranted if not. I'm offering features because I've implemented them in my own personal fork of RiTa, and it seems polite to offer them back, but I don't get the sense that there is a large community of users.

I'm personally not a huge fan of the Tracery-style syntax -- it seems a little awkward and verbose to me. I prefer a simpler syntax where the engine carries the burden of figuring out what to do, even if that may create some limitations.

If you were to undertake a major effort in the area of generative grammar, I can think of a couple of options that are more interesting (to me):

  1. Develop a tool that supports context-sensitive grammars. I'm not sure how useful context-sensitive grammars would be for generation, but it's an area that no tools support (as far as I know) and it would be interesting to explore.

  2. Develop a tool that can generate from examples. I think many creators without a background in math/CS would find it easier to write examples than a grammar.

And thank you very much for the offer to work on this (and with financial support!) but my current side projects keep me too busy to take on anything else.

dhowe commented 5 years ago

Understood -- I'm grateful for the upstream contributions (and if your time situation ever changes, just let me know)

Here's another thought about the OTR feature. Is it possible to parse all the rule names first, sort them by length, and then evaluate them longest first (to avoid the substring problem)?
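
Something along these lines, as a rough sketch only. The function name `substituteLongestFirst` is hypothetical, and it assumes the stored values live in a plain object keyed by rule name:

```javascript
// Hypothetical helper for illustration; not part of RiTa's API.
function substituteLongestFirst(text, values) {
  // Longer names go first so "$flowerColor" is replaced before "$flower".
  const names = Object.keys(values).sort((a, b) => b.length - a.length);
  let out = text;
  for (const name of names) {
    out = out.split(name).join(values[name]); // plain, non-regex replacement
  }
  return out;
}

const values = { "$flower": "rose", "$flowerColor": "red" };
console.log(substituteLongestFirst("A $flowerColor $flower.", values)); // A red rose.
```

One caveat with this sketch: if a substituted value itself contains a shorter rule name, a later pass will replace that too.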

srt19170 commented 5 years ago

After looking at the code, I realized that the existing rules also have the substring problem. It's not apparent if you name the non-terminals with brackets, but if you don't, then a non-terminal named, e.g., "foo" could match part of another non-terminal, e.g., "foobar", or even that substring in the middle of a terminal string, e.g., "food".

I think a better fix is to only match non-terminals when they are immediately followed by a non-alphabetic character. This will fix the OTRs completely, and fix most of the problems with regular rules. If you allow any sort of string as a non-terminal, there's only so much you can do to prevent clashing. Alternately, you could require that non-terminals are in brackets (or some such delimiters), although that's a bigger change to the "user interface".
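
A small sketch of that matching rule, using a negative lookahead so a name only matches when the next character is not a letter. The helper names `escapeRegExp` and `replaceWholeName` are hypothetical and not taken from the RiTa code:

```javascript
// Hypothetical helpers for illustration; not part of RiTa's API.

// Escape regex metacharacters so a name such as "$pet" is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace `name` only where it is not immediately followed by a letter.
function replaceWholeName(text, name, value) {
  const pattern = new RegExp(escapeRegExp(name) + "(?![A-Za-z])", "g");
  // A function replacer avoids special handling of "$" in the value.
  return text.replace(pattern, () => value);
}

console.log(replaceWholeName("$flowerColor $flower.", "$flower", "rose"));
// $flowerColor rose.
console.log(replaceWholeName("foo food foobar foo", "foo", "X"));
// X food foobar X
```

This only checks the character after the name, so an unbracketed name that appears at the end of a longer word (e.g., "foo" in "seafoo") would still match.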