fictiveworks / CalyxSharp

Generative text processing for C# and Unity applications

Decide whether Affix Table feature is included in V1 #16

Open · maetl opened this issue 2 years ago

maetl commented 2 years ago

The affix table algorithm was implemented with support from RubyNZ. It is an incredibly powerful feature for string rewriting and can remove large amounts of boilerplate and complexity from grammars if used coherently. The tradeoff is that it is much harder to understand than some of the other core features, and there are potential logic edge cases to deal with.

The technical concept of the feature is to add new syntax that allows higher-order application of a rewrite rule to the output of a standard grammar production. The rewrite rules are encoded in an affix table, which can also use wildcard patterns that match any string, but only in an affix position (a prefix or suffix), so it does not implement a full regular language. Affix tables can be bidirectional, so lookups and rewrites can go from key to value (left to right) or from value to key (right to left).
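
As a rough illustration of the affix-position restriction, here is a minimal Ruby sketch of matching one pattern. It assumes % is the only wildcard and appears at most once, either at the start of a pattern (a suffix match) or at the end (a prefix match); match_affix and fill_affix are hypothetical helpers, not part of Calyx.

```ruby
WILDCARD = '%'

# Hypothetical helper: returns the stem captured by the wildcard,
# or nil if the pattern does not match the input.
def match_affix(pattern, input)
  if pattern.start_with?(WILDCARD)
    # Suffix pattern such as "%ies"; a bare "%" has an empty suffix and matches anything.
    suffix = pattern.delete_prefix(WILDCARD)
    input.end_with?(suffix) ? input.delete_suffix(suffix) : nil
  elsif pattern.end_with?(WILDCARD)
    # Prefix pattern such as "un%".
    prefix = pattern.delete_suffix(WILDCARD)
    input.start_with?(prefix) ? input.delete_prefix(prefix) : nil
  else
    # Literal pattern: exact match only, nothing captured.
    pattern == input ? '' : nil
  end
end

# Hypothetical helper: substitutes a captured stem into the wildcard position.
# The block form of sub avoids treating backslashes in the stem as backreferences.
def fill_affix(pattern, stem)
  pattern.sub(WILDCARD) { stem }
end

stem = match_affix('%y', 'ferry')  # => "ferr"
fill_affix('%ies', stem)           # => "ferries"
match_affix('%y', 'bus')           # => nil
match_affix('un%', 'undo')         # => "do"
```

Because the wildcard can only capture a leading or trailing stem, each match is a plain start_with?/end_with? check rather than a regular-expression search.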

Test specification showing how an affix table works in isolation:

require 'calyx'

RSpec.describe Calyx::Production::AffixTable do
  # Assumption: an empty Calyx::Registry is enough to parse a table
  # that contains no references to other grammar rules.
  let(:registry) { Calyx::Registry.new }

  describe 'wildcard match' do
    let(:affix_table) do
      Calyx::Production::AffixTable.parse({
        "%y" => "%ies",
        "%s" => "%ses",
        "%" => "%s"
      }, registry)
    end

    specify 'lookup from key to value' do
      expect(affix_table.value_for('ferry')).to eq('ferries')
      expect(affix_table.value_for('bus')).to eq('buses')
      expect(affix_table.value_for('car')).to eq('cars')
    end

    specify 'lookup from value to key' do
      expect(affix_table.key_for('ferries')).to eq('ferry')
      expect(affix_table.key_for('buses')).to eq('bus')
      expect(affix_table.key_for('cars')).to eq('car')
    end
  end
end
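
One way the ordered lookup behind this spec could work is sketched below in plain Ruby. SimpleAffixTable is a hypothetical stand-in, not the actual Calyx::Production::AffixTable: it ignores the registry, assumes rules are tried in the order they are written (first match wins), and swaps each key/value pair to look up in the reverse direction.

```ruby
# Hypothetical sketch of a bidirectional affix table. This is NOT the real
# Calyx::Production::AffixTable; it ignores the registry entirely.
class SimpleAffixTable
  WILDCARD = '%'

  # rules: Hash of key pattern => value pattern. Ruby hashes keep insertion
  # order, so rules are tried in the order written and the first match wins.
  def initialize(rules)
    @pairs = rules.to_a
  end

  # Left-to-right lookup: match key patterns, build the value.
  def value_for(input)
    rewrite(input, @pairs)
  end

  # Right-to-left lookup: match value patterns, build the key.
  def key_for(input)
    rewrite(input, @pairs.map(&:reverse))
  end

  private

  # Returns the rewritten string, or nil when no rule matches.
  def rewrite(input, pairs)
    pairs.each do |from, to|
      stem = match_affix(from, input)
      return to.sub(WILDCARD) { stem } unless stem.nil?
    end
    nil
  end

  # "%" may appear once, at the start (suffix match) or end (prefix match).
  def match_affix(pattern, input)
    if pattern.start_with?(WILDCARD)
      suffix = pattern.delete_prefix(WILDCARD)
      input.delete_suffix(suffix) if input.end_with?(suffix)
    elsif pattern.end_with?(WILDCARD)
      prefix = pattern.delete_suffix(WILDCARD)
      input.delete_prefix(prefix) if input.start_with?(prefix)
    elsif pattern == input
      '' # literal rule: exact match, nothing captured
    end
  end
end

table = SimpleAffixTable.new({ '%y' => '%ies', '%s' => '%ses', '%' => '%s' })
table.value_for('bus')     # => "buses"
table.key_for('ferries')   # => "ferry"
table.key_for('bus')       # => "bu"
```

Rule order matters here: the catch-all % rule has to come last or it would shadow the more specific rules. The sketch also shows one of the edge cases mentioned above: table.key_for('bus') returns 'bu', because the reverse lookup cannot tell that bus is already singular, and the %s value pattern matches any word ending in s.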

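The proposed syntax for bidirectional lookups uses the > and < characters to indicate the direction of mapping visually: {vehicle>countable} rewrites from key to value (singular to plural), and {vehicle<countable} rewrites from value to key (plural to singular).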

const grammar = calyx.grammar({
  "plural": "the plural of {vehicle} is {vehicle>countable}",
  "singular": "the singular of {vehicle} is {vehicle<countable}",
  "countable": {
    "%y": "%ies",
    "%s": "%ses",
    "%": "%s"
  }
})

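// Intended output under the proposed semantics is shown in the comments.
// ">" rewrites a key (singular) into a value (plural)
grammar.generate({start: "{plural}", vehicle: "train"})     // "the plural of train is trains"
grammar.generate({start: "{plural}", vehicle: "bus"})       // "the plural of bus is buses"
grammar.generate({start: "{plural}", vehicle: "ferry"})     // "the plural of ferry is ferries"

// "<" rewrites a value (plural) back into a key (singular)
grammar.generate({start: "{singular}", vehicle: "trains"})  // "the singular of trains is train"
grammar.generate({start: "{singular}", vehicle: "buses"})   // "the singular of buses is bus"
grammar.generate({start: "{singular}", vehicle: "ferries"}) // "the singular of ferries is ferry"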
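
A rough sketch, in Ruby, of how the > and < markers could be resolved in a single pass over a template. It assumes references take the form {name}, {name>table} or {name<table}, that values come from a flat context hash, and that input matching no rule is returned unchanged; expand, affix_match and affix_rewrite are hypothetical names, and nested rules such as {plural} inside start are not expanded here.

```ruby
WILDCARD = '%'

# Hypothetical helper: the stem captured by a leading or trailing "%", or nil.
def affix_match(pattern, input)
  if pattern.start_with?(WILDCARD)
    suffix = pattern.delete_prefix(WILDCARD)
    input.delete_suffix(suffix) if input.end_with?(suffix)
  elsif pattern.end_with?(WILDCARD)
    prefix = pattern.delete_suffix(WILDCARD)
    input.delete_prefix(prefix) if input.start_with?(prefix)
  elsif pattern == input
    ''
  end
end

# Applies the first matching rule. :forward maps key -> value and
# :reverse maps value -> key. Unmatched input is returned unchanged (an assumption).
def affix_rewrite(rules, input, direction)
  pairs = direction == :forward ? rules.to_a : rules.to_a.map(&:reverse)
  pairs.each do |from, to|
    stem = affix_match(from, input)
    return to.sub(WILDCARD) { stem } unless stem.nil?
  end
  input
end

# Expands {name}, {name>table} and {name<table} using a flat context hash.
def expand(template, context, tables)
  template.gsub(/\{(\w+)(?:([<>])(\w+))?\}/) do
    name, marker, table = $1, $2, $3
    value = context.fetch(name)
    next value if marker.nil?

    direction = marker == '>' ? :forward : :reverse
    affix_rewrite(tables.fetch(table), value, direction)
  end
end

tables = { 'countable' => { '%y' => '%ies', '%s' => '%ses', '%' => '%s' } }
plural = 'the plural of {vehicle} is {vehicle>countable}'
singular = 'the singular of {vehicle} is {vehicle<countable}'

puts expand(plural, { 'vehicle' => 'ferry' }, tables)    # prints: the plural of ferry is ferries
puts expand(singular, { 'vehicle' => 'buses' }, tables)  # prints: the singular of buses is bus
```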

This is pretty cool, but neither the string rewrite wildcard nor the bidirectional mapping feels very intuitive.

maetl commented 2 years ago

Another syntax option is to use a single application operator, with bidirectional mapping handled by swapping the left-hand and right-hand tokens.

Instead of:

vehicle>countable
vehicle<countable

Do this:

vehicle|countable
countable|vehicle
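
With a single | operator the direction is no longer spelled out, so the resolver has to infer it by checking which side of the operator names an affix table. A minimal Ruby sketch, assuming table names and variable names never collide; expand_pipe, affix_match and affix_rewrite are hypothetical names:

```ruby
WILDCARD = '%'

# Hypothetical helper: the stem captured by a leading or trailing "%", or nil.
def affix_match(pattern, input)
  if pattern.start_with?(WILDCARD)
    suffix = pattern.delete_prefix(WILDCARD)
    input.delete_suffix(suffix) if input.end_with?(suffix)
  elsif pattern.end_with?(WILDCARD)
    prefix = pattern.delete_suffix(WILDCARD)
    input.delete_prefix(prefix) if input.start_with?(prefix)
  elsif pattern == input
    ''
  end
end

# First matching rule wins; unmatched input is returned unchanged (an assumption).
def affix_rewrite(rules, input, direction)
  pairs = direction == :forward ? rules.to_a : rules.to_a.map(&:reverse)
  pairs.each do |from, to|
    stem = affix_match(from, input)
    return to.sub(WILDCARD) { stem } unless stem.nil?
  end
  input
end

# Expands {name} and {lhs|rhs}. Direction comes from token order:
# {vehicle|countable} applies the table key -> value,
# {countable|vehicle} applies it value -> key.
def expand_pipe(template, context, tables)
  template.gsub(/\{(\w+)(?:\|(\w+))?\}/) do
    lhs, rhs = $1, $2
    if rhs.nil?
      context.fetch(lhs)
    elsif tables.key?(rhs)
      affix_rewrite(tables[rhs], context.fetch(lhs), :forward)
    elsif tables.key?(lhs)
      affix_rewrite(tables[lhs], context.fetch(rhs), :reverse)
    else
      raise ArgumentError, "neither #{lhs} nor #{rhs} names an affix table"
    end
  end
end

tables = { 'countable' => { '%y' => '%ies', '%s' => '%ses', '%' => '%s' } }
puts expand_pipe('{vehicle|countable}', { 'vehicle' => 'ferry' }, tables)  # prints: ferries
puts expand_pipe('{countable|vehicle}', { 'vehicle' => 'buses' }, tables)  # prints: bus
```

The tradeoff shows up in the branch order: the meaning of {a|b} depends on knowing which token is a table, and if both tokens name tables the reference is ambiguous (this sketch simply prefers the forward direction).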