fictiveworks / CalyxSharp

Generative text processing for C# and Unity applications

Modifying filter chains and expression syntax #13

Closed: maetl closed this issue 1 year ago.

maetl commented 1 year ago

Output modifiers (filter chains) format the string that is generated by a grammar production. They are defined by a chain of dot-separated (.) references following the rule name.

"{firstname.uppercase}"
"{intro.sentencecase}"
"{title.lowercase.titlecase}"

There are three separate stages of work needed to support this feature.

Specification

The first stage is to decide on the spec and documentation for the built-in set of string modifiers that can work across different language implementations of Calyx. Currently the Ruby and JavaScript libraries delegate directly to the runtime string objects for any method of arity 0. This means you get inconsistencies like .upcase in Ruby grammars but .toUpperCase in JS grammars.

The specific set of built-in modifiers needs to be normalised across all implementations.

C# modifier implementation

The basic behaviour for handling built-in string modifiers can be developed before the exact spec is defined, starting with basic String methods like uppercase and lowercase (names can always be changed later to meet the spec, but we definitely know these two will be included).
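A minimal sketch of that starting point, assuming built-ins are looked up by name in a case-sensitive table. `BuiltInModifiers` and `TryApply` are hypothetical names, not existing CalyxSharp API. The sketch uses the invariant-culture conversions so that output does not depend on the machine's locale (for example, the Turkish dotted and dotless i).

```csharp
using System;
using System.Collections.Generic;

public static class BuiltInModifiers
{
    // Spec name -> formatting function. Names can be renamed later to match the spec.
    private static readonly Dictionary<string, Func<string, string>> Table =
        new Dictionary<string, Func<string, string>>
        {
            ["uppercase"] = input => input.ToUpperInvariant(),
            ["lowercase"] = input => input.ToLowerInvariant(),
        };

    // Applies the named modifier. Returns false and leaves the input unchanged if the name is unknown.
    public static bool TryApply(string name, string input, out string output)
    {
        if (Table.TryGetValue(name, out Func<string, string> modifier))
        {
            output = modifier(input);
            return true;
        }
        output = input;
        return false;
    }
}
```

Returning a bool rather than throwing leaves the caller free to decide whether an unknown modifier is an error.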

Note that modifiers also need to be chainable from left to right, so that the output of each element on the left-hand side of the expression pipes into the element to its right, with elements separated by the . sigil. (I debated using | here, but that is most commonly read as OR in grammar and logic notations, and string template engines don't use it as a pipe/filter syntax as often as they used to, so I think the . is more understandable, though I'm open to argument on this.)
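To illustrate the piping semantics, an expression like "{title.lowercase.titlecase}" can be split on the . sigil into a rule name and an ordered modifier list, and the modifiers then applied as a left fold, where each one receives the previous one's output. This is a sketch under assumptions: `ModifierChain`, `Parse` and `Apply` are hypothetical names, and modifiers are assumed to be plain `Func<string, string>` values looked up by name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ModifierChain
{
    // Splits "{rule.mod1.mod2}" (braces optional) into the rule name and its modifier names, in order.
    public static (string Rule, string[] Modifiers) Parse(string expression)
    {
        string[] parts = expression.Trim().TrimStart('{').TrimEnd('}').Split('.');
        return (parts[0], parts.Skip(1).ToArray());
    }

    // Folds the input through each named modifier from left to right, so in "a.b"
    // modifier b receives the output of modifier a. Unknown names throw.
    public static string Apply(
        string input,
        IEnumerable<string> modifierNames,
        IReadOnlyDictionary<string, Func<string, string>> modifiers)
    {
        return modifierNames.Aggregate(input, (current, name) =>
        {
            if (!modifiers.TryGetValue(name, out Func<string, string> modifier))
            {
                throw new KeyNotFoundException($"Unknown modifier: {name}");
            }
            return modifier(current);
        });
    }
}
```

Throwing on unknown names is just one possible policy. Passing the input through unchanged is another.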

C# modifier extensions

What would a C# extension API look like? In Ruby, authors can embed a module with custom string methods to expand the allowed syntax with their own formatting.

In JavaScript, this feature isn't fully documented or implemented, but it is easy to support in either Ruby or JavaScript because both languages offer many different ways of dynamically adding functions to an object at runtime. C# is very much not like that, so we need to define a more formal API for extensions.

A good example of a ‘hello world’ string extension I would like to document is a ‘studly caps’ or Spongebob mocking text filter. After getting some informal feedback on the generator API and naming conventions, I don’t think this should go in the built-in API, but it is a great sample code/documentation piece for demonstrating how the extension API works.
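One possible shape for that sample is a deterministic variant of the Spongebob filter that alternates case on letters only and leaves spaces and punctuation untouched. `MockingText.Format` is a hypothetical name, and the randomised "studly caps" versions discussed in this thread would take an RNG instead.

```csharp
using System.Text;

public static class MockingText
{
    // Alternates lower/upper case across letters, starting lowercase.
    // Non-letters pass through unchanged and do not advance the alternation.
    public static string Format(string input)
    {
        var result = new StringBuilder(input.Length);
        bool upper = false;
        foreach (char c in input)
        {
            if (char.IsLetter(c))
            {
                result.Append(upper ? char.ToUpperInvariant(c) : char.ToLowerInvariant(c));
                upper = !upper;
            }
            else
            {
                result.Append(c);
            }
        }
        return result.ToString();
    }
}
```

For example, "hello world" becomes "hElLo WoRlD".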

maetl commented 1 year ago

Draft specification, pulled from current docs. The only major change I'm thinking of making to this is defining separate modifiers for startcase and titlecase, with the former brute-forcing all words to have an initial cap and the latter following AP style guide or Chicago Manual of Style rules.
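To make the proposed distinction concrete, `startcase` capitalises the first letter of every word, while `titlecase` keeps short function words lowercase unless they open or close the title. This is a simplified sketch and not an implementation of AP or Chicago rules. The minor-word list is a small hypothetical sample, words are split on single spaces only, and `CaseModifiers` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class CaseModifiers
{
    // A small sample of words that title-case styles typically keep lowercase (not a complete list).
    private static readonly HashSet<string> MinorWords = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "a", "an", "the", "and", "but", "or", "nor", "for", "of", "in", "on", "at", "to", "by"
    };

    // Uppercases the first character of a word and lowercases the rest.
    private static string Capitalise(string word)
    {
        if (word.Length == 0) return word;
        return char.ToUpperInvariant(word[0]) + word.Substring(1).ToLowerInvariant();
    }

    // "startcase": every word gets an initial capital.
    public static string StartCase(string input)
    {
        string[] words = input.Split(' ');
        for (int i = 0; i < words.Length; i++)
        {
            words[i] = Capitalise(words[i]);
        }
        return string.Join(" ", words);
    }

    // "titlecase" (simplified): minor words stay lowercase except as the first or last word.
    public static string TitleCase(string input)
    {
        string[] words = input.Split(' ');
        for (int i = 0; i < words.Length; i++)
        {
            bool isEdge = i == 0 || i == words.Length - 1;
            words[i] = !isEdge && MinorWords.Contains(words[i])
                ? words[i].ToLowerInvariant()
                : Capitalise(words[i]);
        }
        return string.Join(" ", words);
    }
}
```

With this sketch, "the lord of the rings" becomes "The Lord Of The Rings" under startcase and "The Lord of the Rings" under titlecase.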

uppercase

{expansion.uppercase}

Converts a template expansion to uppercase.

lowercase

{expansion.lowercase}

Converts a template expansion to lowercase.

titlecase

{expansion.titlecase}

Converts a template expansion to title case.

sentencecase

{expansion.sentencecase}

Converts a template expansion to sentence case.
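One plausible reading of `sentencecase`, sketched under the assumption that an expansion is a single sentence: lowercase everything, then uppercase the first letter. A real implementation would have to decide how to treat proper nouns and acronyms, which this naive version flattens. `SentenceCase.Format` is a hypothetical name.

```csharp
public static class SentenceCase
{
    // Lowercases the whole expansion, then uppercases the first letter found.
    // Leading non-letters (quotes, spaces) are skipped when looking for that letter.
    public static string Format(string input)
    {
        char[] chars = input.ToLowerInvariant().ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            if (char.IsLetter(chars[i]))
            {
                chars[i] = char.ToUpperInvariant(chars[i]);
                break;
            }
        }
        return new string(chars);
    }
}
```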

maetl commented 1 year ago

I'm also considering having built-in pluralising and singularising inflectors, and maybe some other 'small' linguistics helpers (like choosing the indefinite article for nouns: a/an). But none of this is a requirement for now.
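As an indication of how small such a helper could start, here is a naive a/an sketch that only checks whether the word begins with a vowel letter. It gets sound-based cases wrong ("an hour", "a unicorn"), which is why a built-in version would need exception lists or pronunciation data. `Articles.WithIndefiniteArticle` is a hypothetical name.

```csharp
public static class Articles
{
    // Naive rule: "an" before a word starting with a vowel letter, otherwise "a".
    // Misses sound-based exceptions such as "an hour" or "a unicorn".
    public static string WithIndefiniteArticle(string noun)
    {
        if (noun.Length == 0) return noun;
        char first = char.ToLowerInvariant(noun[0]);
        string article = "aeiou".IndexOf(first) >= 0 ? "an" : "a";
        return $"{article} {noun}";
    }
}
```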

bentorkington commented 1 year ago

A Dictionary<string, Func<string, Options, string>> that maps the lowercase etc. identifiers used by the other implementations to anonymous functions performing the required operation is demonstrated in my modifiers branch, with sample tests illustrating upper, lower, and Spongebob.

I think a C# dictionary can do everything the JS/Ruby implementations can, since there doesn't seem to be a requirement that these are actually dynamic methods, only that they can be looked up by name and potentially added at runtime.
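A minimal sketch of that dictionary approach, including runtime registration. The `Options` class here is a stand-in assumed to carry the grammar's random number generator as `Rng`, and `ModifierRegistry` is a hypothetical name. This is not the code from the modifiers branch.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for Calyx options; assumed to expose the grammar's RNG.
public class Options
{
    public Random Rng { get; }
    public Options(int seed) { Rng = new Random(seed); }
}

public class ModifierRegistry
{
    private readonly Dictionary<string, Func<string, Options, string>> modifiers =
        new Dictionary<string, Func<string, Options, string>>
        {
            ["uppercase"] = (input, opts) => input.ToUpperInvariant(),
            ["lowercase"] = (input, opts) => input.ToLowerInvariant(),
        };

    // Adds a modifier at runtime, replacing any existing one with the same name.
    public void Register(string name, Func<string, Options, string> modifier)
    {
        modifiers[name] = modifier;
    }

    public string Apply(string name, string input, Options opts)
    {
        if (!modifiers.TryGetValue(name, out var modifier))
        {
            throw new KeyNotFoundException($"Unknown modifier: {name}");
        }
        return modifier(input, opts);
    }
}
```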

maetl commented 1 year ago

For sure. JavaScript at its core is basically a dictionary data structure with inheritance via prototypes (pointers/references between dictionaries) so this is always going to be a viable way of getting some form of dynamic typing to work.

There's no requirement for the methods to be dynamic; they will most likely be compile-time things. There might be some cases where people want to extend grammars with custom modifiers that act on specific runtime state, but that is a bit of a hack/edge case for what this API is intended to do, and the recommended advice would be to think of modifiers as stateless string formatting methods.

maetl commented 1 year ago

Here's another approach, perhaps more ‘trad’. I have no idea if it works exactly as assumed in standard C# (or in Unity), and I'm not sure whether reflection will incur any weird performance issues, but I do like the idea that the modifiers are declared as well-defined string formatting types rather than anonymous functions.

using System;
using System.Reflection;

namespace Calyx {
  namespace Modifiers {

    public static class Invocation
    {
      public static string Format(string modifierName, string input)
      {
        bool throwOnError = false;
        bool ignoreCase = true; // "uppercase" matches class UpperCase, avoiding the need for a symbol table

        // Assembly.GetType is an instance method, so go via the assembly that declares the modifiers
        Type modifierClass = typeof(Invocation).Assembly.GetType(
          $"Calyx.Modifiers.{modifierName}",
          throwOnError,
          ignoreCase
        );

        // Unknown name, or a type that can't be instantiated as a modifier: leave the input unchanged
        if (modifierClass == null || modifierClass.IsAbstract || !typeof(Modifier).IsAssignableFrom(modifierClass)) {
          return input;
        }

        Modifier modifierInstance = (Modifier)Activator.CreateInstance(modifierClass);
        // Call through the interface rather than MethodInfo.Invoke, which returns object
        return modifierInstance.Format(input);
      }
    }

    public interface Modifier {
      string Format(string input);
    }

    public class UpperCase : Modifier {
      public string Format(string input)
      {
        return input.ToUpper();
      }
    }

    public class LowerCase : Modifier {
      public string Format(string input)
      {
        return input.ToLower();
      }
    }
  }
}
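On the performance question: the reflection lookup and `Activator.CreateInstance` only need to run once per modifier name if the resulting instances are cached. A minimal sketch, assuming modifiers are stateless so that one shared instance per name is safe. `CachedModifierLookup` is hypothetical, and the interface is a standalone copy named `IModifier` so the block compiles on its own.

```csharp
using System;
using System.Collections.Concurrent;

namespace CachedModifiers
{
    public interface IModifier
    {
        string Format(string input);
    }

    public class UpperCase : IModifier
    {
        public string Format(string input) => input.ToUpperInvariant();
    }

    public static class CachedModifierLookup
    {
        // Name -> instance (or null when no matching modifier type exists), filled on first use.
        private static readonly ConcurrentDictionary<string, IModifier> Cache =
            new ConcurrentDictionary<string, IModifier>(StringComparer.OrdinalIgnoreCase);

        public static string Format(string modifierName, string input)
        {
            IModifier modifier = Cache.GetOrAdd(modifierName, Resolve);
            return modifier == null ? input : modifier.Format(input);
        }

        // The reflection work happens here, normally once per distinct name.
        private static IModifier Resolve(string modifierName)
        {
            // Arguments: type name, throwOnError = false, ignoreCase = true
            Type type = typeof(CachedModifierLookup).Assembly.GetType(
                $"CachedModifiers.{modifierName}", false, true);
            if (type == null || type.IsAbstract || !typeof(IModifier).IsAssignableFrom(type))
            {
                return null;
            }
            return (IModifier)Activator.CreateInstance(type);
        }
    }
}
```

Caching instances also makes per-call cost close to a dictionary lookup, which is roughly the cost of the dictionary-based designs.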
maetl commented 1 year ago

Here's another way of doing it that limits dictionary lookups to custom modifiers registered in a static class dict, with everything else basically hard-coded at compile time. There's almost certainly a more clever/maintainable way to do this, but at least I know exactly what is going on when this code executes.

using System.Collections.Generic;

// Signature for custom modifiers: takes the expanded text, returns the formatted text
public delegate string ModifierFunc(string input);

public static class Extensions {
  private static readonly Dictionary<string, ModifierFunc> SymbolTable = new Dictionary<string, ModifierFunc>();

  public static void Register(string modifier, ModifierFunc hook)
  {
    SymbolTable.Add(modifier, hook); // throws ArgumentException if the name is already registered
  }

  public static bool ModifierExists(string modifier)
  {
    return SymbolTable.ContainsKey(modifier);
  }

  public static string InvokeModifier(string modifier, string input)
  {
    return SymbolTable[modifier](input);
  }
}

// Registration has to happen inside a method, e.g. during application startup:
// Extensions.Register("hashtag", (input) => $"#{input}");

public static class Invocation
{
  public static string InvokeModifier(string modifier, string input)
  {
    // Registered extensions take precedence over the built-ins
    if (Extensions.ModifierExists(modifier)) {
      return Extensions.InvokeModifier(modifier, input);
    }
    switch (modifier) {
      case "uppercase": return input.ToUpper();
      case "lowercase": return input.ToLower();
      default: return input; // hack: unknown modifiers pass the input through unchanged
    }
  }
}
maetl commented 1 year ago

Okay, and one more idea that is actually quite cool as an extension API, but I'm not sure yet how to implement it.

using System.Linq;
using System.Text;
using Calyx;
using Calyx.Extension;

// Proposed API: [Modifier] would live in Calyx.Extension (not implemented yet),
// and Options is assumed to expose the grammar's System.Random as Rng.
static class MyExtensions {
  [Modifier("hashtag")]
  public static string FormatHashtag(string input, Options opts)
  {
    return $"#{input}";
  }

  [Modifier("studlycaps")]
  public static string FormatStudlyCaps(string input, Options opts)
  {
    var result = new StringBuilder(input.Length);
    foreach (char c in input) {
      // A coin flip per character decides its case
      result.Append(opts.Rng.Next(2) == 1 ? char.ToUpper(c) : char.ToLower(c));
    }
    return result.ToString();
  }

  [Modifier("explode")]
  public static string ExplodeSpacesInString(string input, Options opts)
  {
    // Puts a space between every character: "abc" becomes "a b c"
    return string.Join(" ", input.Select(c => c.ToString()));
  }
}
bentorkington commented 1 year ago

We can use Reflection and custom attributes to find modifier methods, but this creates a situation where custom modifiers might be created with the wrong method signature. There's nothing stopping any method being tagged with our custom attribute, and all we can do is throw an exception at runtime if the signature doesn't match.
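Roughly what that runtime check could look like: scan a class for public static methods carrying the attribute, reject any whose signature isn't string (string, Options), and bind the rest as delegates. This is a sketch under assumptions. `ModifierAttribute` and `Options` are stand-ins defined here, `ModifierScanner.Scan` is a hypothetical name, and the two sample classes only exist to show an accepted and a rejected method.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Stand-in for the proposed [Modifier("name")] attribute.
[AttributeUsage(AttributeTargets.Method)]
public sealed class ModifierAttribute : Attribute
{
    public string Name { get; }
    public ModifierAttribute(string name) { Name = name; }
}

// Stand-in for Calyx options.
public class Options
{
    public Random Rng { get; } = new Random();
}

public static class ModifierScanner
{
    // Finds [Modifier] methods on the given type and binds them as delegates.
    // Throws at registration time if a tagged method has the wrong signature.
    public static Dictionary<string, Func<string, Options, string>> Scan(Type type)
    {
        var result = new Dictionary<string, Func<string, Options, string>>();
        foreach (MethodInfo method in type.GetMethods(BindingFlags.Public | BindingFlags.Static))
        {
            var attr = method.GetCustomAttribute<ModifierAttribute>();
            if (attr == null) continue;

            ParameterInfo[] ps = method.GetParameters();
            bool valid = method.ReturnType == typeof(string)
                && ps.Length == 2
                && ps[0].ParameterType == typeof(string)
                && ps[1].ParameterType == typeof(Options);
            if (!valid)
            {
                throw new InvalidOperationException(
                    $"[Modifier(\"{attr.Name}\")] on {type.Name}.{method.Name} must have signature string (string, Options)");
            }

            result.Add(attr.Name, (Func<string, Options, string>)Delegate.CreateDelegate(
                typeof(Func<string, Options, string>), method));
        }
        return result;
    }
}

// Accepted by Scan.
public static class SampleModifiers
{
    [Modifier("hashtag")]
    public static string FormatHashtag(string input, Options opts) => $"#{input}";
}

// Rejected by Scan: the tagged method is missing the Options parameter.
public static class BrokenModifiers
{
    [Modifier("shout")]
    public static string Shout(string input) => input.ToUpperInvariant() + "!";
}
```

Validating once, when an extension class is registered, at least moves the failure from the first time a grammar uses the modifier to application startup.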

An even more OO approach might be to define each modifier as its own class, instead of as methods on some static class, where the method that runs the modifier is a basic public string Modify(string input) and any dependencies it requires can be injected through its (optional) constructor.

This removes the need to pass an Options to methods that don't need it, and allows a modifier to depend on more than just an Options object.

using System.Text;

public interface IStringModifier {
  string Modify(string input);
}

[Calyx.Attributes.ModifierName("uppercase")]
public class ToUpperCase : IStringModifier {
  // no need to write a default constructor
  public string Modify(string input) {
    return input.ToUpper();
  }
}

[Calyx.Attributes.ModifierName("studlycaps")]
public class ToStudlyCaps : IStringModifier {
  private readonly Options opts;
  private readonly double fractionOfUppercaseLetters;

  public ToStudlyCaps(Options opts, double fractionOfUppercaseLetters) {
    this.opts = opts;
    this.fractionOfUppercaseLetters = fractionOfUppercaseLetters;
  }

  public string Modify(string input) {
    var result = new StringBuilder(input.Length);
    foreach (char c in input) {
      // Each character is uppercased with probability fractionOfUppercaseLetters
      result.Append(opts.Rng.NextDouble() < fractionOfUppercaseLetters ? char.ToUpper(c) : char.ToLower(c));
    }
    return result.ToString();
  }
}

One drawback is the need to register every IStringModifier implementation instead of registering one static class containing multiple modifiers. I see this as slight.

It's probably also possible for end users to just pass a reference to their own assembly and use reflection to register any class that implements `IStringModifier`.

bentorkington commented 1 year ago

Now that I think about it, this seems to make the custom attributes (edit: almost*) worthless as they only apply statically and we're now working with instances that'll need to be registered at runtime.

Is the increase in extendability worth it?

(*) we can still use it to couple the modifier name with the modifier instance, as long as we don't do something like:

var sc1 = new ToStudlyCaps(opts, 0.5); // both instances would share the
var sc2 = new ToStudlyCaps(opts, 0.7); // same [ModifierName("studlycaps")] name
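One possible way around that collision, sketched under assumptions: let the attribute supply a default name, but accept an explicit name when registering an instance, so two differently configured instances can coexist. `ModifierRegistry` and `ModifierNameAttribute` are hypothetical stand-ins, and this `ToStudlyCaps` takes a `System.Random` directly instead of `Options` so the block is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Text;

public interface IStringModifier
{
    string Modify(string input);
}

[AttributeUsage(AttributeTargets.Class)]
public sealed class ModifierNameAttribute : Attribute
{
    public string Name { get; }
    public ModifierNameAttribute(string name) { Name = name; }
}

[ModifierName("studlycaps")]
public class ToStudlyCaps : IStringModifier
{
    private readonly Random rng;
    private readonly double fractionOfUppercaseLetters;

    public ToStudlyCaps(Random rng, double fractionOfUppercaseLetters)
    {
        this.rng = rng;
        this.fractionOfUppercaseLetters = fractionOfUppercaseLetters;
    }

    public string Modify(string input)
    {
        var result = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            result.Append(rng.NextDouble() < fractionOfUppercaseLetters
                ? char.ToUpperInvariant(c)
                : char.ToLowerInvariant(c));
        }
        return result.ToString();
    }
}

public class ModifierRegistry
{
    private readonly Dictionary<string, IStringModifier> modifiers = new Dictionary<string, IStringModifier>();

    // Registers under an explicit name if given, otherwise under the class's [ModifierName].
    public void Register(IStringModifier modifier, string name = null)
    {
        string key = name
            ?? modifier.GetType().GetCustomAttribute<ModifierNameAttribute>()?.Name
            ?? throw new ArgumentException("No name given and no [ModifierName] attribute", nameof(modifier));
        modifiers.Add(key, modifier); // throws on a duplicate name instead of silently overwriting
    }

    public string Apply(string name, string input) => modifiers[name].Modify(input);
}
```

With this shape the attribute still couples a class to its default name, while a second instance of the same class needs an explicit name; registering both under the default name fails loudly.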
maetl commented 1 year ago

Some great work on this. Closing as the overall feature and architecture questions at the macro level are resolved. We can pick this up again at the micro level by opening issues to deal with specific filter questions and implementation details.