Get Rid of Colossal Switch Statement

So in game.c there is a nightmare switch statement which is used to parse arguments. This is ... horrifying and hard to maintain.

The goal of this ticket is to make a more dynamic system that is easy to use, flexible, and handles things like command access permissions and argument parsing with a consistent, uniform interface instead of the kinda "anything goes" method we've got going on here.

After discussion with @wyld-sw I've come up with 2 potential solutions. First off, let's consider that in fuzzball, the first 3 characters of the command are the most significant. For most commands, the smallest alias is 3 characters (such as @ su for @ success). For a few commands, 1 or 2 letters are possible. For instance, 'w' for whisper.

So my solution is basically a limited hash. Let's have essentially a series of arrays that are 27 elements each -- one for each character + one for the @ symbol (we actually may need to include another few symbols, but for the sake of argument, let's just work with these 27 characters first). We can make a 'hash' function that translates each character of the command into some 0 - 26 number.

For instance, @ action would be like: command_array[26][0][2]

and that would in turn point to a structure that would have all the command information -- the full command name, argument information, permission flags, etc.

Unfortunately we can't just use a straight up 3 D array like that -- we have to support 1 letter, 2 letter aliases, and even commands that require more than 3 letters to figure out, so its not quite that simple. But that simple case is the basis for my proposed solutions.

I have two proposed solutions; one of them is the index tree, and the other is the colossal array. The difference between the two is primarily in how memory is used and how many pointers are involved. The underlying concept is similar. So let me start with some pseudo code:

INDEX TREE METHOD

/* GIVEN */

struct command {
   char* name;
   unsigned int flags;
   /* whatever - argument information, etc. */
};

struct command_node {
  short type;
  union {
    /* 'cmds' will be a NULL terminated array of commands that 'live' at this node.  We will
     * try to match our command against something in this command array. type == COMMAND
     */
    struct command* cmds;

   /* 'children' will always be an array of 27 pointers which may be NULL for dead-ends, or
    * pointers to struct command_node.  type == INDEX
    */
    struct command_node* children;
   } data;
}

#define COMMAND_NODE_TERM 0
#define COMMAND_NODE_COMMAND 1
#define COMMAND_NODE_INDEX 2

/* GIVEN */
int character_to_map(char c);  /* converts char 'c' to an index 0 - 26 */

/* ONE TIME INITIALIZE */
command_map = (struct command_node*)malloc(sizeof(struct command_node)*27);

/* Load it -- maybe load from file so its dynamic and easy? */
load_stuff(command_map);

/* HOW TO PARSE A COMMAND */

/* Given a command such as */
char* cmd = "@whatever";

int i = 0;
struct command_node* step = command_map[character_to_map(cmd[0])]

do {
    if step.type == COMMAND_NODE_TERM:
       then we didn't find it
   else if step.type == COMMAND_NODE_COMMAND:
      for (int x = 0; step.data.cmds[x] != NULL; x++) {
         if step.data.cmds[x].name == cmd :
             found it, return
      }
      didn't find it
   else if step.type == COMMAND_TYPE_INDEX and i != end of cmd:
       i++;
      step = step.data.children[character_to_map(cmd[i])]
} while(step != NULL);

if we get here, we didn't find it.

This is cool and all and is a very basic and fast tree/hash hybrid. Please note that the do/while loop there is by no means optimized; this is a "straw man" kind of thing. This may be easier to do with recursion or similar instead. What I don't like about it is there is a lot of waste in the form of 'book keeping' and pointer traversal.

COLOSSAL ARRAY So my other idea is the 'colossal array'. Basically, we can make a flat memory block that is 27x27x27 elements large and use pointer math to traverse it. If an element in the colossal array is NULL, then we have reached a dead end and we did not find our command. If an element is an allocated command_node, but the 'cmd' inside it is NULL, then we have reached an "index node". If an element is an allocated command_node, and the 'cmd' inside is a pointer to a NULL terminated array of possible commands to match.

/* GIVEN */

struct command {
   char* name;
   unsigned int flags;
   /* whatever */
};

struct command_node {
  struct command* cmds;  /* If this is NULL, then we are in an index node */
}

/* GIVEN */
int character_to_map(char c);  /* converts char 'c' to an index 0 - 26 */

Then we allocate:

/* Make our array -- this is 76k */
command_map = (struct command_node*)malloc(sizeof(struct command_node)*27*27*27);
memset(command_map, 0, sizeof(struct command_node)*27*27*27);

/* Load it -- maybe load from file so its dynamic and easy? */
load_stuff(command_map);

/* Traverse it */
cmd = "@whatever"

int i = 0;

struct command_node* step = command_map[character_to_map(cmd[0])]

do {
    if step.cmds != NULL:
       for (int x = 0; step.cmds[x] != NULL; x++) {
          if step.data.cmds[x].name == cmd :
              found it, return
       }
       didn't find it
   else i != end of cmd
      step += .... some kind of pointer math, see: https://www.geeksforgeeks.org/multidimensional-pointer-arithmetic-in-cc/
       i++;
} while(step != NULL);

if we get here, we didn't find it.

In either case, all available commands should be in an array somewhere, such as:

command_callbacks = [
   do_action,
   do_armageddon,
   ...
}

Then, we can have a command file that defines a command and all its properties. I'm not necessarily suggesting ini file format, but I'm using it for sake of argument/example:

[@action]
priority=0   ; higher priority will be more likely to match a partial
required_flags=BUILDER
arguments=3
function=1234 ; whatever function index

....
[pose]
min_match=1 ; match a single character
arguments=1
function=4321 ; whatever function index

....
["]  ; note that " won't work with my current hashing ... we'd have to make our tables bigger, or
     ; handle special characters in some one-off way ... or make all special characters hash down
     ; to a single value since there's not too many of them in use.
     ; Most MUCKs override this anyway so optimizing for this case doesn't make sense.
min_match=1
arguments=1
function=4321 ; aliases would just be commands with different arguments

To add a new command, you add a function to the array, and then add it to the config file. Individual MUCKs could then tweak the config file if they want. Want only wizards to build? Easy! Want only BUILDERs to page? Go for it!

Just an initial strawman to shoot arrows at. @nyanster ... any thoughts?

WHY NOT A USUAL HASH TABLE?

Because we have to do a lot of partial matching. We also want to avoid hash paging whenever possible. And because we have a lot of commands that are very similar, efficiently hashing it may be difficult -- we're probably not going to do any better with a regular hash than with these methods here.

WHY NOT AN AVL OR SOME OTHER TREE?

This thing gets used all the time. We want to minimize pointer traversals if we can. We also want to try and make use of the fact that this is static data. We load it up once at MUCK load and that's it.

WHICH ONE DOES TANABI LIKE?

I'd say the colossal array. I don't like that it takes 76k of memory, but it is incredibly efficient otherwise. Its the usual trade-off of more memory vs. more CPU vs. flexibility. I mean, we're not going to beat the hard coded colossal switch for raw performance. 76k of memory sucks because that is probably larger than will fit in L1 or L2 cache, though. But the indexed version is an awful lot of bouncing around the memory block with pointers.

Something we could do as a happy medium is do a colossal array with only '2 steps' instead of '3 steps'. This would take our memory consumption down to 2916 bytes, which will fit nicely in pretty much any L1 or L2 cache, though means we will do more looping to check potential commands. "@ o" for instance will have to check "@ osucc", "@ odrop", "@ open", etc. However, given the memory savings, it may be worth it.

Oh and another idea -- make the first array bigger than the other arrays.

The first character may be several weird characters: @ : " or others, but from that point forward, it is always just letters. so the first array can be like 30+ elements, but each subsequent array can be just 26 elements.

Just an initial strawman to shoot arrows at. @nyanster ... any thoughts?

I have never ever written a UI before (other than just taking arguments from the command line) whether it be CLI, TUI, or GUI, so I have no idea which of those two options would be better... When I first saw the big switch statement myself, I wondered, "I wonder if lex and yacc['s C output] could replace this." But I don't know those two tools/languages yet so even I don't know the answer to that. :P I recently re-downloaded a book on them that I bought long ago, though. (It's DRM-free btw) Aren't these hash ideas just the same as stacking 3 layers of switch statements inside each other? I probably don't understand this very well.

I'm probably not useful here as for this topic or decision. >w<

Just an initial strawman to shoot arrows at. @nyanster ... any thoughts?

I have never ever written a UI before (other than just taking arguments from the command line) whether it be CLI, TUI, or GUI, so I have no idea which of those two options would be better... When I first saw the big switch statement myself, I wondered, "I wonder if lex and yacc['s C output] could replace this." But I don't know those two tools/languages yet so even I don't know the answer to that. :P I recently re-downloaded a book on them that I bought long ago, though. (It's DRM-free btw) Aren't these hash ideas just the same as stacking 3 layers of switch statements inside each other? I probably don't understand this very well.

I'm probably not useful here as for this topic or decision. >w<

It is possible to generate the switch statement through a variety of means, and that occurred to me as well. It's a valid option as well.

The difference is mainly in how its maintained. Adding a command to the current switch statement requires a good deal of effort. We would like to replace it with a more dynamic, easier to manage process. Functionally, its the same as 3 layered switches; however operationally, it can be dynamically loaded whereas 3 stacked switches can't.

It is possible to generate the switch statement through a variety of means, and that occurred to me as well. It's a valid option as well.

Okidoki!

The difference is mainly in how its maintained. Adding a command to the current switch statement requires a good deal of effort. We would like to replace it with a more dynamic, easier to manage process. Functionally, its the same as 3 layered switches; however operationally, it can be dynamically loaded whereas 3 stacked switches can't.

Ahaa, that makes sense. I remember adding a MPI function required effort too, with @wyld-sw having to add the new tab function in quite a few files.

server commands, MPI functions, and MUF primitives are all due for an overhaul on how they're defined and "registered". MPI is actually the cleanest currently (mfunlist.h), but still can be improved.

fuzzball-muck / fuzzball

Get Rid of Colossal Switch Statement #323