JBenda / inkcpp

Inkle Ink C++ Runtime with JSON>Binary Compiler
MIT License
70 stars 13 forks source link

How to serialize entire story to a graph? #85

Closed kalmard0 closed 4 months ago

kalmard0 commented 5 months ago

I'm interested in creating a static graph out of an Ink story that can be e.g. printed or displayed. I see that inkcpp does not support this by default so I'm wondering what's a good way to go about it.

I'm setting the RNG seed to 0 just in case.

What I've tried so far is to explore every choice in the story (taking & reverting to snapshots before making a choice), and assume that if I encounter a situation where the current text and choices are all the same as some time before, I have ran into a cycle and stop further iteration. However obviously this is not correct as the story can have internal logic where the outcome depends on some variable.

Another approach I tried (which didn't work at all) is comparing the memory contents of serialized snapshots to decide if the current choice has already been taken before.

Without some heuristic to find cycles / repeated choices, extraction goes on seemingly forever (using the Intercept as test data).

Can you give some advice on how to deal with this situation? Ideally you could point me to some internal of inkcpp I can surface through an interface that helps me decide if I'm seeing a choice that I have seen before in the exact same state of the story.

Thanks!

JBenda commented 5 months ago

This sounds like a fun project. The binary comparison of the snapshot does not work because of the visit counts aka how often that knot has been visited. Which will increase until 2^32.

The main problem is how to decide when to stop exploring (see Haltingproblem )

A simple idea would be checking the visit counts, I do not remember if they are exposed through. But if you could use them to check which knots have been visited, and you can detect cycles via the change of visit counts.

Another idea is: each choice reference to a jump Target, same target as already taken -> cycle (again I do not know how exposed this is)

I will take a look at what in depth characteristics are exposed, and maybe add some API if needen

kalmard0 commented 5 months ago

Thank you for the response! Does the jump target (is that the same as the path variable?) also encode the state of the game, so even if arriving to the same point, does it let one differentiate between what variables were set on the path up to that point?

JBenda commented 5 months ago

The jump Target is the same as the path and does not encode any state.

Also is it currently not possible to iterate all globals (meta programming was never a thought until now)

An idea of mine would be a run mode which lists all choices (also which are disabled by a condition) then it should be relatively easy to traverse the FST.

A step further would be: print formulas and variable names instead of the result, then it should be relatively easy to build a graph out of it, since you can check if you already visited a choice target, and since no choices are hidden you can be sure you visit every thing first path. And if you reach a choice point the stack is clear, so no problems there either.

kalmard0 commented 5 months ago

Hm, my goal is to make the resulting static graph an exact representation of the game, e.g. if a choice leads to some different state on a later visit depending on some variable, that should be represented by a different node in the graph (even if their texts are otherwise identical).

What about simply being able to retrieve actual variables through an API (which I assume is part of a snapshot)? In this case each node can be uniquely identified by their text + every variable's value at the time of reaching the node.

Btw to be clear, I don't expect you to do any work for my benefit here, I was just hoping you can guide me towards the right path to do this.

JBenda commented 5 months ago

The snapshot looks like the following

8byte:  total length
8byte:  runner cont
runner count * 8 bytes: runner start offset
Serialized globals
Sterilized  runner 0..N

The runner has no state of interest for your goal (I think)

The content of the serialized globals is "described" in globals_impl.cpp:snap

roughly:

8byte turn_count
visit_counts:
    8byte length
   N times:
       4byte: visits  # number of visits
       4byte turns # number of turns since last visit
visit_count_backups (not of interest)
static and dynamic strings 
value encoding for lists varibles 
list of all variables

I think if you start there digging you may find what you are looking for, good luck and feel free to ask.

kalmard0 commented 5 months ago

Based on your explanation, I wanted to see if the snapshot's variables + current choice text can properly identify nodes. Unfortunately it seems not.

Here's my sample code for getting the snapshot variables as an identifier:

SHA256 hash;
auto globals = story->new_globals_from_snapshot(*snapshot);
auto variables = globals->var_hashes();
for (auto& v : variables) {
    const auto var = globals->get_var(v);
    if (!var) {
      continue;
    }
    switch (var->type) {
      case ink::runtime::value::Type::Bool:
      case ink::runtime::value::Type::Uint32:
      case ink::runtime::value::Type::Int32:
      case ink::runtime::value::Type::Float: {
        hash.add(&var.value(), sizeof(var.value()));
        break;
      }
      case ink::runtime::value::Type::String: {
        const char* str = var->get<ink::runtime::value::Type::String>();
        hash.add(str, strlen(str));
        break;
      }
      case ink::runtime::value::Type::List: {
        IZLOG_FATAL("list unhandled");
        // const auto list = var->get<ink::runtime::value::Type::List>();
        // list->begin
        break;
      }
    }
  }

var_hashes() is a function I added to get the contents of the variable stack. Everything else is built in (albeit some functions were marked public to help with this).

I think I simply am lacking a fundamental understanding of how inkcpp (or ink) even works and what a "state" may be. Is there some documentation I could explore?

kalmard0 commented 5 months ago

Interestingly, if the only thing I include in the hash is the name of the variables:

  for (auto& v : variables) {
    hash.add(&v, sizeof(v));

It magically seems to work (and not run forever).

Node count using my original implementation (assume nodes are the same if the text matches): ~300 Node count using this implementation (assume nodes are the same if both the text and the hashed variable names match): ~3000.

Which seems reasonable for the size of the Intercept.

kalmard0 commented 5 months ago

Just thinking out loud: I think it's clear to me now why using the actual variables as part of the global state will never work here. If there is a cycle in the story (A -> B -> C -> A) and any step along the way (let's say B here) increases a variable, every time I revisit A that variable will have a different value, and so I will identify A as a new unique node, rather than one that I've already visited.

kalmard0 commented 5 months ago

Is inkcpp fully deterministic, e.g. if I create a new runner, set its random seed to 0, and replay the same choices as some time before (also with 0 random seed), am I supposed to end up at the exact same path?

JBenda commented 5 months ago

Inkcpp is fully deterministic, since it only takes the choices and commands from the .bin file

And I'm 99.99% confident that every random element is created from the same PRNG (which you can seed)

kalmard0 commented 4 months ago

I haven't made any progress on this, so I'm closing this bug. Thank you for the tips.

kalmard0 commented 4 months ago

I just ran into https://yannick-lohse.fr/graphink/ which does something related to what I had in mind. I will investigate to see how it achieves the graph results.

kalmard0 commented 2 months ago

After looking at graphink and inkjs in detail, and building some prototypes, I believe the following features of inkjs are enough to create a reasonably good serialization of the story:

At this point I'm pretty sure I can achieve what I want via inkjs, although I would still prefer to do it with inkcpp due to performance issues (the search space is huge, even with good heuristics).