Platform-Independent Precompiled Scripts

JayhawkZombie commented 6 years ago

This isn't so much of an "issue" as it is a suggestion (or here's a way I've used this?).

Since Jinx compiles down to byte-code and we can directly access that byte-code, I've tried to think of a way of precompiling scripts.

I use cereal for serializing my game objects into archives, and it has a portable binary format that, as far as I've seen, doesn't seem to have any issues with endianness errors.

Now it's not the nicest solution, but it works as a proof of concept.
A class called BytecodeInstance stores compiled byte-code and serializes/deserializes it to/from binary archives, respectively.

The byte-code is just saved exactly as it is in memory, as an array of uint8_t elements.
It saves the length, too, of course. And cereal adds some bookkeeping data, so it's not exactly the same size as just the byte-code, but the bookkeeping data is minimal.

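// Also requires the Jinx header for Jinx::BufferPtr.
#include <cstdint>
#include <vector>
#include <cereal/types/vector.hpp>  // cereal support for std::vector

class BytecodeInstance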
{
public:

  void Set(Jinx::BufferPtr Ptr)
  {
    if (!DataArray.empty())
      DataArray.clear();

    DataArray.resize(Ptr->Size());

    size_t pos = 0;
    Ptr->Read(&pos, DataArray.data(), DataArray.size());
  }

  Jinx::BufferPtr Get()
  {
    Jinx::BufferPtr Ptr = Jinx::CreateBuffer();

    size_t pos = 0;
    Ptr->Write(&pos, DataArray.data(), DataArray.size());

    return Ptr;
  }

  template<class Archive>
  void save(Archive & ar) const
  {
    ar(DataArray);
  }

  template<class Archive>
  void load(Archive & ar)
  {
    if (!DataArray.empty())
      DataArray.clear();

    ar(DataArray);
  }

private:

  std::vector<uint8_t> DataArray;

};
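For readers without cereal at hand, the idea of "raw bytes plus a length, in a byte order that does not depend on the host" can be sketched with the standard library alone. This is only an illustration under stated assumptions: `WriteBlob`, `ReadBlob`, and `RoundTrips` are hypothetical helper names, and the layout (an 8-byte little-endian length followed by the payload) is not claimed to match cereal's actual on-disk format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical helper: writes an 8-byte little-endian length, then the bytes.
// Emitting the length one byte at a time keeps the layout identical on
// big-endian and little-endian hosts.
inline void WriteBlob(std::ostream & out, const std::vector<uint8_t> & bytes)
{
  const uint64_t size = bytes.size();
  for (int i = 0; i < 8; ++i)
    out.put(static_cast<char>((size >> (8 * i)) & 0xFF));
  if (!bytes.empty())
    out.write(reinterpret_cast<const char *>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
}

// Hypothetical helper: reads back one blob written by WriteBlob.
// A production loader should also cap 'size' before allocating.
inline std::vector<uint8_t> ReadBlob(std::istream & in)
{
  uint64_t size = 0;
  for (int i = 0; i < 8; ++i)
  {
    const int c = in.get();
    if (!in)
      throw std::runtime_error("truncated length prefix");
    size |= static_cast<uint64_t>(static_cast<unsigned char>(c)) << (8 * i);
  }
  std::vector<uint8_t> bytes(static_cast<size_t>(size));
  if (size > 0)
  {
    in.read(reinterpret_cast<char *>(bytes.data()),
            static_cast<std::streamsize>(size));
    if (static_cast<uint64_t>(in.gcount()) != size)
      throw std::runtime_error("truncated payload");
  }
  return bytes;
}

// Self-check: write to an in-memory stream and read the same bytes back.
inline bool RoundTrips(const std::vector<uint8_t> & bytes)
{
  std::stringstream stream(std::ios::in | std::ios::out | std::ios::binary);
  WriteBlob(stream, bytes);
  assert(stream.good());
  return ReadBlob(stream) == bytes;
}
```

With Jinx, the vector passed in would be the one filled by `Set()` above; the sketch itself has no dependency on Jinx or cereal.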

And a BytecodeArchive class maintains a collection of BytecodeInstances, and creates new archives from byte-code.

#include <list>
#include <cereal/types/list.hpp>  // cereal support for std::list

class BytecodeArchive
{
public:

  void Archive(Jinx::BufferPtr BufPtr)
  {
    ByteCodes.emplace_back();
    ByteCodes.back().Set(BufPtr);
  }

  template<class Archive>
  void save(Archive & ar) const
  {
    ar(ByteCodes);
  }

  template<class Archive>
  void load(Archive & ar)
  {
    ar(ByteCodes);
  }

  std::list<BytecodeInstance>& GetCodes()
  {
    return ByteCodes;
  }

private:

  std::list<BytecodeInstance> ByteCodes;

};

This is all assuming I didn't completely botch the usage of your API, but it hasn't crashed yet.

For testing these precompiled scripts, I registered two functions in two different libraries.

Jinx::Variant Testfunc(Jinx::ScriptPtr Script, Jinx::Parameters Params)
{
  std::cout << Params[0].GetString() << "\n";
  return nullptr;
}

Jinx::Variant GetPlayerHealth(Jinx::ScriptPtr Script, Jinx::Parameters Params)
{
  return 100;
}

I had to generate the precompiled scripts once, so I did so by just using the same process one would for compiling scripts from source, then archiving the byte-code, then finally serializing the byte-codes into a binary archive on disk.

auto JinxRuntime = Jinx::CreateRuntime();

const char * src =
u8R"(import core
import 'testlib'
import 'gamecore'

set res to test ("some") func
write line gamecore player health

set gamecore player health to gamecore player health - 50

write line gamecore player health
)";

auto lib = JinxRuntime->GetLibrary("testlib");
lib->RegisterFunction(Jinx::Visibility::Public, {"test", "{string}", "func"}, Testfunc);

auto gamecore = JinxRuntime->GetLibrary("gamecore");
gamecore->RegisterProperty(Jinx::Visibility::Public, Jinx::Access::ReadWrite, "player health", 100);
gamecore->RegisterFunction(Jinx::Visibility::Public, {"get", "player", "health"}, GetPlayerHealth);

auto Compiled = JinxRuntime->Compile(src);
auto Script = JinxRuntime->CreateScript(Compiled);

const char * script2 = u8R"(
  loop from 1 to 10
    set a to 2
    set b to 3
    set c to a + b
    set d to b - a
    set e to a * b
    set f to b / 1
    set g to 10 % b
    set h to 123.456
    set i to 23.45
    set j to h * i
    set k to h / i
    set l to h + i
    set m to h - i
    set n to h % i
    wait
  end
)";

auto Bytecode2 = JinxRuntime->Compile(script2);

const char * script3 = u8R"(

  -- Simple if/else tests
  set a to false
  if true
    set a to true
  end
  set b to false
  if true
    set b to true
  else
    set b to false
  end
  set c to false
  if false
    set c to false
  else
    set c to true
  end
  set d to false
  if false
    set d to false
  else if true
    set d to true
  else
    set d to false
  end
  set e to false
  if false
    set e to false
  else if false
    set e to false
  else if true
    set e to true
  else
    set e to false
  end
  set f to false
  if false
    set f to false
  else if false
    set f to false
  else if false
    set f to false
  else
    set f to true
  end
  set g to false
  if true
    if true
      set g to true
    end
  end
  set h to false
  if false
  else
    if true
      set h to true
    end
  end
  )";

auto bytecode3 = JinxRuntime->Compile(script3);

const char * script4 = u8R"(
  import core
  loop from 1 to 10
    set a to [1, "red"], [2, "green"], [3, "blue"], [4, "yellow"], [5, "magenta"], [6, "cyan"]
    set b to a[1]
    set c to a[2]
    set d to a[3]
    set e to a[4]
    set a[5] to "purple"
    set a[6] to "black"
    loop i over a
      if i value = "blue"
        erase i
      end
    end
    wait
  end
  )";

auto bytecode4 = JinxRuntime->Compile(script4);

BytecodeArchive CodeArchive;

CodeArchive.Archive(Compiled);
CodeArchive.Archive(Bytecode2);
CodeArchive.Archive(bytecode3);
CodeArchive.Archive(bytecode4);  

// For the stress tests below, I looped and compiled additional copies
// of 'bytecode2'

std::ofstream ofile("codearchive.bytecode", std::ios_base::binary);

{
  cereal::PortableBinaryOutputArchive archive(ofile);

  archive(CodeArchive);
}

ofile.close();

After that point, I can just load it up easily (assuming I still added the native functions like above):

std::ifstream ifile("codearchive.bytecode", std::ios_base::binary);

BytecodeArchive codearchive;

{
  cereal::PortableBinaryInputArchive archive(ifile);

  archive(codearchive);
}

After that, the main program looks like:

Jinx::Variant Testfunc(Jinx::ScriptPtr Script, Jinx::Parameters Params)
{
  std::cout << Params[0].GetString() << "\n";
  return nullptr;
}

Jinx::Variant GetPlayerHealth(Jinx::ScriptPtr Script, Jinx::Parameters Params)
{
  return 100;
}

int main(int argc, char **argv)
{
  auto JinxRuntime = Jinx::CreateRuntime();

  auto lib = JinxRuntime->GetLibrary("testlib");

  lib->RegisterFunction(Jinx::Visibility::Public, {"test", "{string}", "func"}, Testfunc);

  auto gamecore = JinxRuntime->GetLibrary("gamecore");

  gamecore->RegisterProperty(Jinx::Visibility::Public, Jinx::Access::ReadWrite, "player health", 100);
  gamecore->RegisterFunction(Jinx::Visibility::Public, {"get", "player", "health"}, GetPlayerHealth);

  std::ifstream ifile("codearchive.bytecode", std::ios_base::binary);

  BytecodeArchive codearchive;

  {
    cereal::PortableBinaryInputArchive archive(ifile);

    archive(codearchive);
  }

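  auto & codes = codearchive.GetCodes();

  // std::list has no operator[], so take the first entry with front().
  Jinx::BufferPtr code = codes.front().Get();

  std::vector<Jinx::ScriptPtr> Scripts;

  // Create one script per archived byte-code buffer.
  for (auto & bcode : codes)
  {
    Scripts.push_back(JinxRuntime->CreateScript(bcode.Get()));
  }

  // (Script execution omitted.)
  return 0;
}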

Maybe it sounds trivial, or not essential, but it drastically speeds up loading times when booting up the engine.

The binary archives are, for the most part, unreadable, but most literal strings remain intact.

For the example above, the following is generated (if viewed as a hex dump) for the binary archive (for a single copy of the 4 scripts):

0104 0000 0000 0000 006f 0000 004a 494e
5800 0001 0017 0000 0000 002a 0404 0000
0073 6f6d 6500 02cd 1ae7 049a 2ae6 1932
688b 1175 74fd 89c1 2541 8653 34e9 ea73
ec02 63d3 3cbd c379 7aba 1f25 4186 5334
e9ea 73ec 2a02 3200 0000 0000 0000 3430
4186 5334 e9ea 73ec 2541 8653 34e9 ea73
ec02 63d3 3cbd c379 7aba 1f0b 9601 0000
4a49 4e58 0000 0100 1700 0000 0000 2d2a
0201 0000 0000 0000 002a 020a 0000 0000
0000 002a 002d 2a02 0200 0000 0000 0000
3202 c07d 348f d965 a72a 0203 0000 0000
0000 0032 1c83 43ae cca1 97c3 2802 c07d
348f d965 a728 1c83 43ae cca1 97c3 0032
aedd 793a b973 3815 281c 8343 aecc a197
c328 02c0 7d34 8fd9 65a7 3432 0462 3f68
be5c 5ed3 2802 c07d 348f d965 a728 1c83
43ae cca1 97c3 1b32 cbb3 b64c c9ba 5643
281c 8343 aecc a197 c32a 0201 0000 0000
0000 0005 32be b415 5d8b 09b3 002a 020a
0000 0000 0000 0028 1c83 43ae cca1 97c3
1a32 7b3a 085a d733 00d2 2a01 77be 9f1a
2fdd 5e40 32d8 33f7 52ea 1d27 a12a 0133
3333 3333 7337 4032 22bf d80a 1b7e 1964
28d8 33f7 52ea 1d27 a128 22bf d80a 1b7e
1964 1b32 a808 4ac4 6ac6 443c 28d8 33f7
52ea 1d27 a128 22bf d80a 1b7e 1964 0532
77e8 e836 0c73 115f 28d8 33f7 52ea 1d27
a128 22bf d80a 1b7e 1964 0032 6c26 6c49
4d13 6317 28d8 33f7 52ea 1d27 a128 22bf
d80a 1b7e 1964 3432 0ff1 9c95 f1ed 3d10
28d8 33f7 52ea 1d27 a128 22bf d80a 1b7e
1964 1a32 a0e8 00cc 92df 72fc 362e 1813
2500 0000 2e0b 2002 0000 4a49 4e58 0000
0100 1700 0000 0000 2a03 0032 02c0 7d34
8fd9 65a7 2a03 0111 3000 0000 2d2a 0301
3202 c07d 348f d965 a72e 2a03 0032 1c83
43ae cca1 97c3 2a03 0111 5700 0000 2d2a
0301 321c 8343 aecc a197 c32e 1065 0000
002d 2a03 0032 1c83 43ae cca1 97c3 2e2a
0300 32ae dd79 3ab9 7338 152a 0300 118c
0000 002d 2a03 0032 aedd 793a b973 3815
2e10 9a00 0000 2d2a 0301 32ae dd79 3ab9
7338 152e 2a03 0032 0462 3f68 be5c 5ed3
2a03 0011 c100 0000 2d2a 0300 3204 623f
68be 5c5e d32e 10ea 0000 002a 0301 11dc
0000 002d 2a03 0132 0462 3f68 be5c 5ed3
2e10 ea00 0000 2d2a 0300 3204 623f 68be
5c5e d32e 2a03 0032 cbb3 b64c c9ba 5643
2a03 0011 1101 0000 2d2a 0300 32cb b3b6
4cc9 ba56 432e 1055 0100 002a 0300 112c
0100 002d 2a03 0032 cbb3 b64c c9ba 5643
2e10 5501 0000 2a03 0111 4701 0000 2d2a
0301 32cb b3b6 4cc9 ba56 432e 1055 0100
002d 2a03 0032 cbb3 b64c c9ba 5643 2e2a
0300 32be b415 5d8b 09b3 002a 0300 117c
0100 002d 2a03 0032 beb4 155d 8b09 b300
2e10 c001 0000 2a03 0011 9701 0000 2d2a
0300 32be b415 5d8b 09b3 002e 10c0 0100
002a 0300 11b2 0100 002d 2a03 0032 beb4
155d 8b09 b300 2e10 c001 0000 2d2a 0301
32be b415 5d8b 09b3 002e 2a03 0032 7b3a
085a d733 00d2 2a03 0111 ec01 0000 2d2a
0301 11eb 0100 002d 2a03 0132 7b3a 085a
d733 00d2 2e2e 2a03 0032 d833 f752 ea1d
27a1 2a03 0011 0702 0000 2d2e 101f 0200
002d 2a03 0111 1e02 0000 2d2a 0301 32d8
33f7 52ea 1d27 a12e 2e0b c101 0000 4a49
4e58 0000 0100 1700 0000 0000 2d2a 0201
0000 0000 0000 002a 020a 0000 0000 0000
002a 002d 2a02 0100 0000 0000 0000 2a04
0300 0000 7265 6400 2a02 0200 0000 0000
0000 2a04 0500 0000 6772 6565 6e00 2a02
0300 0000 0000 0000 2a04 0400 0000 626c
7565 002a 0204 0000 0000 0000 002a 0406
0000 0079 656c 6c6f 7700 2a02 0500 0000
0000 0000 2a04 0700 0000 6d61 6765 6e74
6100 2a02 0600 0000 0000 0000 2a04 0400
0000 6379 616e 0022 0600 0000 3202 c07d
348f d965 a72a 0201 0000 0000 0000 0029
02c0 7d34 8fd9 65a7 321c 8343 aecc a197
c32a 0202 0000 0000 0000 0029 02c0 7d34
8fd9 65a7 32ae dd79 3ab9 7338 152a 0203
0000 0000 0000 0029 02c0 7d34 8fd9 65a7
3204 623f 68be 5c5e d32a 0204 0000 0000
0000 0029 02c0 7d34 8fd9 65a7 32cb b3b6
4cc9 ba56 432a 0205 0000 0000 0000 002a
0406 0000 0070 7572 706c 6500 3302 c07d
348f d965 a72a 0206 0000 0000 0000 002a
0405 0000 0062 6c61 636b 0033 02c0 7d34
8fd9 65a7 2d28 02c0 7d34 8fd9 65a7 2711
b601 0000 2332 22bf d80a 1b7e 1964 2d28
22bf d80a 1b7e 1964 02c1 86be eba1 7309
5f2a 0404 0000 0062 6c75 6500 0611 af01
0000 2d09 22bf d80a 1b7e 1964 2e2e 1911
8001 0000 2e36 2e18 1325 0000 002e 0b

(where your signature "JINX" header is sitting inconspicuously on the first and second line 004a 494e 5800, and at the beginning of the byte-code for every script saved).
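Since every script's byte-code starts with the same signature, a quick sanity check on an archive is to scan it for the bytes `4a 49 4e 58 00` and count the hits. The following is a rough sketch; `FindJinxSignatures` is a hypothetical name, and the scan is only a heuristic, because a string literal containing "JINX" followed by a zero byte would also match.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the byte offset of every "JINX\0" sequence.
inline std::vector<size_t> FindJinxSignatures(const std::vector<uint8_t> & data)
{
  static const uint8_t kSignature[] = { 0x4a, 0x49, 0x4e, 0x58, 0x00 };
  const size_t kLen = sizeof(kSignature);

  std::vector<size_t> offsets;
  // Slide a window of kLen bytes across the data and compare at each offset.
  for (size_t i = 0; i + kLen <= data.size(); ++i)
  {
    if (std::equal(kSignature, kSignature + kLen, data.begin() + i))
      offsets.push_back(i);
  }
  // Every reported match must lie fully inside the buffer.
  assert(offsets.empty() || offsets.back() + kLen <= data.size());
  return offsets;
}
```

Run over the dump above, a scan like this should report one offset per archived script, the first at byte 13 (the `4a` that ends the first line of the dump).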

Looking at it as UTF-8 instead of a hex dump (in Notepad):

       o   JINX        *   some ���*�2h�ut���%A�S4��s�c�<��yz�%A�S4��s�*2       40A�S4��s�%A�S4��s�c�<��yz��  JINX        -*       *
       * -*       2�}4��e�*       2�C�̡��(�}4��e�(�C�̡�� 2��y:�s8(�C�̡��(�}4��e�42b?h�\^�(�}4��e�(�C�̡��2˳�LɺVC(�C�̡��*       2��]� � *
       (�C�̡��2{:Z�3 �*w��/�^@2�3�R�'�*33333s7@2"��
~d(�3�R�'�("��
~d2�J�j�D<(�3�R�'�("��
~d2w��6s_(�3�R�'�("��
~d 2l&lIMc(�3�R�'�("��
~d42���=(�3�R�'�("��
~d2�� ̒�r�6.%   .   JINX        * 2�}4��e�*0   -*2�}4��e�.* 2�C�̡��*W   -*2�C�̡��.e   -* 2�C�̡��.* 2��y:�s8* �   -* 2��y:�s8.�   -*2��y:�s8.* 2b?h�\^�* �   -* 2b?h�\^�.�   *�   -*2b?h�\^�.�   -* 2b?h�\^�.* 2˳�LɺVC*   -* 2˳�LɺVC.U  * ,  -* 2˳�LɺVC.U  *G  -*2˳�LɺVC.U  -* 2˳�LɺVC.* 2��]�    � * |  -* 2��]�    � .�  * �  -* 2��]�  � .�  * �  -* 2��]�  � .�  -*2��]�  � .* 2{:Z�3 �*�  -*�  -*2{:Z�3 �..* 2�3�R�'�*   -.  -*  -*2�3�R�'�..�  JINX        -*       *
       * -*       *   red *       *   green *       *   blue *       *   yellow *       *   magenta *       *   cyan "   2�}4��e�*       )�}4��e�2�C�̡��*       )�}4��e�2��y:�s8*       )�}4��e�2b?h�\^�*       )�}4��e�2˳�LɺVC*       *   purple 3�}4��e�*       *   black 3�}4��e�-(�}4��e�'�  #2"��
~d-("��
~d����s  _*   blue �  - "��
~d..�  .6.%   .

It looks different in Sublime. Go figure. I can shove that through a base64 encoder and obfuscate it, but that may be more hassle than it's worth. There is no compression, I should add. It's just saved in raw binary.
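For reference, the base64 step mentioned above is small enough to sketch by hand. `Base64Encode` is a hypothetical helper using the standard alphabet with `=` padding. Note that base64 is a reversible encoding rather than real protection, and it grows the data by about a third.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: standard base64 with '=' padding.
inline std::string Base64Encode(const std::vector<uint8_t> & data)
{
  static const char kAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

  std::string out;
  out.reserve(((data.size() + 2) / 3) * 4);

  size_t i = 0;
  // Full 3-byte groups become four 6-bit symbols each.
  for (; i + 3 <= data.size(); i += 3)
  {
    const uint32_t n = (static_cast<uint32_t>(data[i]) << 16) |
                       (static_cast<uint32_t>(data[i + 1]) << 8) |
                        static_cast<uint32_t>(data[i + 2]);
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += kAlphabet[(n >> 6) & 63];
    out += kAlphabet[n & 63];
  }

  // One or two leftover bytes are padded out to a full 4-character group.
  const size_t rest = data.size() - i;
  if (rest == 1)
  {
    const uint32_t n = static_cast<uint32_t>(data[i]) << 16;
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += "==";
  }
  else if (rest == 2)
  {
    const uint32_t n = (static_cast<uint32_t>(data[i]) << 16) |
                       (static_cast<uint32_t>(data[i + 1]) << 8);
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += kAlphabet[(n >> 6) & 63];
    out += '=';
  }

  assert(out.size() % 4 == 0);
  return out;
}
```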

Since Jinx stresses heavy testing, I tested compiling:

400 scripts
1400 scripts (excessive, yes, but why not)

To avoid performance drops from too many allocations, I used Boost's memory pools (I did not change Jinx's custom allocator with global params - it always got slower when I did, so I decided to just leave it be).

400 scripts

From Source: 6.17897 seconds
From Precompiled Byte-Code: 0.025639 seconds
Archive Size: 218KB

Overall: 0.4% as much time.

1400 scripts

From Source: 21.8528 seconds
From Precompiled Byte-Code: 0.084868 seconds
Archive Size: 757KB

Overall: about 0.4% as much time.
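The percentages follow directly from the timings listed above. As a small worked sketch (the function names are hypothetical), the ratio and the equivalent speedup factor can be computed like this:

```cpp
#include <cassert>

// Time of the candidate expressed as a percentage of the baseline.
inline double PercentOfBaseline(double baselineSeconds, double candidateSeconds)
{
  assert(baselineSeconds > 0.0);
  return 100.0 * candidateSeconds / baselineSeconds;
}

// How many times faster the candidate is than the baseline.
inline double Speedup(double baselineSeconds, double candidateSeconds)
{
  assert(candidateSeconds > 0.0);
  return baselineSeconds / candidateSeconds;
}
```

Plugging in the numbers gives roughly 0.41% (about 241 times faster) for 400 scripts and roughly 0.39% (about 257 times faster) for 1400 scripts.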

Anyway, to end a long post, I thought I'd share how I'm packaging scripts so that I only have to compile my scripts once.

Added benefit: If there's a bug in a script, all I have to do is remake the .bytecode file and patch that instead of the entire executable.

As for the strings remaining readable, I initially got the inspiration for serializing everything into game object archives from Unreal Engine. I've studied relatively closely how their serialization works, and many of their literal strings remain intact when serialized into shipped content (you can also see the names of the member variables of their classes in the object archives). So I imagine it's not an issue most people are really that concerned with.

JamesBoer commented 6 years ago

Thanks for the detailed report.

For what it's worth, you're using the bytecode API exactly how I'd imagined someone might. I very intentionally exposed a buffer containing raw bytecode for exactly this scenario, so no, you're not using it incorrectly at all. I'm reluctant to impose a particular serialization methodology, because games and game engines tend to do things a particular way, as you mentioned with Unreal.

Feel free to post any additional feedback, suggestions, or your general experience in using the library.

ghost commented 5 years ago

@JamesBoer Semi-related: Is it possible to store and restore the state of an unfinished script? As in, saving the memory and instruction pointer?

If not, what might be the best approach to avoid losing state through coroutines, i.e. when a game is saved and quit? For example, think some game corp releasing a modding SDK.

Maybe passing an option to forbid the usage of wait would work? In that case it'd probably be best to turn those global parameters into local parameters, though...

JamesBoer commented 5 years ago

I had considered this early in Jinx's design, but decided that having save/restore functionality at such a low level wasn't a feature I felt like tackling. In my own game, I simply write any changed state back to native functions for storage.

This would be a pretty major task to undertake. I can't promise anything, but I'll give it some thought and see how feasible something like this would be.

ghost commented 5 years ago

@JamesBoer Thanks. Don't worry about my case though, I probably won't keep state in the vm at all.

Out of interest: In your own game, how do you synchronize the coroutine back to the restored state? I mean, is there some obvious way better than a massive amount of branches?

ghost commented 5 years ago

Also, call me paranoid, but a big, fat disclaimer about running untrusted bytecode being a bad idea probably wouldn't hurt your documentation.
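As one possible precaution along these lines, a loader could at least refuse byte-code that does not start with the "JINX" signature or whose hash differs from a value recorded at build time. The sketch below is an assumption-laden illustration, not part of Jinx: `Fnv1a64` and `LooksLikeExpectedBytecode` are hypothetical names, and FNV-1a is a non-cryptographic hash. It catches corruption or a mismatched file, but it does not make untrusted bytecode safe, since anyone who can alter the file can recompute the hash. Real protection would need a cryptographic signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 64-bit FNV-1a hash (non-cryptographic; integrity check only).
inline uint64_t Fnv1a64(const std::vector<uint8_t> & data)
{
  uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
  for (uint8_t byte : data)
  {
    hash ^= byte;
    hash *= 1099511628211ULL;               // FNV prime
  }
  return hash;
}

// Hypothetical check: signature first, then hash against a known-good value.
inline bool LooksLikeExpectedBytecode(const std::vector<uint8_t> & data,
                                      uint64_t expectedHash)
{
  static const uint8_t kSignature[] = { 'J', 'I', 'N', 'X' };
  const size_t kLen = sizeof(kSignature);

  if (data.size() < kLen)
    return false;
  if (!std::equal(kSignature, kSignature + kLen, data.begin()))
    return false;
  return Fnv1a64(data) == expectedHash;
}
```

The expected hash would have to be stored somewhere the script author cannot modify, for example compiled into the executable.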

JamesBoer commented 5 years ago

@chack05 Mostly I use Jinx for scripted in-game events. There's actually no way to save manually. The game just autosaves at fixed locations and occasionally after major events. I don't worry about restoring the world state precisely. Instead, the world will just respawn agents and let them restart their behaviors.

For simple cases, when I just need to fire a script off once, this is handled automatically by my game engine. If I set a world trigger to "persistent" and "one-shot", it remembers its state without any intervention. So, that handles about 95% of the cases for me. Each world trigger has an "event" script and a "post-event" script. The script runs until whatever arbitrary success condition I decide upon, such as the destruction of some specific spawns, for instance, and then the post-event script runs, which may contain some other events (like a small cutscene), or sometimes just a "save" function.

There are some cases, though, for instance, when I want a trigger to change behavior based on the state of the world, like if you've progressed to a specific point in the story. I have a simple set of functions which can set or get arbitrary data via a central "game state" repository. I just have to identify it by object and property names, and then get or set a variable. Since these are string-based names, it makes it easy to set a value in one place in the game via a script, then check it somewhere else in a different script, and change the behavior accordingly.

So, it's fairly rudimentary, but it's been working pretty well so far.
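A central "game state" repository keyed by object and property names, as described above, might look roughly like the following. This is a guess at the shape of such a store, not the actual implementation: `GameStateRepository` and its methods are hypothetical, and values are kept as plain strings for simplicity. Exposing `Set` and `Get` to scripts would presumably go through `RegisterFunction` as in the earlier posts; that wiring is omitted here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical string-keyed store: (object name, property name) -> value.
class GameStateRepository
{
public:

  void Set(const std::string & Object, const std::string & Property,
           const std::string & Value)
  {
    assert(!Object.empty() && !Property.empty());
    Values[std::make_pair(Object, Property)] = Value;
  }

  bool Has(const std::string & Object, const std::string & Property) const
  {
    return Values.find(std::make_pair(Object, Property)) != Values.end();
  }

  // Returns the stored value, or Fallback if nothing was set.
  std::string Get(const std::string & Object, const std::string & Property,
                  const std::string & Fallback = "") const
  {
    auto it = Values.find(std::make_pair(Object, Property));
    return it == Values.end() ? Fallback : it->second;
  }

private:

  std::map<std::pair<std::string, std::string>, std::string> Values;

};
```

One script could then call `Set("bridge", "destroyed", "true")` and another could branch on `Get("bridge", "destroyed", "false")` later. Because the store is ordinary native data, it can be written out with the rest of a save file.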

ghost commented 5 years ago

@JamesBoer Thank you for the insight.

JamesBoer / Jinx

Platform-Independent Precompiled Scripts #2