albertodemichelis / squirrel

Official repository for the programming language Squirrel
http://www.squirrel-lang.org
MIT License
895 stars 149 forks source link

Add more raw object manipulation functions #200

Closed frabert closed 4 months ago

frabert commented 4 years ago

This is intended to aid in creating wrappers for OOP languages, it's similar in spirit to OpenGL's DSA: it allows manipulating Squirrel objects without having to push and pop them on the stack.

I'd like to hear what you think of this, if it's a good idea at all or if the amount of redundancy added to the API is too much. I personally would find this very helpful.

zeromus commented 4 years ago

That's quite at odds with the design. Any API can be widened totally in this way (and even further), or enstackened almost totally as has been established in squirrel. It's schizo to do both. OpenGL has big, big, big problems due to its historical trajectory and that's why it's got the provisions it does, but the case is not so bad in squirrel which is static linked on the "client" side. Do your APIs actually bench to anything faster? Can you give some before-and-after examples?

frabert commented 4 years ago

No, no benchmarks or anything, just a "code cleanliness" and the fact that otherwise I'd be pushing and popping values even when not needed.

Let's say I have an Array wrapper, here's how I would do it using the "stock" API:

struct Array {
  HSQOBJECT arr;
  HSQUIRRELVM v;

  void Append(HSQOBJECT *o) {
    sq_pushobject(v, &arr);
    sq_pushobject(v, o);
    sq_arrayappend(v, -2);
    sq_pop(v,1);
  }
};

Please excuse any errors in the above code, I have not tested it and it's only meant to illustrate the point.

If one were to repeatedly append to the same array with the above method, it would end up doing lots of pushes and pops that are actually not needed. This extension is meant to alleviate that. I understand if it's not in the spirit of the original API and I'm willing to maintain it as a personal extension of sort, if it doesn't make it into master :)

zeromus commented 4 years ago

Well, I suggest you bench it. This is so much a matter of preference that concrete results would help your case. Especially considering the market for squirrel, compelling performance gains would be a compelling case.

Regarding your test case, it's possible the pushes and pops are optimized out. I have always hoped that... but I never investigated it in detail.

Working against your propsal: the current way of doing things incurs less technical debt due to there being less code, making it easier to change internals without having to change 1000s of special purpose apis implementations. I'm not sure I like that for squirrel, from a project management perspective. If you implemented the new apis in terms of the old ones that would mitigate this, but also eliminate most possibility of improved performance.

frabert commented 4 years ago

Here is my benchmark:

#include <chrono>
#include <squirrel.h>
#include <iostream>
#include <cassert>

struct Array_Stock {
  HSQOBJECT arr;
  HSQUIRRELVM v;

  void Append(HSQOBJECT *o) {
    sq_pushobject(v, arr);
    sq_pushobject(v, *o);
    sq_arrayappend(v, -2);
    sq_poptop(v);
  }

  void Pop() {
    sq_pushobject(v, arr);
    sq_arraypop(v, -1, false);
    sq_poptop(v);
  }
};

struct Array
{
  HSQOBJECT arr;
  HSQUIRRELVM v;

  void Append(HSQOBJECT *o) {
    sq_objarrayappend(v, &arr, o);
  }

  void Pop() {
    sq_objarraypop(v, &arr, NULL);
  }
};

int main() {
  HSQUIRRELVM vm = sq_open(1024);
  HSQOBJECT arr;
  HSQOBJECT i;
  sq_resetobject(&arr);
  sq_resetobject(&i);

  sq_newarray(vm, 0);
  sq_getstackobj(vm, -1, &arr);
  sq_addref(vm, &arr);
  sq_poptop(vm);

  sq_pushinteger(vm, 3);
  sq_getstackobj(vm, -1, &i);
  sq_addref(vm, &i);
  sq_poptop(vm);

  auto start = std::chrono::steady_clock::now();
  sq_pushobject(vm, arr);
  for(size_t n = 0; n < 10000000; n++) {
    sq_pushobject(vm, i);
    sq_arrayappend(vm, -2);
    sq_arraypop(vm, -1, false);
  }
  sq_poptop(vm);
  auto end = std::chrono::steady_clock::now();

  auto dur = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
  auto baseline = dur;
  std::cout << "Stock API: " << dur.count() << "ms (1x) \n";

  Array_Stock stock{arr, vm};
  start = std::chrono::steady_clock::now();
  for(size_t n = 0; n < 10000000; n++) {
    stock.Append(&i);
    stock.Pop();
  }
  end = std::chrono::steady_clock::now();

  dur = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
  double gain = (double)dur.count()/baseline.count();
  std::cout << "Wrapped stock API: " << dur.count() << "ms (" << gain << "x)\n";

  Array direct{arr, vm};
  start = std::chrono::steady_clock::now();
  for(size_t n = 0; n < 10000000; n++) {
    direct.Append(&i);
    direct.Pop();
  }
  end = std::chrono::steady_clock::now();

  dur = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
  gain = (double)dur.count()/baseline.count();
  std::cout << "New API: " << dur.count() << "ms (" << gain << "x)\n";
}

And here is a typical result on my machine:

Stock API: 3402ms (1x)
Wrapped stock API: 6142ms (1.80541x)
New API: 1461ms (0.429453x)

Take what you will from it, I don't think it's very representative of real-world usage, but I still think that the fact that the new API actually beats correct usage of the stock one is interesting.

rversteegen commented 4 years ago

Well, there already is a set of a few "raw object api" functions, and this would be an extension to that.

There is a lot of code duplication here, but it could be removed by reimplementing the existing sqapi functions as wrappers around these new, lower-level functions, probably with negligible effect on the performance of the original API.

I would consider using these functions, whether it gets merged or not. But in most cases the performance of manipulating objects via the C api is probably unimportant relative to the performance of doing the same from squirrel/the VM.

Regarding your test case, it's possible the pushes and pops are optimized out. I have always hoped that... but I never investigated it in detail.

It would be impossible for the compiler to remove any pushes/pops even with LTO because it has the side effect of modifying data past the end of the stack, and it can't know that that side effect doesn't later change program behaviour.

rversteegen commented 4 years ago

It would be impossible for the compiler to remove any pushes/pops even with LTO because it has the side effect of modifying data past the end of the stack, and it can't know that that side effect doesn't later change program behaviour.

Well... I mean in general. I guess LTO is theoretically capable of such an optimisation for a trivial testcase like this, but I think even that is probably far beyond any C++ compiler existing today.

frabert commented 4 years ago

@rversteegen You are right, of course. Should be fixed now!

I like the idea of implementing the old API via the new one, btw. I'll see what can be done