ThePhD / sol2

Sol3 (sol2 v3.0) - a C++ <-> Lua API wrapper with advanced features and top notch performance - is here, and it's great! Documentation:
http://sol2.rtfd.io/
MIT License
4.06k stars 492 forks source link

Incompatibility of pointers to base and derived classes in Lua function calls #1532

Open congard opened 9 months ago

congard commented 9 months ago

Note: it is not a duplicate of #700

sol version: 3.3.0 Lua version: 5.4.4 Compiler: Fedora 38 clang 16.0.0 / Windows 11 clang 16.0.0, target: x86_64-pc-windows-msvc Complete test case: https://github.com/ThePhD/sol2/issues/1532#issuecomment-1738995821

I've bumped into quite strange sol2 behavior, that looks quite similar to #700 except one important thing - I have explicitly provided base classes. So, here is some of my code:

Base class binding:

template<> void registerLuaUsertype<algine::Object>(sol::table &table, void *userdata) {
    if (table["Object"].valid())
        return;

    auto ctors = sol::constructors<algine::Object(algine::Object *)>();
    auto usertype = table.new_usertype<algine::Object>(
        "Object", 
        sol::meta_function::construct, ctors,
        sol::call_constructor, ctors,
        sol::base_classes, sol::bases<>());

    ...

Derived class binding:

template<> void registerLuaUsertype<algine::Widget>(sol::table &table, void *userdata) {
    if (table["Widget"].valid())
        return;

    auto ctors = sol::no_constructor;
    auto usertype = table.new_usertype<algine::Widget>(
        "Widget", 
        sol::meta_function::construct, ctors,
        sol::call_constructor, ctors,
        sol::base_classes, sol::bases<algine::Object>());

    ...

Then, I create a new environment:

template<bool b>
sol::environment createEnvironment(const sol::basic_reference<b> &parent) {
    return {m_state, sol::create, parent};
}

Finally, I register the aforementioned types like this:

env["load_types"] = [](sol::this_environment tenv) {
    sol::table t = *tenv.env;
    registerLuaUsertype<algine::Object>(t, nullptr);
    registerLuaUsertype<algine::Widget>(t, nullptr);
};

My Lua code looks like:

load_types()

-- obj is of type 'algine::Widget*'
function funcExample(obj)
    doSomethingWithObject(obj)  -- error here
end

doSomethingWithObject implementation:

env["doSomethingWithObject"] = [](Object *object) {
    // stub
};

And I get the following error:

stack index 1, expected userdata, received sol.algine::Widget *: value at this index does not properly reflect the desired type (bad argument into 'void(algine::Object *)')
stack traceback:
        [C]: in function 'doSomethingWithObject'

What's strange, to make it work as expected I just need to create a new usertype that inherits algine::Object:

class Foo: public Object {};
auto foo = env.new_usertype<Foo>("Foo", sol::base_classes, sol::bases<Object>());

env["doSomethingWithObject"] = [](Object *object) {

};

So, my first question is: what can be wrong with the snippets above? What's the possible problem with them? What potential problems do you see? And my second question: is it safe to use sol::table instead of sol::environment in the bindings code?

congard commented 9 months ago

So, I've created a test case that reproduces this issue demonstrates UB of #1523. It seems that we are dealing with undefined behavior here - the test case produces slightly different output on each run.

Short test case description:

  1. Create 2 classes: base (Foo) and derived (Bar)
  2. Create n environments with the global environment as the parent
  3. For each environment:
    1. Register types Foo and Bar for the current environment
    2. Bind a function that takes base class (Foo) as an argument
    3. Create a new Lua function func that takes an argument arg and calls the aforementioned function with the argument arg
  4. For each environment:
    1. Call function func with a pointer to Bar instance as an argument

Code:

#include <iostream>
#include <sol/sol.hpp>

//#define VERBOSE

#ifdef VERBOSE
#define LOG_V(...) printf(__VA_ARGS__)
#else
#define LOG_V(...)
#endif

// base class
class Foo {
public:
    Foo() {
        LOG_V("ctor\n");
    }

    ~Foo() {
        LOG_V("dtor\n");
    }
};

// derived class
class Bar: public Foo {};

template<typename T> static void registerType(sol::table &table);

template<> void registerType<Foo>(sol::table &table) {
    if (table["Foo"].valid())
        return;
    auto foo = table.new_usertype<Foo>("Foo");
    LOG_V("Foo registered\n");
}

template<> void registerType<Bar>(sol::table &table) {
    if (table["Bar"].valid())
        return;
    auto bar = table.new_usertype<Bar>("Bar", sol::base_classes, sol::bases<Foo>());
    LOG_V("Bar registered\n");
}

static bool testEnvCount(int envCount) {
    sol::state state;

    std::vector<sol::environment> envs(envCount);

    // create environments
    for (auto &env : envs)
        env = sol::environment(state, sol::create, state.globals());

    auto cfgEnv = [&state](sol::environment &env) {
        registerType<Bar>(env);
        registerType<Foo>(env);

        state.script("function func(foo) doSomethingWithFoo(foo) end", env);

        env["doSomethingWithFoo"] = [](Foo *foo) {
            LOG_V("[doSomethingWithFoo] Foo: %ti\n", (ptrdiff_t) foo);
        };
    };

    // configure environments
    for (auto &env : envs)
        cfgEnv(env);

    Bar bar;

    auto callFunc = [bar_ptr = &bar](sol::environment &env) {
        auto result = env["func"].get<sol::function>()(bar_ptr);

        if (!result.valid()) {
            sol::error error = result;
            LOG_V("%s\n", error.what());
        }

        return result.valid();
    };

    // call Lua function "func"
    for (auto &env : envs) {
        if (!callFunc(env)) {
            return false;
        }
    }

    return true;
}

int main() {
    constexpr int testFrom = 1;
    constexpr int testTo = 30;

    for (int i = testFrom; i <= testTo; ++i) {
        if (testEnvCount(i)) {
            std::cout << "OK for envCount = " << i << "\n";
        } else {
            std::cout << "Failed for envCount = " << i << "\n";
        }
    }

    return 0;
}

Output:

1st run ``` OK for envCount = 1 Failed for envCount = 2 Failed for envCount = 3 OK for envCount = 4 Failed for envCount = 5 Failed for envCount = 6 OK for envCount = 7 Failed for envCount = 8 OK for envCount = 9 Failed for envCount = 10 OK for envCount = 11 Failed for envCount = 12 OK for envCount = 13 Failed for envCount = 14 Failed for envCount = 15 OK for envCount = 16 OK for envCount = 17 Failed for envCount = 18 OK for envCount = 19 OK for envCount = 20 OK for envCount = 21 OK for envCount = 22 Failed for envCount = 23 OK for envCount = 24 OK for envCount = 25 OK for envCount = 26 Failed for envCount = 27 OK for envCount = 28 OK for envCount = 29 Failed for envCount = 30 ```
2nd run ``` OK for envCount = 1 Failed for envCount = 2 Failed for envCount = 3 OK for envCount = 4 Failed for envCount = 5 Failed for envCount = 6 OK for envCount = 7 Failed for envCount = 8 OK for envCount = 9 Failed for envCount = 10 Failed for envCount = 11 <-- difference here, for example Failed for envCount = 12 OK for envCount = 13 Failed for envCount = 14 Failed for envCount = 15 OK for envCount = 16 OK for envCount = 17 Failed for envCount = 18 OK for envCount = 19 OK for envCount = 20 OK for envCount = 21 OK for envCount = 22 Failed for envCount = 23 OK for envCount = 24 OK for envCount = 25 OK for envCount = 26 Failed for envCount = 27 OK for envCount = 28 OK for envCount = 29 Failed for envCount = 30 ```

Sol error message:

stack index 1, expected userdata, received sol.Bar *: value at this index does not properly reflect the desired type (ba
d argument into 'void(Foo *)')
stack traceback:
        [C]: in global 'doSomethingWithFoo'
        [string "function func(foo) doSomethingWithFoo(foo) en..."]:1: in function <[string "function func(foo) doSometh
ingWithFoo(foo) en..."]:1>

In the example above, switching to v3.2.3 helps, as it described in #1523. However, unfortunately, in my project it didn't help. The only available "workaround" is the one described at the end of the first comment.

congard commented 9 months ago

Some additional debug info from my project (tested with sol v3.2.2):

algine::Object registered
algine::Scene (extends public algine::Object) registered
stack_check_unqualified.hpp:520: has_derived == false  <------ error happens because of this
stack_check_unqualified.hpp:520: detail::demangle<T>() == "void) [T = algine::Object]"
stack_check_unqualified.hpp:536: success == false => calling `handler(L, index, type::userdata, indextype, "value at this index does not properly reflect the desired type")`

Error message:

stack index 1, expected userdata, received sol.void) [T = algine::Scene *]: value at this index does not properly reflect the desired type (bad argument into 'void) [T = void](void) [T = algine::Object *])')
stack traceback:
        [C]: in function 'func'
        [string "loadType('algine::Object') loadType('algine::..."]:1: in function 'f'

Code:

Lua lua;
auto &state = *lua.state();
sol::environment env = state.globals();
env["func"] = [](Object *obj) { printf("Global state\n"); };
state.script("loadType('algine::Object') loadType('algine::Scene') function f(o) func(o) end", env);
if (auto res = env["f"].get<sol::function>()((algine::Scene*) this); !res.valid()) {
    sol::error err = res;
    printf("%s\n", err.what());
}

Hope it will help to fix this issue

congard commented 9 months ago

Complete test case that reproduces the original issue

So, I continued my research regarding the original issue - and finally have found out all circumstances for the original issue aka "strange problem" to occur.

Final results

  1. It doesn't matter if any environments are created
  2. The issue appears when sol::state is being used in the external library
  3. The issue occurs when the library is built as shared, but only on Windows
  4. It seems that The Original Issue is unrelated to #1523
  5. Tested using both sol v3.2.3 and v3.3.0

Demo

Demo and more detailed results can be found here: https://github.com/congard/sol2-issue-1532