RosettaCommons / binder

Binder, tool for automatic generation of Python bindings
MIT License
315 stars 67 forks source link

Non-portable generated code for `size_t` #258

Open jlblancoc opened 1 year ago

jlblancoc commented 1 year ago

Problem

// C++ API to be wrapped:
size_t foo();

Generates code replacing size_t with unsigned long. Which is OK... until you try to build the generated sources in another platform (e.g. armhf, etc.) where size_t becomes something else, so the generated pattern code doesn't match anymore the original C++ API and fails to build. A solution might be re-running binder for each architecture, but I would like to avoid it.

Questions

If you give me some pointers, I would be glad to attempt to implement it and PR.

CStewart-Burger commented 1 year ago

I just came here for the same thing. I'm not particularly familiar with clang, but I think you'd want to look at https://github.com/RosettaCommons/binder/blob/67a9fe429698d752dc336122995f838bdeb7f92d/source/function.cpp#LL245C5-L245C5. I've also noticed some other argument types not being parsed as I'd expect. @jlblancoc , have you tested other stdint types?

jlblancoc commented 1 year ago

Thanks for the pointer! I'll try to craft something around that function...

, have you tested other stdint types?

sure, the same happen for examples for uint8_t which gets mapped into unsigned char, but that one is fairly standard and no problematic, but that's not always the case :-(

lyskov commented 1 year ago

@jlblancoc unfortunately in general this is expected behavior and right now i could not offer you any workarounds for solving it. For my main project i have to give on idea of generating binding on one platform and compiling it on the other due to this issue. In short: this issue is due to how LLVM/Clang inner handling of the types is implemented, after function is parsed it types reported in "resolved" form so all information about typedefs is lost.

It has been a while when there was last time i checked on this issue. I will take another look in case newer LLVM versions allows to extract original type info but I think chances of this are slim.

Questions

  • Within binder, where is the type size_t and such converted into their underlying types?

-- this conversion is happened inside LLVM/Clang layer so this is not something in our control.

  • What would you think about a new config option to "freeze" certain types such that they are not resolved into their underlying types (if this is even possible...)

-- similar ideas was considered in the past. Problem with this approach is that we have no way to determine which type instances of unsigned long should be converted to size_t and which should not.

lyskov commented 1 year ago

@jlblancoc , i just took a fresh look at the code and new API docs and i think there might be a way to get original type information! Let me play with this and i will get back to you.

jlblancoc commented 1 year ago

Thanks, it would be great! In the meanwhile I tried with tricks redefining __SIZE_TYPE__ and alike, but that's a dead end...

lyskov commented 1 year ago

@jlblancoc - it turned out that general solution for this issue might require a bit more research. So for now i found a decent workaround that we can use but it will require listing all std:: types that we confident could be back-converted. Right now i added only std::size_t but it should be possible to add correct handling for any non-template types. So could you please try to pull from recent master and see if this works for you? And if/when you discover similar error try adding appropriate std type into this list: https://github.com/RosettaCommons/binder/commit/30e53cc2bceb51ad3260bdf4f6d97ecd5d808b5a#diff-098881659b3bdd75b9b57555bb70247c118325422929347527db13c1e67bd0faR471 - PR with updated list will be welcome 😬 - as i mentioned this will currently only work for non template types. Thanks,

jlblancoc commented 1 year ago

Sergey, this is definitively awesome, thanks! 👏👏

Sure, I'll try it and expand in a PR after testing...

Cheers

jlblancoc commented 1 year ago

PS: it would add a certain performance loss, but if you prefer it, I could also add an optional set of types loadable from the config file, in addition to the standard std types I will add.

lyskov commented 1 year ago

PS: it would add a certain performance loss, but if you prefer it, I could also add an optional set of types loadable from the config file, in addition to the standard std types I will add.

-- that would be great, PR adding this functionality will be welcome!

Re possible performance hit: - i would not worry about this. We can make sure that standard_names set is only initialized from options once. For instance the most straightforward way to achieve this would be to refactor line https://github.com/RosettaCommons/binder/commit/30e53cc2bceb51ad3260bdf4f6d97ecd5d808b5a#diff-098881659b3bdd75b9b57555bb70247c118325422929347527db13c1e67bd0faR471 to something like this:

static std::set<string> standard_names;
if( standard_names.empty() ) {
  standard_names.insert( {"std::size_t"} );
  <add names from config options here>
}
jlblancoc commented 1 year ago

@lyskov After your fix, all cl.def(..) cases are fixed regarding size_t and friends, that's all good.

But now, the same problem remains in the auto-generated wrappers PyCallBack_*, see for example this where "size_t" has been converted to "unsigned long", even after the latest commits.

I'll try to take a look at binder code to see where those wrappers are generated, but you would probably have it already in your head and provide a faster fix (?).

jlblancoc commented 1 year ago

Maybe here?

https://github.com/RosettaCommons/binder/blob/master/source/class.cpp#L701-L720

lyskov commented 1 year ago

-- right! Binding code generation for member functions will needs to be updated as will. - I will looks this up soon!

lyskov commented 1 year ago

OK, i just pushed the patch, - @jlblancoc please see if it works for you! Thanks,

jlblancoc commented 1 year ago

Thanks for the quick fix @lyskov !

I tested it and indeed it seems to have worked well. I'm waiting for packages to be built in armhf to fully confirm it works now...

jlblancoc commented 1 year ago

My package still have one remaining error for armhf but it's a too particular problem in my API... binder made the grade here too, thanks!! :100:

jlblancoc commented 1 year ago

I found one more edge case:

    void push_back(const double v) { internalPushBack(v); }
    void push_back(const std::string v) { internalPushBack(v); }
    void push_back(const uint64_t v) { internalPushBack(v); }

got converted into:

        cl.def("push_back", (void (mrpt::containers::yaml::*)(const double)) &mrpt::containers::yaml::push_back, "Append a new value to a sequence.\n \n\n std::exception If this is not a sequence \n\nC++: mrpt::containers::yaml::push_back(const double) --> void", pybind11::arg("v"));
        cl.def("push_back", (void (mrpt::containers::yaml::*)(const std::string)) &mrpt::containers::yaml::push_back, "C++: mrpt::containers::yaml::push_back(const std::string) --> void", pybind11::arg("v"));
        cl.def("push_back", (void (mrpt::containers::yaml::*)(const unsigned long)) &mrpt::containers::yaml::push_back, "C++: mrpt::containers::yaml::push_back(const unsigned long) --> void", pybind11::arg("v"));

Note the replacement of uint64_t to unsigned long, as it was before... Removing the "const" fixes it:

    void push_back(uint64_t v) { internalPushBack(v); }

gives the correct uint64_t:

+               cl.def("push_back", (void (mrpt::containers::yaml::*)(uint64_t)) &mrpt::containers::yaml::push_back, "C++: mrpt::containers::yaml::push_back(uint64_t) --> void", pybind11::arg("v"));

Maybe related to string matching in types?

For now I'll drop the "const" on my side, but it's not expected ideal behavior...

lyskov commented 1 year ago

Thank you for detailed examples @jlblancoc ! Let me think about this for a bit. Clearly more general solution is needed here. I have a few ideas to try and experiment and will post here if/when i have something better then the current code.

jlblancoc commented 1 year ago

For the records, I "solved" this in my project via a mass generation of wrapping code via binder, then applying hand-written patches to the problematic parts. Not ideal, but it's really hard to handle all these usages of variable-size types in a portable way.

PS: Link to relevant "hacky" parts in my project, I hope it can help others: https://github.com/MRPT/mrpt/blob/develop/python/generate-python.sh#L69-L97