RosettaCommons / binder

Binder, tool for automatic generation of Python bindings
MIT License
317 stars 67 forks source link

Generated bindings are not portable between machines because they depend on internal STL-implementation-specific details #94

Closed adamnovak closed 4 years ago

adamnovak commented 4 years ago

I'm working with some Binder-generated bindings in the libbdsg project. Our planned workflow there is that, when developers update the external API of the library, they run Binder and regenerate the bindings, and then commit the bindings, so that our normal build process doesn't depend on an entire 2.6 GB llvm/clang copy.

However, the Binder-generated bindings, at least when Binder is run on one developer's machine, want to include bits/types/__mbstate_t.h, which is not a standard C++ header, but rather part of the particular STL implementation (libstdc++) in use on the machine where the bindings were generated. I believe this header exists in her standard library implementation, and Binder is just following transitive includes into the guts of STL, and parroting back the file names it found in the bindings, rather than restricting itself to requiring the headers that actually constitute the STL's interface.

This causes trouble when trying to build our project on another machine (a Mac, using libc++/clang instead of libstdc++/gcc). That file isn't provided by (my version of) libc++.

Is there a particular way of running binder that I need to use to restrict it to generating standard-compliant C++ which doesn't try and reach into the internal headers of libc++ or libstdc++?

Or do I need to manually maintain a blacklist of headers that Binder wants to include but shouldn't, via either -include or a postprocessing step on the generated source?

adamnovak commented 4 years ago

I'm also seeing references to std::__cxx11::basic_string<char> as an argument/return type, when the standard-compliant way to refer to the type would I think be std::string.

lyskov commented 4 years ago

@adamnovak this is a known issue. In general generated code might not be portable between different STL versions and highly unlikely to work with library from different STL vendor. So generating bindings on Linux (with GNU libstdc++) and trying to compile them on Mac (Clang libc++) is unlikely to work without heavy editing. Please note that this is expected behavior since STL implementation is not standardized so i do not think that we can provide general solution for this issue.

However there is a few things that could be done which in simple cases might do the trick:

I'm also seeing references to std::__cxx11::basic_string as an argument/return type, when the standard-compliant way to refer to the type would I think be std::string. -- this looks like something i can fix, - could you please see if you can generate minimal example that trigger such generation?

-- Also, to some extend this issue could be fixing by using -class directives in config files but this might require a lot of work so i recommend this only if relevant type/functions could not be mapped to standard headers.

adamnovak commented 4 years ago

I haven't actually been able to reproduce the pulling-in of nonstandard includes, after refactoring a bit how I'm running Binder.

I still don't have a minimal example for Binder preferring std::__cxx11::basic_string<char> to std::string, and -calss std::__cxx11::basic_string<char> in the config didn't help, but I applied s/class std::__cxx11::basic_string<char>/std::string/g and then s/std::__cxx11::basic_string<char>/std::string/g to the generated files with sed to work around it.

A new incompatibility I have found is that Binder is resolving C++ standard typedefs to their backing types. I have a typedef nid_t which is typedef'd to int64_t. When I generate bindings on a platform where int64_t is actually long (Linux GCC/libstdc++), I get bindings that use long where I have arguments and return types of type nid_t. When I bring them over to a platform where int64_t is actually long long (Mac Clang/libc++), they don't build.

Is there a workaround to make Binder stop at int64_t and not replace it with the "real" C type that the complier/STL use to implement it? I don't want to sed out all the longs from my bindings.

lyskov commented 4 years ago

re prevent mapping from typedef's to "real" types: this is indeed a problem. On LLVM level all types is always "real" one so the only way solve this would be to somehow find a way to map such types back to "source types" using typedef aliases. Issue however that could be a multiple typedef's exist for the same type (and usually does since most libraries have its own set of aliases). So i do not think there will be a general solution to this problem.

andriish commented 4 years ago

Hi @adamnovak ,

I still don't have a minimal example for Binder preferring std::__cxx11::basic_string to std::string

Virtually every Fedora in CI is doing that as of now. To fix this behavior and prefer the string one can set the GLIBCXX ABI flag explicitly, see #130.

Best regards, Andrii

lyskov commented 4 years ago

looks like this is now resolved, closing for now