PixarAnimationStudios / OpenUSD

Universal Scene Description
http://www.openusd.org
Other
6.12k stars 1.22k forks source link

Missing visibility macros on core classes cause runtime TfType equality issues #1475

Open jerry-huxtable-foundry opened 3 years ago

jerry-huxtable-foundry commented 3 years ago

Description of Issue

We've identified a problem with USD classes which is causing us much distress on macOS with clang. When trying to set attributes on prims we get Coding Error messages like this one:

Type mismatch for </TestMesh1.points>: expected 'VtArray<GfVec3f>', got 'VtArray<GfVec3f>' After a lot of investigation we've determined the cause of this. The problem is USD is comparing the pointers of type_infos (in tf/safeTypeCompare.h), and we have two type_infos for this type, one in our library and one in USD. The underlying cause is that many of the classes in USD, such as GfVec3f aren't marked as having default visibility so don't get exported. This isn't a problem most of the time because nearly all of the code is inlined and a few methods are marked as being exported. However, this means that the type_info for GfVec3f doesn't get exported properly. It does get exported as a weak-external symbol, which is good, but the problems come when you then try to use GfVec3f from another dynamic library.

If you build a dylib that uses GfVec3f and build it with -fvisibility=default, everything is fine because the linker generates the type_info as weak-external. At runtime, the linker will merge all the weak-external type_infos into one symbol and everything works OK. If, however, you build your library -fvisibility=hidden (as we do to avoid namespace pollution and for performance reasons), the type_info gets marked as non-external and doesn't get merged at runtime. You now have two type_infos for the same type and USD says they're different. This problem also transfers itself to templates such as VtArray.

The fix is to add the GF_API macro to the declaration of the GfVec3f class. Of course, GfVec3f is only one of many classes that suffer from this problem (TfToken is another which has caused us pain), and the appropriate export macro should be added to all of them.

Steps to Reproduce

  1. Create a source file that uses GfVec3f
  2. Build a dynamic library, test.dylib with it using -fvisibility=hidden
  3. Examine the resulting symbols using nm:
$ nm -mo test.dylib| c++filt | grep typeinfo | grep GfVec3f
test.dylib: 0000000000020800 (__DATA,__const) non-external (was signed char private external) typeinfo for pxrInternal_v0_21__pxrReserved__::GfVec3f

The output shows the problem: That type_info should be "weak external", not "non-external" and the library has its own private copy of the type_info, which doesn't equal the one in libgf.

Build it again with -fvisibility=default and you'll get this:
$ nm -mo test.dylib| c++filt | grep typeinfo | grep GfVec3f
test.dylib: 0000000000022158 (__DATA,__const) weak external typeinfo for pxrInternal_v0_21__pxrReserved__::GfVec3f

This is correct and makes the problem go away.

System Information (OS, Hardware)

macOS 14.3 clang 11.0

Package Versions

All

Build Flags

Default

jilliene commented 3 years ago

Filed as internal issue #USD-6608

c64kernal commented 3 years ago

Thanks for filing @jerry-huxtable-foundry -- unfortunately this is a known issue (see for example https://github.com/PixarAnimationStudios/USD/issues/665) We don't currently explicitly export vtable and type_info information from classes generally... we're talking about possible solutions internally, but unfortunately we can't say when we can have this addressed.

jerry-huxtable-foundry commented 3 years ago

I did come across https://github.com/PixarAnimationStudios/USD/issues/665, but thought it worth filing a new issue as that mostly dealt with link-time problems whereas we're experiencing runtime issues where type conversions randomly fail - there's no sign that anything has gone wrong at build time.

moshev commented 2 years ago

If it helps anyone, I have a workaround for this in pr #1668 , which we currently use at Chaos. Since it doesn't change the ABI, you can even use it if you have to use objects created by a foreign DSO, e.g. accessing USD objects created by another plug-in in a DCC.

UsdBrickRenderTransformer commented 7 months ago

Hi guys, I recently stumbled on exactly the same problem. After I was pointed to this existing issue here, I verified, that I was using the -fvisibility=hidden flag. So for now I probably will compile without that flag and if that fails I will do a small patch and insert either appropriate attributes for the relevant classes or just do moshev's suggestion. Although I have a little fear of a possible performance hit with it.

Here is my initial post in the forum regarding this:

Hi Guys, I am setting up a small POC program, which uses a custom Scene delegate with the Vulkan HGI on OpenUSD version 24.03. I am linking OpenUSD dynamically. I did the hydra initialisation similar to testHdxCameraAndLight.cpp So I created the HGI, constructed drivers from it and gave it into the constructor of RenderIndex:

HgiUniquePtr hgi = Hgi::CreatePlatformDefaultHgi(); HdDriver driver{HgiTokens->renderDriver, VtValue(hgi.get())};

HdStRenderDelegate renderDelegate;
std::unique_ptr<HdRenderIndex> index(HdRenderIndex::New(&renderDelegate, {&driver})); 

During construction it crashes, because the driver VtValue is checked if it is holding an Hgi. This check compares the std::type_info (T1) stored in the VtValue (which is typeid(Hgi)) with the std::type_info of Hgi (T2) in HdStRenderDelegate::SetDrivers() (hdDriver->driver.IsHolding<Hgi>, which does the comparison with TfSafeTypeCompare, which compares the two std::type_infos).

This check fails and so the RenderDelegates Hgi* is left a nullptr which is making the program crash later.

I read into the typeid handling and std::type_info comparison. It seems, that it is not safe to compare two std::type_infos across library borders. The issue could be, that T1 was gained at compile time of the main program and T2 was gained at compile time of the appropriate dynamic OpenUSD library. From what I understood the std::type_info is not only dependent on the type, but also on the compiler and the used compile settings. I checked the type names of both std::type_info and they are equal. The == operator returns false still. The names are equal but the hash is different, which hints, that they are seen as different types while comparing them.

So the big question is: Is it a valid approach at all to do hydra initialisation that way or do I need to create the Hgi and the driver indirect through a compatible dynlib? Do I need some specific compiler settings and compiler versions? Is it mandatory to compile the main program, which is using the OpenUSD libraries with the same relevant compiler switches and what switches are that exactly?

For now I am really stuck. I don’t want to debug through the disassembly of the type_info comparison.

I am using a MacBook 2 on arm architecture with clang compiler. I am compiling OpenUSD locally to be able to debug through it. I also got the problem with the default HGI (Vulkan disabled).

Any help would be appreciated :slight_smile:

Cheers, Robert