AcademySoftwareFoundation / MaterialX

MaterialX is an open standard for the exchange of rich material and look-development content across applications and renderers.
http://www.materialx.org/
Apache License 2.0
1.81k stars 331 forks source link

libMaterialXFormat exports pugixml symbols #1941

Closed DarkDefender closed 3 weeks ago

DarkDefender commented 1 month ago

We had some weird pugixml issues in Blender and we manged to finally figure out that the MaterialX libMaterialXFormat lib is the issue: https://projects.blender.org/blender/blender/issues/124173

It exports pugixml symbols and thus will lead to issues if something else links to or uses an other pugixml version than what is bundled in MaterialX. (Blender links to pugixml as we use it in Blender itself)

I'm guessing exporting these symbols are not on purpose?

A quick and dirty fix would be to add #define PUGIXML_API __attribute__((visibility("hidden"))) in pugiconfig.hpp. Perhaps it is better to not bundle pugixml like this and have it be a regular library dependency of MaterialX? (But I'm guessing it was done this way to have as few external library deps as possible)

ld-kerley commented 1 month ago

Looking a little closer - it looks like we actually install the pugixml headers, unnecessarily as well.

The version of pugixml being used is also fairly old.

If we didn't bundle it then it would allow people to use a more modern version, but at the disadvantage of introducing a required dependency. Perhaps we can have the best of both worlds if we make it an external dependency that can be found via find_package() in cmake, but also provide helper scripts in the project to download and build it if its not provided. I believe that some of the other ASWF projects follow this pattern (OIIO?), we may be able to borrow some inspiration from there.

jstone-lucasfilm commented 1 month ago

@DarkDefender @ld-kerley In additional to our wanting to include lightweight dependencies in MaterialX for simplicity, there's an additional caveat to our PugiXML integration, which is that it's customized to handle features that are not natively supported by the off-the-shelf library:

Two examples may be found here: https://github.com/AcademySoftwareFoundation/MaterialX/blob/main/source/MaterialXFormat/External/PugiXML/pugixml.cpp#L3422 https://github.com/AcademySoftwareFoundation/MaterialX/blob/main/source/MaterialXFormat/External/PugiXML/pugixml.cpp#L3949

We're definitely open to ideas for improving this situation, but in the short term it means that we'll likely need to maintain PugiXML as internal source files within the project.

If there's a straightforward way to remove PugiXML symbols from our binaries, that sounds like a worthwhile change to make, as the only usage of this library should be internal to MaterialXFormat.

ld-kerley commented 1 month ago

@jstone-lucasfilm thats interesting, and something I wasn't aware of. I guess that means that "technically" the MaterialX format is not XML in the strictest sense, but instead XML-based? This is something that the Bevy folks, or anyone else implementing their own MaterialX parser would need to know. Is this difference documented anywhere in the Specification document?

I'm guessing the < and > handling was added to support things like <UDIM> tokens in filenames?

Curious what the new-line handling was added for? is that just to maintain white-space formatting when round-tripping through PugiXML?

I think there's a pretty straight forward cmake patch to resolve the issue reported here - I'll try and throw something together.

kwokcb commented 1 month ago

Hi @ld-kerley, Yes, new-line was added to preserve spacing for easier compares.

  1. It allows for instance the std libraries comment docs to be upgraded without format change. Since these are actual nodedef docs I suggest make these actual attributes on the element which can be interchanged with other formats like JSON / or just to extract documentation.
  2. The second case I know of is folks doing actual string compares to see if a shader need to be recompiled -- kinda super ugly but was another reason for preserving spacing.

As far as I recall, yes, < and > is for allow these token separators in attribute strings like UDIMs. I'm not sure why they get converted in the first place -- I guess for DOM parsing you can't keep these but need to use &lt and &gt.

LazyDodo commented 1 month ago

If there's a straightforward way to remove PugiXML symbols from our binaries, that sounds like a worthwhile change to make, as the only usage of this library should be internal to MaterialXFormat.

I admit, i'm not super familiar with the internals of materialX, but building with -fvisibility=hidden on linux is possibly an option, as hiding symbols unless explicitly asked to export is the default (and only) behaviour for the MSVC linker. Most symbols that need to be exported for MaterialX are likely already marked just to make MSVC happy.

If that's for some reason doesn't work out #define PUGIXML_API __attribute__((visibility("hidden"))) will certainly do the trick but needs a compiler guard as compilers that aren't GCC or Clang (cough msvc cough) will not recognize this attribute (or any attribute really)

jstone-lucasfilm commented 1 month ago

@ld-kerley @kwokcb Just following up on the customization of PugiXML, our early experience with MaterialX was that popular XML import libraries disagreed on whether quoted angle brackets should be allowed. I believe the ElementTree package in Python allows this usage, while PugiXML disallows it, so we were stuck with the need to customize when we adopted PugiXML in MaterialX C++.

Given the lack of universal support for this feature, it may make sense to switch over to &lt and &gt in a future version of MaterialX, where the usage of quoted angle brackets in earlier documents would be automatically upgraded.

ld-kerley commented 1 month ago

I think this would probably be a good change at some point - I think ultimately being able to rely on any standard XML library for parsing, and being able to say the file format is an "honest" XML file would be a good step.

I also like @kwokcb suggestion of moving the documentation of the node definitions concretely in to the actual node definition, as well as eliding one of the new-line requirements it would also allow for DCC integrations access to standardized documentation of each node.

jstone-lucasfilm commented 3 weeks ago

Thanks for this report, @DarkDefender, and thanks to @ld-kerley for the fix in #1944!