Xingyu-Lin / softgym

SoftGym is a set of benchmark environments for deformable object manipulation.
BSD 3-Clause "New" or "Revised" License
270 stars 61 forks

Compiled PyFlex does not work on Ubuntu 20 #30

Open Skylion007 opened 2 years ago

Skylion007 commented 2 years ago

I have been trying to compile the latest version of SoftGym on an Ubuntu 20 machine. However, I have been unable to load the compiled pyflex.so from either the system interpreter or a conda interpreter. The error I keep getting is that the symbol __powf_finite is not defined, which seems to be related to the libc version.

I have been using CUDA 11.6, pybind11 2.9.1, and Ubuntu 20. I have tested this on Python 3.9, 3.8, and 3.7, and the same error occurs on each. I also tried compiling with clang, but I got several errors that prevented compilation altogether.

DanielTakeshi commented 2 years ago

Can you copy and paste your full error message so that we can better diagnose this? Also, please list the exact steps you followed so that we can reproduce it.

Skylion007 commented 2 years ago

When trying to import pyflex:

ImportError: .....pyflex.so: undefined symbol: __powf_finite

Skylion007 commented 2 years ago

@DanielTakeshi Any updates?

FranBesq commented 2 years ago

Have you compiled PyFlex with docker? What steps have you followed exactly?

denkiwakame commented 2 years ago

@Skylion007

Hi, I encountered the same issue today with PyFleX and figured out that the precompiled static library NvFlexExtReleaseCUDA uses the __powf_finite function, which recent glibc versions (such as the one shipped with Ubuntu 20) no longer provide for linking. The same symptom is discussed in https://github.com/google/filament/issues/2146#issuecomment-590101241

$ strings ../../lib/linux64/NvFlexExtReleaseCUDA_x64.a | grep finite
__powf_finite
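
For machines without binutils, the same check can be sketched in Python. This is only an illustration of the `strings | grep` step above, not part of PyFleX: the function name `find_finite_symbols` and the regular expression are assumptions, and the pattern only looks for names shaped like `__powf_finite`.

```python
import re
from pathlib import Path

# Assumed pattern for glibc's "finite" math aliases, e.g. __powf_finite
# or __lgamma_r_finite. It is a heuristic, not an official list.
FINITE_SYMBOL = re.compile(rb"__[a-z0-9]+(?:_r)?_finite")


def find_finite_symbols(path):
    """Return the sorted, de-duplicated *_finite names found in a binary file.

    Hypothetical helper: it scans the raw bytes, much like `strings | grep finite`,
    so it reports names that appear anywhere in the file, not only undefined ones.
    """
    data = Path(path).read_bytes()
    return sorted({match.decode("ascii") for match in FINITE_SYMBOL.findall(data)})
```

Calling `find_finite_symbols("../../lib/linux64/NvFlexExtReleaseCUDA_x64.a")` should then list every such name the archive mentions, which shows whether __powf_finite is the only alias a compatibility shim has to cover.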

Unfortunately, we cannot easily re-compile NVIDIA FleX (proprietary software). I tried the following workaround and it worked locally (outside Docker).

bindings/libc_compat/libc_compat.c:

#include <math.h>

/* Stand-in for the symbol that NvFlexExtReleaseCUDA expects from libc. */
float __powf_finite(float x, float y) { return powf(x, y); }

CMakeLists.txt:

add_library(libc_compat ${ROOT}/bindings/libc_compat/libc_compat.c)
...
target_link_libraries(${EXAMPLE_BIN} PRIVATE ${ROOT}/lib/linux64/NvFlexExtReleaseCUDA_x64.a)
target_link_libraries(${EXAMPLE_BIN} PRIVATE libc_compat)

Build:

$ cmake -H. -Bbuild
$ make -j -C build

That is, I defined __powf_finite myself and linked it in so that NvFlexExtReleaseCUDA can resolve the symbol. It should work. I hope this helps.


Skylion007 commented 2 years ago

@denkiwakame It would be really useful if we could detect this error by checking the libc version and automatically apply this fix. Would you be willing to look into opening a PR?
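
A minimal sketch of what such a check could look like from Python follows. It uses only the standard library's `platform.libc_ver()`. The helper names and the 2.31 threshold are assumptions made for illustration (2.31 is believed to be the first glibc release where the `*_finite` aliases are no longer linkable), not something the SoftGym or PyFleX build currently does.

```python
import platform
import re

# Assumption: the __*_finite aliases stopped being linkable as of glibc 2.31.
FIRST_GLIBC_WITHOUT_FINITE = (2, 31)


def parse_version(text):
    """Turn a version string such as '2.31' into (2, 31); return None if unparsable."""
    match = re.match(r"(\d+)\.(\d+)", text or "")
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def needs_powf_shim(libc_name, libc_version):
    """Hypothetical helper: True when the libc is a glibc new enough to lack __powf_finite."""
    version = parse_version(libc_version)
    if libc_name != "glibc" or version is None:
        return False
    return version >= FIRST_GLIBC_WITHOUT_FINITE


# platform.libc_ver() returns (name, version), or empty strings when it cannot tell.
name, version = platform.libc_ver()
print(f"{name or 'unknown libc'} {version}: shim needed = {needs_powf_shim(name, version)}")
```

A build script could use the result to decide whether to add the compatibility library to the link line. Checking the version is only a proxy, so testing directly whether linking against __powf_finite succeeds would be a more robust alternative.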

denkiwakame commented 2 years ago

@Skylion007

I don't mind creating a PR, though in my humble opinion this is not a "fix" but a "temporary workaround".

Btw, have you resolved the problem? Although I applied a simple workaround, I would appreciate it if you found a better solution :D

adla700 commented 3 weeks ago

@Skylion007

I don't mind creating a PR, though in my humble opinion this is not a "fix" but a "temporary workaround".

  • 1️⃣ I created a dummy library, compatible with the old libc, for NvFlexExtReleaseCUDA, and the function just falls back to powf instead of the original __powf_finite.
  • 2️⃣ In my understanding, we can "fix" the issue only if we re-compile NVIDIA FleX without -ffast-math (which may cause a performance issue) https://bugzilla.redhat.com/show_bug.cgi?id=1803203

    • or re-compile NVIDIA FleX with the latest libc
    • ..., neither of which is possible for us, since NVIDIA open-sources only their demo code https://github.com/NVIDIAGameWorks/FleX
    • The problem is due to neither SoftGym nor PyFleX, but to the precompiled NVIDIA FleX, which depends on an older libc and CUDA 9.
  • 3️⃣ It seems that the original authors only support Ubuntu 16.04 or 18.04 (in Docker). We should not extend the supported platforms unless the maintainers are eager to do so, as it would put a bit too much on their plate.

    • (side note) As far as I tested locally, we don't even need cuda-docker environments when compiling (all we need is libcudart9.1.a, statically linked into the Python binding alongside NvFlex).

Btw, have you resolved the problem? Although I applied a simple workaround, I would appreciate it if you found a better solution :D

I had the same problem while running rlvlmf today: ImportError: /home/adhula/PyFleX/bindings/build/pyflex.cpython-39-x86_64-linux-gnu.so: undefined symbol: __powf_finite

Did you find any simple or shortcut solution for it?