NVIDIA / GMAT

A toolkit showing GPU's all-round capability in video processing

Linking to ONNXRUNTIME Issue #2

Closed hmdtb closed 2 years ago

hmdtb commented 2 years ago

Hi,

In the very last steps of the build process, I run into an issue: it seems that the onnxruntime library is not linked properly, and I get the following message:

LD ffmpeg_g
/usr/bin/ld: libavfilter/libavfilter.so: undefined reference to OrtGetApiBase
collect2: error: ld returned 1 exit status
make: *** [Makefile:124: ffmpeg_g] Error 1

Any suggestion? Thanks!

xiaoweiw-nv commented 2 years ago

Sorry for the late reply, I was on vacation.

Can you show the output of: ldconfig -p | grep onnx

and: ldd libavfilter/libavfilter.so

Thanks
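
The two commands above answer different questions: whether the dynamic loader knows about onnxruntime at all, and whether libavfilter.so was actually linked against it. A minimal sketch of how the ldd output could be read is given below. The sample text, addresses, and helper names are made up for illustration and are not taken from this thread.

```python
# Hypothetical ldd output; the library versions and addresses are illustrative only.
SAMPLE_LDD_OUTPUT = (
    "\tlinux-vdso.so.1 (0x00007ffd0a5f2000)\n"
    "\tlibavutil.so.56 => /usr/local/lib/libavutil.so.56 (0x00007f1c2a000000)\n"
    "\tlibonnxruntime.so.1.8.1 => not found\n"
)


def unresolved_libs(ldd_output: str) -> list[str]:
    """Return the dependencies that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # A resolved or unresolved dependency looks like "name => target".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.startswith("not found"):
            missing.append(name)
    return missing


def links_against(ldd_output: str, needle: str) -> bool:
    """True if any dependency name listed by ldd contains `needle`."""
    return any(needle in line.split("=>")[0] for line in ldd_output.splitlines())


print(unresolved_libs(SAMPLE_LDD_OUTPUT))
print(links_against(SAMPLE_LDD_OUTPUT, "onnxruntime"))
```

If a library is listed but marked "not found", the loader cannot locate it (a search-path problem). If it is not listed at all, the shared object was most likely never linked against it, which matches an undefined reference at link time.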

hmdtb commented 2 years ago

Thanks for the reply. I have solved that issue; however, there is another issue which I couldn't solve yet.

/usr/bin/ld: libavfilter/libavfilter.so: undefined reference to `c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/usr/bin/ld: libavfilter/libavfilter.so: undefined reference to `c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status
make: *** [Makefile:124: ffprobe_g] Error 1
make: *** Waiting for unfinished jobs....
make: *** [Makefile:124: ffmpeg_g] Error 1

Do you have any idea why this is happening? I have rebuilt torchvision with different versions and added -D_GLIBCXX_USE_CXX11_ABI=0, as suggested as a solution, but it doesn't help here.

xiaoweiw-nv commented 2 years ago

Is your libtorch built with -D_GLIBCXX_USE_CXX11_ABI=0?

hmdtb commented 2 years ago

Yes, I have built it as you suggested. It also shows that the flag is not set (as I configured it during the build):

>>> import torch
>>> print(torch._C._GLIBCXX_USE_CXX11_ABI)
False

xiaoweiw-nv commented 2 years ago

My suggestion in the README is to use libtorch/torchvision with the cxx11 ABI, that is, libtorch/torchvision built with -D_GLIBCXX_USE_CXX11_ABI=1. That's why I didn't add this flag to the config script. Sorry for the misunderstanding. If you do not want to use the cxx11 ABI, you will also need to pass -D_GLIBCXX_USE_CXX11_ABI=0 to ffmpeg during configuration.
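
The mismatch can be read off the linker error itself: the undefined symbol mentions std::__cxx11::basic_string, so the object that references it was compiled with the cxx11 ABI, while the libtorch build reports _GLIBCXX_USE_CXX11_ABI as False. The sketch below illustrates that check. The helper names are hypothetical, and the torch flag is passed in as a plain boolean so the example runs without torch installed.

```python
# Demangled symbol copied from the linker error earlier in this thread.
UNDEFINED_SYMBOL = (
    "c10::detail::torchInternalAssertFail(char const*, char const*, "
    "unsigned int, char const*, std::__cxx11::basic_string<char, "
    "std::char_traits<char>, std::allocator<char> > const&)"
)


def uses_cxx11_abi(demangled_symbol: str) -> bool:
    """True if the symbol refers to libstdc++'s new-ABI types.

    With _GLIBCXX_USE_CXX11_ABI=1, std::string lives in the inline
    namespace std::__cxx11, which shows up in demangled names. A symbol
    with no std::string in its signature carries no such marker, so a
    False result is not proof of the old ABI.
    """
    return "std::__cxx11::" in demangled_symbol


def abi_define(cxx11_abi: bool) -> str:
    """Compiler define that selects the given libstdc++ ABI."""
    return f"-D_GLIBCXX_USE_CXX11_ABI={int(cxx11_abi)}"


def diagnose(undefined_symbol: str, torch_cxx11_abi: bool) -> str:
    """Compare the ABI seen in the symbol with the ABI libtorch reports."""
    if uses_cxx11_abi(undefined_symbol) and not torch_cxx11_abi:
        return ("ABI mismatch: the caller uses the cxx11 ABI but libtorch does not; "
                f"build both sides with the same setting, e.g. {abi_define(torch_cxx11_abi)}")
    return "no ABI mismatch evident from this symbol"


# In the thread, torch._C._GLIBCXX_USE_CXX11_ABI printed False.
print(diagnose(UNDEFINED_SYMBOL, torch_cxx11_abi=False))
```

Either side can be changed to resolve the mismatch: rebuild libtorch/torchvision with the cxx11 ABI as the README suggests, or compile the ffmpeg side with the old ABI.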

hmdtb commented 2 years ago

It would help a lot if you could provide a Dockerfile that builds all the dependencies and libraries you specified. It can be based on a generic x86 CUDA 11+ desktop configuration.

xiaoweiw-nv commented 2 years ago

Thanks for your suggestion. We are preparing a docker image which contains all the dependencies. It will be ready this week.

xiaoweiw-nv commented 2 years ago

The Dockerfile has been uploaded to the repo. Can you give it a try? The docker build instructions are provided in the new README and documentation.

hmdtb commented 2 years ago

Thanks for the Dockerfile.

It seems that one dependency is still missing since I get this error:

MAN doc/ffmpeg-filters.1
MAN doc/libavutil.3
MAN doc/libswscale.3
MAN doc/libswresample.3
MAN doc/libavcodec.3
MAN doc/libavformat.3
MAN doc/libavdevice.3
MAN doc/libavfilter.3
LD  libswscale/libswscale.so.5
LD  libswresample/libswresample.so.3
STRIP   libavcodec/x86/vp9itxfm.o
GEN libavcodec/libavcodec.ver
LD  libavcodec/libavcodec.so.58
LD  libavformat/libavformat.so.58
LD  libavfilter/libavfilter.so.7
gcc: error: libavfilter/cnpy.o: No such file or directory
make: *** [ffbuild/library.mak:103: libavfilter/libavfilter.so.7] Error 1

PS: On line 31 of the Dockerfile, a "&&" is missing.

xiaoweiw-nv commented 2 years ago

Thanks for the heads up. It looks like the cnpy submodule is missing. Can you try: git submodule update --init
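
A missing submodule can be confirmed before rebuilding: git submodule status prefixes each uninitialized submodule with "-". A small sketch of reading that output is shown below. The hashes and paths in the sample are placeholders, not the actual layout of this repo.

```python
# Hypothetical `git submodule status` output; hashes and paths are placeholders.
SAMPLE_STATUS = (
    " 1111111111111111111111111111111111111111 path/to/other (v1.0)\n"
    "-2222222222222222222222222222222222222222 path/to/cnpy\n"
)


def uninitialized_submodules(status_output: str) -> list[str]:
    """Return the paths of submodules that git marks with a leading '-'."""
    paths = []
    for line in status_output.splitlines():
        # The first character is the state flag, so the line must not be stripped.
        if line.startswith("-"):
            fields = line[1:].split()
            if len(fields) >= 2:
                paths.append(fields[1])
    return paths


print(uninitialized_submodules(SAMPLE_STATUS))
```

Any path reported here has not been checked out yet, so sources that the build expects from it (such as the cnpy object in the error above) will be missing until the submodule is initialized.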

hmdtb commented 2 years ago

Thanks. The build was successful.

xiaoweiw-nv commented 2 years ago

Hi @hmdtb ,

We found that onnxruntime 1.11 may cause a segfault, so we changed onnxruntime back to 1.8.1, and the Dockerfile has been updated. If you do not want to rebuild the docker image, you can download onnxruntime 1.8.1 and install it manually.

If you do not have further questions, I will close this issue. Thanks!

hmdtb commented 2 years ago

Thanks for informing me regarding this update.

I will rebuild the image and in case of any new problem, I will open a new issue ticket.

xiaoweiw-nv commented 2 years ago

Thanks. Closing the issue.