Hi @lferraz
Before we dig too deep: have you rebuilt VPF after the PyTorch update, and did you check that the latest PyTorch is used when building the PytorchNvCodec target?
Hi @rarzumanyan,
I am not using PytorchNvCodec; I use the import/export methods. And I don't need to import/export anything to trigger the error: creating an nvc.PyNvDecoder object is enough. :S
I recompiled VPF inside the env I use for my project. Everything in the CMake report makes sense to me:
-- The C compiler identification is GNU 9.3.0
-- The CXX compiler identification is GNU 7.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /code/optima-workspace/vision/.dev_env/envs/vision/bin/x86_64-conda-linux-gnu-cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The CUDA compiler identification is NVIDIA 11.3.58
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Searching for FFmpeg libs in /lib
-- Searching for FFmpeg headers in /include
-- Searching for Video Codec SDK headers in /code/Video_Codec_SDK_11.0.10/include folder
-- Searching for Video Codec SDK headers in /code/Video_Codec_SDK_11.0.10/Interface folder
-- Found PythonLibs:/code/optima-workspace/vision/.dev_env/envs/vision/lib/libpython3.7m.so (found suitable version "3.7.2", minimum required is "3.5")
-- Found PythonInterp: /code/optima-workspace/vision/.dev_env/envs/vision/bin/python3.7 (found version "3.7.2")
-- Found PythonLibs: /code/optima-workspace/vision/.dev_env/envs/vision/lib/libpython3.7m.so
-- pybind11 v2.3.dev0
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /code/VideoProcessingFramework/build
Hi @rarzumanyan, after 2 days of isolating the problem I have a minimal example:
First, create a conda env with the minimum required packages (you can also try other Python versions, and other PyTorch versions as long as they are > 1.6). Note: this command installs cudatoolkit 10.2 (installing PyTorch with pip does not install cudatoolkit, but the error still appears).
conda create -n myenv python=3.7.2 pytorch=1.8.1 ipython
Run this script in, e.g., ipython. You need to set valid paths for VPF_PATH and VIDEOPATH. I tested it with several videos and it always fails, e.g. with the vp9 video we were using.
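import sys

import torch

VPF_PATH = '/path/to/vpf'  # placeholder: folder containing PyNvCodec*.so
sys.path.append(VPF_PATH)
import PyNvCodec as nvc

if __name__ == '__main__':
    m = torch.eye(1, device=torch.device('cuda:0'))
    h = torch.stack([m, m, m])
    # Commenting out this line makes the error disappear.
    vpf = nvc.PyNvDecoder('VIDEOPATH.mp4', 0)
    a = m.inverse()
    c = h.inverse()
    b = m.inverse()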
The error is:
`RuntimeError: cusolver error: 7, when calling cusolverDnSgetrs( handle, CUBLAS_OP_N, n, nrhs, dA, lda, ipiv, ret, ldb, info)`
If you comment out the vpf line, there is no error.
I tested this script on 2 very similar GCP machines with a V100 GPU.
Let me add more info about how I compile VPF.
Inside the env, install cmake:
conda install cmake
Then compile VPF using:
export PATH_TO_SDK=$PWD/Video_Codec_SDK_9.1.23 (also tested with 11.0.10 on a machine with CUDA 11.3)
export PATH_TO_FFMPEG=/usr/lib/x86_64-linux-gnu/.
export CUDACXX=/usr/local/cuda/bin/nvcc
without compiling the PyTorch lib.
CMake output:
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The CUDA compiler identification is NVIDIA 11.3.58
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Searching for FFmpeg libs in /usr/lib/x86_64-linux-gnu/./lib
-- Searching for FFmpeg headers in /usr/lib/x86_64-linux-gnu/./include
-- Searching for Video Codec SDK headers in /home/luisferrazcolomina/code/Video_Codec_SDK_9.1.23/include folder
-- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython3.7m.so (found suitable version "3.7.5", minimum required is "3.5")
-- Found PythonInterp: /home/luisferrazcolomina/code/optima-workspace/vision/.dev_env/envs/myenvX/bin/python3.7 (found version "3.7.2")
-- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython3.7m.so
-- pybind11 v2.3.dev0
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /home/luisferrazcolomina/code/VideoProcessingFramework/build
After the CMake build I get:
PyNvCodec.cpython-37m-x86_64-linux-gnu.so
Hi @lferraz
Unfortunately I can't use Anaconda because its licensing for corporate users changed some time ago. Allow me some time; I'll check on my Ubuntu machine with vanilla Python 3.8. I'm currently merging feature branches into main and will take this up as soon as I'm done.
As far as I understand, this is a "P1" kind of issue, which isn't a show stopper and which you can work around for some time, right?
Hi @rarzumanyan,
I suggested conda because it is the easiest way, but you can use any env. I also installed PyTorch and cmake in an env created with python -m venv, and I still have the same problem.
I found a possible issue: VPF is compiled against Python 3.7m and I am running 3.7... I will check now whether that can be a problem.
Hi @rarzumanyan,
already tested: same issue with the "m" version of Python. :(
I do not know what else I can do on my side... if you have any ideas, please let me know.
Anyway, thanks for your help :)
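For reference, the `m` in `3.7m` is CPython's ABI flag for the default pymalloc build, so on 3.7 a plain `python3.7` executable normally reports `abiflags == 'm'` as well (the flag was dropped in 3.8). A quick way to confirm that a built module matches the interpreter importing it is to compare its file name with `importlib.machinery.EXTENSION_SUFFIXES`. This is a minimal sketch; the helper `importable_as` is hypothetical.

```python
import importlib.machinery
import sys
import sysconfig


def importable_as(filename: str, module: str) -> bool:
    """True if `import <module>` in this interpreter would pick up `filename`,
    i.e. the file is named <module> + one of the accepted extension suffixes."""
    return any(filename == module + sfx
               for sfx in importlib.machinery.EXTENSION_SUFFIXES)


if __name__ == "__main__":
    # abiflags does not exist on Windows; SOABI may be None there too.
    print("Python", sys.version.split()[0],
          "abiflags:", repr(getattr(sys, "abiflags", "")),
          "SOABI:", sysconfig.get_config_var("SOABI"))
    built = "PyNvCodec.cpython-37m-x86_64-linux-gnu.so"  # file name from the build above
    print(built, "matches this interpreter:", importable_as(built, "PyNvCodec"))
```

If this prints `False`, the import itself would fail, which is a different symptom from the cuSOLVER error in this thread.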
Hi @rarzumanyan,
I ran dlprof on the code and found some differences in the loaded libs between the working case (PyTorch 1.6) and the failing case (PyTorch 1.8.1).
I extracted this list of diffs using Nsight.
These libs are not used in the case where everything works fine:
torch/lib/libc10.so
target-linux-x64/libToolsInjectionCuda64.so
lib/libcusolver.so.10.3.0.89
This one changes its version: libstdc++.so.6.0.26 -> libstdc++.so.6.0.28
Running ldd on VPF and libtorch, I've seen differences in several libs. The main one may be libcudart.so: VPF uses version 11 and PyTorch uses 10.2.
Anyway, the error I am getting looks like it is related to libcusolver:
`RuntimeError: cusolver error: 7, when calling cusolverDnSgetrs( handle, CUBLAS_OP_N, n, nrhs, dA, lda, ipiv, ret, ldb, info)`
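Nsight and dlprof show which shared libraries are actually loaded at run time. A rough, Linux-only way to get a similar list from inside the failing script is to read `/proc/self/maps` after the failing calls and collect the mapped `.so` files. This is a sketch; `loaded_libs` and `cuda_related` are hypothetical helpers, and the prefix filter is chosen for the libraries discussed in this issue.

```python
import os
import re


def loaded_libs(maps_text: str) -> set:
    """Return basenames of shared objects mapped into a process,
    given the text of a /proc/<pid>/maps file."""
    libs = set()
    for line in maps_text.splitlines():
        parts = line.split(None, 5)  # address perms offset dev inode [path]
        if len(parts) == 6:
            path = parts[5].strip()
            if re.search(r"\.so(\.|$)", path):  # libfoo.so or libfoo.so.1.2
                libs.add(os.path.basename(path))
    return libs


def cuda_related(libs: set) -> list:
    """Keep only the libraries relevant here (CUDA runtime/driver, cuSOLVER, cuBLAS, libstdc++)."""
    prefixes = ("libcudart", "libcusolver", "libcublas", "libcuda.", "libstdc++")
    return sorted(name for name in libs if name.startswith(prefixes))


if __name__ == "__main__" and os.path.exists("/proc/self/maps"):
    # In the real repro, import torch / PyNvCodec and run the failing calls first.
    with open("/proc/self/maps") as f:
        print(cuda_related(loaded_libs(f.read())))
```

Running it once with and once without the `PyNvDecoder` line gives a diff comparable to the Nsight one, without a profiler.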
Hi @lferraz
I'm now merging the feature branch that removes the CUDA runtime API from PyNvCodec. Let's check again once it's merged.
I'm planning to finish the merge tonight or tomorrow morning and will update you in this thread.
Hi @lferraz
Please check out the latest master: it has the changes merged from nvtx_support and should no longer use the CUDA runtime API in PyNvCodec.
Hi @rarzumanyan,
unfortunately the issue is still there. I ran ldd and I can still see the libcudart and libcuda dependencies:
libcudart.so.10.2
libcuda.so.1
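To compare what the PyNvCodec module and libtorch link against, the `ldd` output can be filtered programmatically. A minimal sketch; `ldd_deps` and `cuda_runtime_deps` are hypothetical helpers, and the path passed in would be the built `PyNvCodec*.so` or one of torch's `lib/*.so` files. Note that `ldd` lists link-time dependencies only; libraries opened later with `dlopen()` do not appear.

```python
import re
import subprocess

# Matches lines such as "\tlibcudart.so.10.2 => /usr/local/cuda/lib64/libcudart.so.10.2 (0x...)"
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)")


def ldd_deps(ldd_output: str) -> dict:
    """Map each soname in `ldd` output to the path it resolves to
    ('not' when ldd reports 'not found')."""
    deps = {}
    for line in ldd_output.splitlines():
        m = LDD_LINE.match(line)
        if m:
            deps[m.group(1)] = m.group(2)
    return deps


def cuda_runtime_deps(so_path: str) -> dict:
    """Run ldd on a shared object and keep only CUDA runtime/driver entries."""
    out = subprocess.run(["ldd", so_path], capture_output=True,
                         text=True, check=True).stdout
    return {name: path for name, path in ldd_deps(out).items()
            if name.startswith(("libcudart", "libcuda."))}
```

For example, `cuda_runtime_deps("PyNvCodec.cpython-37m-x86_64-linux-gnu.so")` run from the install folder should list entries like the two above.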
Hi @lferraz
I'm now investigating this issue, but there's one blocker: you mention that CUDA 10.2 is required:
Note: with this line cudatoolkit 10.2 is installed
That isn't enough to compile master ToT because of this function:
https://github.com/NVIDIA/VideoProcessingFramework/blob/906b6dc43e6be99284c24e382cf5fc93196d99c7/PyNvCodec/TC/src/TasksColorCvt.cpp#L115-L116
which requires at least CUDA 11.0. This addition is very important because it fixes BT.601 and BT.709 YUV -> RGB color conversion.
Without it, VPF can't do a proper color conversion to RGB, which is crucial for ML applications: most NNs are trained on RGB datasets, and inaccurate conversion to RGB badly hurts prediction accuracy.
Any chance you can upgrade to CUDA 11? If not, I'll have to work around the color conversion issue first.
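Since master needs at least CUDA 11.0 to build, a quick pre-build check of the nvcc that `CUDACXX` points to can save a failed compile. A sketch that parses `nvcc --version`; the helper names are hypothetical, and it checks only the compiler, not the CUDA version PyTorch was built with (`torch.version.cuda` reports that one).

```python
import re
import shutil
import subprocess

MIN_CUDA = (11, 0)  # minimum mentioned above for building master


def parse_nvcc_release(nvcc_output: str):
    """Extract (major, minor) from `nvcc --version` output, or None."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def nvcc_meets_minimum(nvcc: str = "nvcc", minimum=MIN_CUDA) -> bool:
    """Return True if the given nvcc exists and reports at least `minimum`."""
    exe = shutil.which(nvcc)
    if exe is None:
        return False
    out = subprocess.run([exe, "--version"], capture_output=True, text=True).stdout
    version = parse_nvcc_release(out)
    return version is not None and version >= minimum
```

Usage would be along the lines of `nvcc_meets_minimum(os.environ.get("CUDACXX", "nvcc"))`.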
Hi @rarzumanyan,
thanks for the update. I also tested with CUDA 11 and got the same error :(
Luis
@lferraz
OK, then I'll go ahead with CUDA 11. Thanks for the update!
@rarzumanyan, probably not useful to you, but I also added these 2 lines to my compilation script to avoid possible inconsistencies with Python:
export PYTHON_LIB=$(python -c "import distutils.sysconfig as sysconfig; print(sysconfig.get_config_var('LIBDIR'))")
export PYTHON=$(which python)
cmake .. -DVIDEO_CODEC_SDK_DIR:PATH="$PATH_TO_SDK" -DGENERATE_PYTHON_BINDINGS:BOOL="1" -DCMAKE_INSTALL_PREFIX:PATH="$INSTALL_PREFIX" -DFFMPEG_DIR:PATH="$PATH_TO_FFMPEG" -DPYTHON_EXECUTABLE:PATH="$PYTHON" -DPYTHON_LIBRARY="$PYTHON_LIB"
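The same pinning can be done from Python with the non-deprecated `sysconfig` module. One assumption differs from the script above: CMake's FindPythonLibs documents `PYTHON_LIBRARY` as the path to the library file, so this sketch joins `LIBDIR` with `LDLIBRARY` instead of passing the bare directory; check that the resulting path exists on your system. `python_cmake_args` is a hypothetical helper.

```python
import os
import shlex
import sys
import sysconfig


def python_cmake_args() -> list:
    """Return CMake -D flags that pin the build to the running interpreter.

    Assumption: PYTHON_LIBRARY should name the libpython file itself, so LIBDIR
    is joined with LDLIBRARY; both config vars may be None on some platforms.
    """
    args = [f"-DPYTHON_EXECUTABLE:PATH={sys.executable}"]
    libdir = sysconfig.get_config_var("LIBDIR")
    ldlib = sysconfig.get_config_var("LDLIBRARY")
    if libdir and ldlib:
        args.append(f"-DPYTHON_LIBRARY:FILEPATH={os.path.join(libdir, ldlib)}")
    return args


if __name__ == "__main__":
    # Append the printed flags to the cmake command line.
    print(" ".join(shlex.quote(a) for a in python_cmake_args()))
```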
@lferraz
I can confirm that I reproduce the issue on the following config:
Will update you in this thread as soon as I find something.
@lferraz
Please check out issue_203 ToT.
I've replaced cuCtxCreate() with cuDevicePrimaryCtxRetain(), and now the following snippet no longer causes any errors:
import torch
import sys
import PyNvCodec as nvc
m = torch.eye(1, device=torch.device('cuda:0'))
h = torch.stack([m, m, m])
nvdec = nvc.PyNvDecoder('/home/roman/Videos/bbb_sunflower_1080p_30fps_normal.mp4', 0)
a = m.inverse()
c = h.inverse()
b = m.inverse()
quit()
I've also tested SampleDecode.py; it produces valid NV12 output, so I assume VPF functionality isn't broken.
P.S.
I didn't investigate how using the primary CUDA context influences performance and such; the issue_203 branch is kind of a hotfix.
@rarzumanyan
looks like it works!!! Tomorrow I will run a deeper validation.
About performance I have no idea... I don't know what the difference is between cuCtxCreate() and cuDevicePrimaryCtxRetain().
@rarzumanyan, I tested it on my pipeline and everything looks fine. I qualitatively compared VPF's speed and it is similar (there are small speed differences, but I think that is because I tested on two different machines that are identical except for the disk: one uses an SSD and the other an HDD).
Thanks for the update @lferraz
I will merge issue_203 to master after some additional investigation on my side.
Closing as solved.
Describe the bug
On my server everything works perfectly with the latest version of VPF (with Video Codec SDK 9 or 10) and PyTorch 1.6 + CUDA 10.2 (I also tested with several drivers).
However, now I am trying to update to PyTorch 1.8.1 and there are problems when I use VPF. At some point I get the following error: RuntimeError: cusolver error: 7, when calling cusolverDnSgetrs( handle, CUBLAS_OP_N, n, nrhs, dA, lda, ipiv, ret, ldb, info)
This error appears when torch.inverse(input) is called. I tested my project without VPF and in that case everything works fine. The problem appears only when I use VPF: adding even the simplest VPF-related code (e.g. nvc.PyNvDecoder(data_source, 0) at the beginning of my script) triggers the error.
To Reproduce
I tried to reproduce this error in a small piece of code but I could not.
Ideally this should fail, but it is not failing :'(
Desktop (please complete the following information):
I tested all possible combinations of CUDA + PyTorch + Video Codec SDK.
Additional context
It looks like the problem is in the cuBLAS library. I tried compiling VPF with CUDA 10 and using it with CUDA 11, and it fails directly at import, but this problem with cuBLAS seems quite hidden.
I feel my problem is quite complex to solve, but I'd like to get some feedback from you.