VoVAllen / tf-dlpack

DLPack for Tensorflow
Apache License 2.0

[BUG] Segmentation Fault when using tfdlpack.to_dlpack on tf.tensor #12

Open awthomp opened 4 years ago

awthomp commented 4 years ago

I've been experimenting with tfdlpack to connect libraries that use __cuda_array_interface__ to TensorFlow, and I hit a segmentation fault when invoking to_dlpack on a TF tensor. See below for a reproduction:

import cupy as cp
import tfdlpack

# CuPy - GPU Array (like NumPy!)
gpu_arr = cp.random.rand(10_000, 10_000)

# Use CuPy's built in `toDlpack` function to move to a DLPack capsule
dlpack_arr = gpu_arr.toDlpack()

# Use `tfdlpack` to migrate to TensorFlow
tf_tensor = tfdlpack.from_dlpack(dlpack_arr)

# Confirm TF tensor is on GPU
print(tf_tensor.device)

# Use `tfdlpack` to migrate back to CuPy; this yields a segmentation fault
dlpack_capsule = tfdlpack.to_dlpack(tf_tensor)

I'm using 1 GP100 isolated with the CUDA_VISIBLE_DEVICES environment variable.

jermainewang commented 4 years ago

Confirmed this is a bug. I replaced cupy with torch and it also crashes.

import torch
from torch.utils import dlpack as th_dlpack
import tfdlpack

gpu_arr = torch.rand(10_000, 10_000).cuda()
print(gpu_arr)

dlpack_arr = th_dlpack.to_dlpack(gpu_arr)

# Use `tfdlpack` to migrate to TensorFlow
tf_tensor = tfdlpack.from_dlpack(dlpack_arr)

# Confirm TF tensor is on GPU
print(tf_tensor.device)

# Use `tfdlpack` to export the TF tensor back to a DLPack capsule; this yields a segmentation fault
dlpack_capsule = tfdlpack.to_dlpack(tf_tensor)

jermainewang commented 4 years ago

What's your tensorflow version? I found the code works with tensorflow v2.1.0 but not v2.0.0.
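A minimal sketch of checking a TensorFlow version string against the observation above (works with v2.1.0, crashes with v2.0.0). The helper names `version_tuple` and `reported_working` and the `KNOWN_GOOD` constant are hypothetical, not part of tfdlpack, and the check only encodes what was reported in this comment; it is not a guarantee that a given install is free of the segfault.

```python
import re

# Lowest TensorFlow version reported to work in this thread (an observation, not a guarantee).
KNOWN_GOOD = (2, 1, 0)


def version_tuple(version):
    """Turn a version string such as '2.1.0' or '2.1.0rc1' into a 3-tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits of each piece
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad short strings like '2.1'
        parts.append(0)
    return tuple(parts)


def reported_working(version):
    """Return True if the version is at least the one reported to work."""
    return version_tuple(version) >= KNOWN_GOOD
```

In a real session one would pass `tf.__version__` after `import tensorflow as tf`, for example `reported_working(tf.__version__)`.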

VoVAllen commented 4 years ago

It works well on my machine. I'm using tensorflow 2.1.0

awthomp commented 4 years ago

> What's your tensorflow version? I found the code works with tensorflow v2.1.0 but not v2.0.0.

Interesting. I was on TF 2.1.0 when I submitted the bug report. I've included an Anaconda environment file below so that we're on the same page for software dependencies:

name: tfdlpack
channels:
  - conda-forge
  - nvidia
  - pytorch
  - defaults
  - numba
dependencies:
  - python=3.7
  - numpy
  - cudatoolkit>=9.2,<10.2
  - numba
  - cupy>=6.2.0
  - pytorch
  - pip
  - pip:
      - tfdlpack-gpu

Just save this into a file named tfdlpack_conda.yml. Then run:

conda env create -f tfdlpack_conda.yml
conda activate tfdlpack

My system contains 2 GP100s (Pascal P100) and 1 P2000 to drive graphics. I typically isolate GPU0 (P100) with export CUDA_VISIBLE_DEVICES=0.
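For anyone reproducing this from a script or notebook rather than a shell, the same isolation can be sketched in Python. The helper name `isolate_gpu` is hypothetical; the one real constraint is that `CUDA_VISIBLE_DEVICES` has to be set before any CUDA-using library initializes, so the safest place is before the imports.

```python
import os


def isolate_gpu(index):
    """Expose only one GPU to CUDA; call before importing cupy, torch, or tensorflow."""
    value = str(index)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Equivalent to `export CUDA_VISIBLE_DEVICES=0` in the shell.
isolate_gpu(0)

# Import cupy / tensorflow / tfdlpack only after this point.
```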

awthomp commented 4 years ago

I'm also receiving the segfault with an NVIDIA T4. Here's a Google Colab notebook that you can run through. Perhaps pip install tfdlpack-gpu isn't pulling in all the expected/necessary dependencies?

https://colab.research.google.com/drive/18Z8bOCJ2Mr-jOD-vIbr6KAO1-KPUy_UM

VoVAllen commented 4 years ago

Thanks for your example. Actually, I'm thinking of reorganizing the whole project based on the new TensorFlow custom-op repo, https://github.com/tensorflow/custom-op, since this is the official guide on how to distribute a custom op. However, I'm unsure whether I should base the project on Bazel instead of CMake. I may need more time on this.

awthomp commented 4 years ago

Thanks, @VoVAllen, and thanks for your hard work enabling DLPack support in TensorFlow. Don't hesitate to let us know what you need help with.

VoVAllen commented 4 years ago

@awthomp I've updated the binary release and it now works in colab. Could you try it in your environment again?

However, there's still a bug in this release. It happens when you create a capsule from TensorFlow but do not consume it in another framework. I'm still investigating a solution.
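Until that is resolved, one pattern that sidesteps the unconsumed-capsule case is to hand each capsule to a consumer immediately. This is a sketch only: the function name `roundtrip_through_tf` is hypothetical, it assumes a CuPy version that provides `cupy.fromDlpack`, and whether it fully avoids the remaining bug is an assumption rather than something confirmed in this thread.

```python
def roundtrip_through_tf(gpu_arr):
    """Move a CuPy array into TensorFlow and back, consuming every capsule."""
    # Imports are local so that defining this function does not require a GPU.
    import cupy as cp
    import tfdlpack

    # CuPy -> DLPack capsule -> TensorFlow tensor; from_dlpack consumes the capsule.
    tf_tensor = tfdlpack.from_dlpack(gpu_arr.toDlpack())

    # TensorFlow -> DLPack capsule, passed straight to CuPy so the capsule
    # is never left unconsumed.
    return cp.fromDlpack(tfdlpack.to_dlpack(tf_tensor))
```

Calling it requires a CUDA device, for example `roundtrip_through_tf(cp.random.rand(4, 4))`.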

awthomp commented 4 years ago

@VoVAllen. Wahoo! Works for me in both Colab on a T4 and on my local machine with a P100. Thanks for the quick fix!