Open icedoom888 opened 4 years ago
I'll take a look this weekend. Thanks!
Hmm, it works fine on my machine and I couldn't reproduce it:
Number of Cuda visible devices: 4
Device 0 properties: _CudaDeviceProperties(name='TITAN Xp', major=6, minor=1, total_memory=12195MB, multi_processor_count=30)
Device 1 properties: _CudaDeviceProperties(name='TITAN Xp', major=6, minor=1, total_memory=12196MB, multi_processor_count=30)
Scene construction, time: 4.04201 s
Forward pass, time: 1.61628 s
cuda:1
Could it be due to your two GPUs having different compute capabilities? Can you somehow test it?
Hello @BachiLi, thank you for testing this.
I ran some additional tests by running the following code:
import os
# Must be set before torch/pyredner initialise CUDA.
os.environ["CUDA_VISIBLE_DEVICES"]="0"
import pyredner as pyredner
import torch
import matplotlib.pyplot as plt

class Neural_Renderer(torch.nn.Module):
    def __init__(self, device):
        super(Neural_Renderer, self).__init__()
        # redner keeps its own global device setting.
        pyredner.set_device(device)
        self.mesh_path = "/mnt/soarin/data/church_tower/mesh/church_tower.obj"
        self.resolution = [3000, 4000]

    def forward(self):
        objects = pyredner.load_obj(self.mesh_path, return_objects=True)
        object = objects[0]
        camera = pyredner.automatic_camera_placement([object], resolution = self.resolution)
        camera.position += 2* (camera.look_at - camera.position)
        scene = pyredner.Scene(camera = camera, objects = [object])
        img = pyredner.render_albedo(scene)
        print(img.device)
        return img

print('Number of Cuda visible devices: ', torch.cuda.device_count())
device_0 = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('Device 0 properties: ', torch.cuda.get_device_properties(device_0))
renderer = Neural_Renderer(device_0)
im = renderer().cpu()
plt.imshow(im)
plt.show()
If CUDA_VISIBLE_DEVICES=0, the code runs on my GTX 1080 Ti, which has a capacity of 11178 MiB, and produces the following trace:
Number of Cuda visible devices: 1
Device 0 properties: _CudaDeviceProperties(name='GeForce GTX 1080 Ti', major=6, minor=1, total_memory=11178MB, multi_processor_count=28)
Scene construction, time: 0.03863 s
Forward pass, time: 1.51581 s
cuda:0
If CUDA_VISIBLE_DEVICES=1, the code runs on my GTX TITAN X, which has a capacity of 12209 MiB, and produces the following trace:
Number of Cuda visible devices: 1
Device 0 properties: _CudaDeviceProperties(name='GeForce GTX TITAN X', major=5, minor=2, total_memory=12209MB, multi_processor_count=24)
Scene construction, time: 0.03840 s
CUDA Runtime Error: out of memory at /tmp/pip-req-build-pa0ny0bz/buffer.h:55
Nvidia-smi shows:
Mon Jan 27 10:41:25 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 00000000:01:00.0  On |                  N/A |
| 28%   60C    P8    20W / 250W |  10710MiB / 12209MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 27%   32C    P8     7W / 250W |      2MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
- When running on the TITAN X, reducing the image resolution does NOT produce any error.
It is odd that the TITAN X runs out of memory, given that it has more computation capability and no other active processes running on it.
Do you have any suggestions?
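One thing worth ruling out when comparing the traces with the nvidia-smi table: the CUDA runtime and nvidia-smi do not necessarily number GPUs the same way. By default CUDA orders devices fastest-first, whereas nvidia-smi lists them by PCI bus ID, which would explain why CUDA_VISIBLE_DEVICES=0 selects the 1080 Ti although nvidia-smi shows the TITAN X as GPU 0. A minimal sketch for making the two agree; the helper name pin_gpu_by_pci_order is hypothetical, and it has to run before torch or pyredner initialise CUDA:

```python
import os


def pin_gpu_by_pci_order(index):
    """Expose a single GPU, numbered the way nvidia-smi numbers it.

    Hypothetical helper. Call it before importing torch or pyredner,
    because the CUDA runtime reads these variables at initialisation.
    """
    # Use PCI bus order (what nvidia-smi shows) instead of CUDA's
    # default fastest-first ordering.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)


pin_gpu_by_pci_order(0)  # GPU 0 in the nvidia-smi table above
```

With this in place, the index passed here should match the index column of nvidia-smi, which makes it easier to tell which card a given trace came from.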
This also has to do with how PyTorch manages memory (e.g., https://discuss.pytorch.org/t/unable-to-allocate-cuda-memory-when-there-is-enough-of-cached-memory/33296). How much memory does the rendering take on the 1080 Ti? (By the way, compute capability means something quite different: https://stackoverflow.com/questions/11973174/what-does-compute-capability-mean-w-r-t-cuda/11974822)
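To get a rough answer to the memory question, free device memory can be sampled before and after a render. This is only a sketch, assuming a PyTorch version that provides torch.cuda.mem_get_info; that call reports driver-level free memory, so it also sees allocations made outside PyTorch's caching allocator. The names to_mib and report_render_memory are hypothetical, and render_fn stands for any callable, such as the renderer() call in the script above.

```python
def to_mib(num_bytes):
    """Convert a byte count to whole MiB, the unit nvidia-smi reports."""
    return num_bytes // (1024 * 1024)


def report_render_memory(render_fn, device_index=0):
    """Print free GPU memory before and after calling render_fn()."""
    # Imported lazily so that to_mib can be used without a GPU stack.
    import torch

    free_before, total = torch.cuda.mem_get_info(device_index)
    result = render_fn()
    torch.cuda.synchronize(device_index)  # wait for pending GPU work
    free_after, _ = torch.cuda.mem_get_info(device_index)

    print(f"total:       {to_mib(total)} MiB")
    print(f"free before: {to_mib(free_before)} MiB")
    print(f"free after:  {to_mib(free_after)} MiB")
    return result
```

Note that this shows what is still held after the call returns, not the peak during rendering, so the numbers are indicative at best. Watching nvidia-smi while the render runs is a useful cross-check.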
I might have an issue related to this one. I get an out of memory error for my (different) code:
CUDA Runtime Error: out of memory at /tmp/pip-req-build-dnhy8cqu/src/buffer.h:55
when running with a GeForce GTX TITAN X (Compute capability: 5.2) but not when running with a Titan X (Compute capability: 6.1). Unfortunately I can't provide a reproducible code snippet.
Might it be possible that redner-gpu behaves differently for compute capability >= 6?
Possible. Unified memory works differently on pre-Pascal GPUs than on newer ones (https://devblogs.nvidia.com/unified-memory-cuda-beginners/). This is what I was suspecting. I don't have a good solution to this yet.
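Until there is a proper fix, a script can at least check the compute capability up front and warn the user. A minimal sketch, assuming the Pascal boundary (compute capability 6.0) is the relevant one, as the linked NVIDIA post describes; is_pre_pascal and warn_if_pre_pascal are hypothetical helpers and not part of redner:

```python
def is_pre_pascal(major):
    """True for compute capability below 6.x (e.g. Kepler, Maxwell)."""
    return major < 6


def warn_if_pre_pascal(device_index=0):
    """Print a warning if the selected GPU predates Pascal."""
    # Imported lazily so that is_pre_pascal works without PyTorch.
    import torch

    major, minor = torch.cuda.get_device_capability(device_index)
    name = torch.cuda.get_device_name(device_index)
    if is_pre_pascal(major):
        # Assumption: the out-of-memory error is tied to how unified
        # memory behaves on these GPUs, as suspected in this thread.
        print(f"{name} (compute capability {major}.{minor}) is pre-Pascal; "
              "large renders may run out of memory on this GPU.")
    return major, minor
```

For the two cards reported here, is_pre_pascal(5) holds for the GTX TITAN X (5.2) and is_pre_pascal(6) does not for the 1080 Ti (6.1).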
The following code fails when setting pyredner device:
With the following trace:
On the other hand, if I use CUDA_VISIBLE_DEVICES=1, the exact same code works:
With the following trace:
Please also note that calling PyTorch's standard .to(device) method has no effect whatsoever on redner's internal state.
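Because .to(device) only moves PyTorch parameters and buffers, redner's device has to be set separately. A small sketch of keeping the two in step; move_renderer is a hypothetical helper, and pyredner.set_device is the only redner call it relies on (the same one used in the script earlier in the thread). The set_renderer_device parameter exists only so the helper can be exercised without a GPU or redner installed:

```python
def move_renderer(module, device, set_renderer_device=None):
    """Move a torch module and redner's global device together."""
    if set_renderer_device is None:
        # Lazy import; requires redner to be installed.
        import pyredner
        set_renderer_device = pyredner.set_device
    # redner keeps its own global device, independent of the module.
    set_renderer_device(device)
    # Moves PyTorch parameters and buffers only.
    return module.to(device)
```

A call such as move_renderer(renderer, torch.device("cuda:1")) would then replace a bare renderer.to(...). This does not address the out-of-memory error itself; it only avoids the module and redner silently disagreeing about the device.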