Jittor / JNeRF

JNeRF is a NeRF benchmark based on Jittor. JNeRF re-implements Instant-NGP and achieves the same performance as the original paper.
Apache License 2.0

Runtime error: Execute fused operator(1/2) failed #25

Open ZhangXiaoXuan2019 opened 2 years ago

ZhangXiaoXuan2019 commented 2 years ago

First of all, thank you for your work on JNeRF! @Gword We have recently been building further on your JNeRF implementation, and at runtime the code fails with an "Execute fused operator failed" error; the key error message is shown in the attached screenshot (截屏2022-07-03 19 54 22). More specifically, we found that the error occurs if and only if the render_test function in the JNeRF implementation is called to render test images: the self.sampler.sample call inside that function raises the error. Sampling anywhere else (e.g., during NeRF training) works without any problem. The error message is extremely vague, saying only "found something wrong", so we have no idea where to start when trying to fix it. The message also reports that "pcg32.h" was not found; we tried adding that header to the relevant path, but the error was not resolved, and the message then reported that "ray_sampler.h" was not included.

Could the JNeRF team please assess where the problem might lie and give us some advice? Thank you!

Gword commented 2 years ago

The include paths in the compile command might be wrong. Could you post a more complete screenshot?

ZhangXiaoXuan2019 commented 2 years ago

Hello, thank you for the prompt reply. Below is the more complete error message from the screenshot above; we added line breaks to the text after [Reason] for readability.

File "/home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/FedML/fedml_api/standalone/fedavg/my_model_trainer.py", line 127, in local_model_render img, img_tar = self.render_img(client_s_idx) # in dataset, the image index is the client index

File "/home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/FedML/fedml_api/standalone/fedavg/my_model_trainer.py", line 167, in render_img pos, dir = self.sampler.sample(img_ids, rays_o, rays_d)

File "/home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/python/jnerf/models/samplers/density_grid_sampler/density_grid_sampler.py", line 137, in sample coords, rays_index, rays_numsteps, rays_numsteps_counter = self.rays_sampler.execute(

File "/home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/python/jnerf/models/samplers/density_grid_sampler/ray_sampler.py", line 34, in execute coords_out, rays_index, rays_numsteps,self.ray_numstep_counter = jt.code(

RuntimeError: [f 0702 11:29:38.938348 28 executor.cc:665]

Execute fused operator(1/2) failed.

[Input]: float32[1024,3,], float32[1024,3,1,], uint8[1310720,], float32[150,11,], int32[640000,], float32[150,4,3,], float32[1048576,7,], int32[1024,1,], int32[1024,2,], int32[2,],

[Output]: float32[1048576,7,], int32[1024,1,], int32[1024,2,], int32[2,],

 tools/run_fednerf.py:48 <<module>> 
 tools/run_fednerf.py:41 <main> 
 /home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/FedML/fedml_api/standalone/fedavg/fedavg_api.py:116 <train> 
 /home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/FedML/fedml_api/standalone/fedavg/client.py:28 <train> 
 /home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/FedML/fedml_api/standalone/fedavg/my_model_trainer.py:127 <local_model_render> 
 /home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/FedML/fedml_api/standalone/fedavg/my_model_trainer.py:167 <render_img> 
 /home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/python/jnerf/models/samplers/density_grid_sampler/density_grid_sampler.py:137 <sample> 
 /home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/python/jnerf/models/samplers/density_grid_sampler/ray_sampler.py:34 <execute> 

[Reason]: [f 0702 11:29:38.938163 28 cache_compile.cc:295] Check failed: found Something wrong... Could you please report this issue? Include file pcg32.h not found in [ /home/xiaoxuan/miniconda3/envs/JNeRF/lib/python3.8/site-packages/jittor/src,/home/xiaoxuan/miniconda3/envs/JNeRF/include/python3.8, /home/xiaoxuan/miniconda3/envs/JNeRF/include/python3.8,/home/xiaoxuan/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/include, /home/xiaoxuan/miniconda3/envs/JNeRF/lib/python3.8/site-packages/jittor/extern/cuda/inc,/home/xiaoxuan/.cache/jittor/jt1.3.4/g++9.4.0/py3.8.13/Linux-5.8.0-50x17/IntelRXeonRGolxda/default/cu11.2.152_sm_70, /home/xiaoxuan/miniconda3/envs/JNeRF/lib/python3.8/site-packages/jittor/extern/cuda/inc,] Commands: "/home/xiaoxuan/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/bin/nvcc" "/home/xiaoxuan/.cache/jittor/jt1.3.4/g++9.4.0/py3.8.13/Linux-5.8.0-50x17/IntelRXeonRGolxda/default/cu11.2.152_sm_70/jit/codeIN_SIZE_6in0_dim_2in0_type_float32in1_dim_3in1_type_float32in2_dim_1in2__hash_a7c7342d82088594_op.cc"
-std=c++14 -Xcompiler -fPIC
-Xcompiler -march=native
-Xcompiler -fdiagnostics-color=always
-lstdc++ -ldl -shared
-I"/home/xiaoxuan/miniconda3/envs/JNeRF/lib/python3.8/site-packages/jittor/src" -I/home/xiaoxuan/miniconda3/envs/JNeRF/include/python3.8 -I/home/xiaoxuan/miniconda3/envs/JNeRF/include/python3.8 -DHAS_CUDA -DIS_CUDA -I"/home/xiaoxuan/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/include" -I"/home/xiaoxuan/miniconda3/envs/JNeRF/lib/python3.8/site-packages/jittor/extern/cuda/inc"
-lcudart -L"/home/xiaoxuan/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64" -Xlinker -rpath="/home/xiaoxuan/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64"
-I"/home/xiaoxuan/.cache/jittor/jt1.3.4/g++9.4.0/py3.8.13/Linux-5.8.0-50x17/IntelRXeonRGolxda/default/cu11.2.152_sm_70" -L"/home/xiaoxuan/.cache/jittor/jt1.3.4/g++9.4.0/py3.8.13/Linux-5.8.0-50x17/IntelRXeonRGolxda/default/cu11.2.152_sm_70" -Xlinker -rpath="/home/xiaoxuan/.cache/jittor/jt1.3.4/g++9.4.0/py3.8.13/Linux-5.8.0-50x17/IntelRXeonRGolxda/default/cu11.2.152_sm_70" -L"/home/xiaoxuan/.cache/jittor/jt1.3.4/g++9.4.0/py3.8.13/Linux-5.8.0-50x17/IntelRXeonRGolxda/default" -Xlinker -rpath="/home/xiaoxuan/.cache/jittor/jt1.3.4/g++9.4.0/py3.8.13/Linux-5.8.0-50x17/IntelRXeonRGolxda/default"
-l:"jit_utils_core.cpython-38-x86_64-linux-gnu".so
-l:"jittor_core.cpython-38-x86_64-linux-gnu".so
-x cu --cudart=shared -ccbin="/usr/bin/g++" --use_fast_math -w
-I"/home/xiaoxuan/miniconda3/envs/JNeRF/lib/python3.8/site-packages/jittor/extern/cuda/inc"
-arch=compute_70 -code=sm_70 -o "/home/xiaoxuan/.cache/jittor/jt1.3.4/g++9.4.0/py3.8.13/Linux-5.8.0-50x17/IntelRXeonRGolxda/default/cu11.2.152_sm_70/jit/codeIN_SIZE_6in0_dim_2in0_type_float32in1_dim_3in1_type_float32in2_dim_1in2__hash_a7c7342d82088594_op.so"

Gword commented 2 years ago

Looking at the compile command, the path containing pcg32.h is indeed missing. That compile option is set by coords_out.compile_options = proj_options; you could print coords_out.compile_options right after that line and check whether the pcg32 path is there.
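A minimal sketch of that check, assuming the proj_options import path from the ray_sampler.py pasted later in this thread:

from jnerf.ops.code_ops.global_vars import proj_options

coords_out.compile_options = proj_options
print(coords_out.compile_options)  # expect an entry pointing at the directory that contains pcg32.h / ray_sampler.h
coords_out.sync()                  # force compilation here, so a missing include path fails at this line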

ZhangXiaoXuan2019 commented 2 years ago

Hello! Thanks for the suggestion. The runtime error we reported appears if and only if render_test is called; when the sampler is used during training, there is no runtime error. The failing statement, coords_out, rays_index, rays_numsteps, self.ray_numstep_counter = jt.code(, comes before the coords_out.compile_options = proj_options line, so when the error is raised that assignment has not yet executed and coords_out.compile_options is naturally an empty dict. We also tried executing coords_out.compile_options = proj_options both before and after the failing statement, but the error persists. Do you have any other suggestions? :)

Gword commented 2 years ago

Jittor executes lazily, so jt.code normally does not compile immediately after it is called; perhaps one of your changes caused it to no longer execute lazily. Could you paste the code of ray_sampler.py for me to look at? You could also try adding rays_o.compile_options = proj_options before the coords_out, rays_index, rays_numsteps, self.ray_numstep_counter = jt.code line, so that the compile options are set through an input.
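A minimal sketch of that suggestion, using the variable names from the ray_sampler.py pasted below (only the placement of the assignment matters here):

# Attach the include paths to an input var, so the fused operator sees them
# even if it is compiled before coords_out.compile_options is ever assigned.
rays_o.compile_options = proj_options
coords_out, rays_index, rays_numsteps, self.ray_numstep_counter = jt.code(
    inputs=[rays_o, rays_d, density_grid_bitfield, metadata, imgs_id, xforms],
    outputs=[coords_out, rays_index, rays_numsteps, self.ray_numstep_counter],
    cuda_header=global_headers + self.density_grad_header + '#include "ray_sampler.h"',
    cuda_src=cuda_src)  # cuda_src: the kernel string unchanged from ray_sampler.py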

ZhangXiaoXuan2019 commented 2 years ago

Hello, and thank you for the patient replies. Adding rays_o.compile_options = proj_options before the jt.code line does not solve the problem. The ray_sampler.py file is shown below. In fact, we have not modified ray_sampler.py, or any other file under python/jnerf/, at all; the sampler raises no error when sampling during training, and it fails if and only if it is called from render_test.

import os
import jittor as jt
from jittor import Function, exp, log
import numpy as np
import sys
from jnerf.ops.code_ops.global_vars import global_headers, proj_options
jt.flags.use_cuda = 1

class RaySampler(Function):

def __init__(self, density_grad_header, near_distance, cone_angle_constant, aabb_range=(-1.5, 2.5), n_rays_per_batch=4096, n_rays_step=1024):
    self.density_grad_header = density_grad_header
    self.aabb_range = aabb_range
    self.near_distance = near_distance
    self.n_rays_per_batch = n_rays_per_batch
    self.num_elements = n_rays_per_batch * n_rays_step
    self.cone_angle_constant = cone_angle_constant
    self.path = os.path.join(os.path.dirname(__file__), '..', 'op_include')
    self.ray_numstep_counter = jt.zeros([2], 'int32')

def execute(self, rays_o, rays_d, density_grid_bitfield, metadata, imgs_id, xforms):
    # input
    # rays_o n_rays_per_batch x 3
    # rays_d n_rays_per_batch x 3
    # bitfield 128 x 128 x 128 x 5 / 8
    # return
    # coords_out=[self.num_elements,7]
    # rays index : store rays is used ( not for -1)
    # rays_numsteps [0:step,1:base]
    jt.init.zero_(self.ray_numstep_counter)
    coords_out = jt.empty((self.num_elements, 7), 'float32')
    self.n_rays_per_batch=rays_o.shape[0]
    rays_index = jt.empty((self.n_rays_per_batch, 1), 'int32')
    rays_numsteps = jt.empty((self.n_rays_per_batch, 2), 'int32')
    coords_out, rays_index, rays_numsteps,self.ray_numstep_counter = jt.code(
        inputs=[rays_o, rays_d, density_grid_bitfield, metadata, imgs_id, xforms], outputs=[coords_out,rays_index,rays_numsteps,self.ray_numstep_counter], 
        cuda_header=global_headers+self.density_grad_header+'#include "ray_sampler.h"',  cuda_src=f"""

    @alias(rays_o, in0)
    @alias(rays_d, in1)
    @alias(density_grid_bitfield,in2)
    @alias(metadata,in3)
    @alias(imgs_index,in4)
    @alias(xforms_input,in5)
    @alias(ray_numstep_counter,out3)
    @alias(coords_out,out0)
    @alias(rays_index,out1)
    @alias(rays_numsteps,out2)

    cudaStream_t stream=0;
    cudaMemsetAsync(coords_out_p, 0, coords_out->size);

    const unsigned int num_elements=coords_out_shape0;
    const uint32_t n_rays=rays_o_shape0;
    BoundingBox m_aabb = BoundingBox(Eigen::Vector3f::Constant({self.aabb_range[0]}), Eigen::Vector3f::Constant({self.aabb_range[1]}));
    float near_distance = {self.near_distance};
    float cone_angle_constant={self.cone_angle_constant};  
    linear_kernel(rays_sampler,0,stream,
        n_rays, m_aabb, num_elements,(Vector3f*)rays_o_p,(Vector3f*)rays_d_p, (uint8_t*)density_grid_bitfield_p,cone_angle_constant,(TrainingImageMetadata *)metadata_p,(uint32_t*)imgs_index_p,
        (uint32_t*)ray_numstep_counter_p,((uint32_t*)ray_numstep_counter_p)+1,(uint32_t*)rays_index_p,(uint32_t*)rays_numsteps_p,PitchedPtr<NerfCoordinate>((NerfCoordinate*)coords_out_p, 1, 0, 0),(Eigen::Matrix<float, 3, 4>*) xforms_input_p,near_distance,rng);   

    rng.advance();

""")

    coords_out.compile_options = proj_options
    # print(coords_out.compile_options)
    coords_out.sync()
    coords_out = coords_out.detach()
    rays_index = rays_index.detach()
    rays_numsteps = rays_numsteps.detach()
    self.ray_numstep_counter = self.ray_numstep_counter.detach()
    samples=self.ray_numstep_counter[1].item()
    coords_out=coords_out[:samples]
    return coords_out, rays_index, rays_numsteps, self.ray_numstep_counter

def grad(self, grad_x):
    ##should not reach here
    assert(grad_x == None)
    assert(False)
    return None

In addition, is there some way to trace what caused the jt.code statement to stop executing lazily, as you mentioned? If so, we would like to learn about it, because we will be making further changes on top of the JNeRF implementation, and it is not realistic to come to the JNeRF team for advice every time a problem appears.
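A rough sketch of one way to localize where a fused op is actually compiled, assuming the installed Jittor version exposes jt.flags.lazy_execution (this flag name is an assumption and should be verified against jt.flags):

import jittor as jt

# Assumed flag: if available, turning off lazy execution makes every op compile
# and run immediately, so a compile failure is reported at the Python line that
# created the op rather than at a later sync point.
jt.flags.lazy_execution = 0

# With or without the flag, syncing right after jt.code pins the failure to this call:
rays_o.compile_options = proj_options   # set the include paths through an input
# ... the jt.code(...) call from ray_sampler.py ...
coords_out.sync()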