Open deehrlic opened 1 year ago
Needs re-testing. Taichi has since made significant progress and, depending on the availability of GPU memory, it should now be able to run much larger resolutions for trace & deposit fields.
From my testing the new cap is closer to (800, 800, 800), but this may vary from GPU to GPU. Taichi seems to struggle to allocate large amounts of GPU memory, though I'm not sure whether that's a Taichi limitation or a GPU one.
Gotcha. It's likely a Taichi thing: using a lower-level API (e.g. Compute Shaders in DirectX on Windows) I'm able to fully utilize my GPU's memory pool, but not with Taichi.
Let's keep the issue open to keep track of this. Maybe other folks can contribute their experience so we get a more holistic view of the issue.
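To put the reported caps in perspective, here is a back-of-the-envelope memory calculation (assuming scalar float32 trace/deposit fields; the actual fields may store vectors, which multiplies the footprint by the channel count):

```python
import numpy as np

def field_bytes(shape, dtype=np.float32, channels=1):
    """Bytes needed for one dense field of the given shape."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize * channels

GIB = 1024 ** 3

# Resolutions mentioned in this thread
print(field_bytes((512, 512, 512)) / GIB)  # 0.5 GiB per scalar field
print(field_bytes((800, 800, 800)) / GIB)  # ~1.91 GiB per scalar field

# Note: to_numpy() needs the field resident on the GPU *and* a staging
# numpy array on the host, so peak host memory during export is at least
# the field size again, on top of whatever the GPU already holds.
```

With both a deposit and a trace field at (800, 800, 800), the GPU-side footprint alone approaches 4 GiB, which is consistent with allocation pressure varying from GPU to GPU.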
When running `experiments/PolyPhy_3D_discrete_mask.ipynb`, if you load a mesh that results in trace dimensions of (512, 512, 512) and then press Export Fit in the UI, Taichi's `to_numpy()` function raises the error below:
```
RuntimeError                              Traceback (most recent call last)
Cell In[17], line 204
    202 window.show()
    203 if do_export:
--> 204     current_stamp, deposit, trace = store_fit()
    205 if do_quit:
    206     break

Cell In[17], line 38, in store_fit()
     36 current_stamp = stamp()
     37 print(deposit_field.shape)
---> 38 deposit = deposit_field.to_numpy(dtype = np.float32)
     39 np.save(ROOT + 'data/fits/deposit' + current_stamp + '.npy', deposit)
     40 trace = trace_field.to_numpy()

File ~\Anaconda3\lib\site-packages\taichi\lang\util.py:311, in python_scope.<locals>.wrapped(*args, **kwargs)
    307 @functools.wraps(func)
    308 def wrapped(*args, **kwargs):
    309     assert in_python_scope(), \
    310         f'{func.__name__} cannot be called in Taichi-scope'
--> 311     return func(*args, **kwargs)

File ~\Anaconda3\lib\site-packages\taichi\lang\matrix.py:1321, in MatrixField.to_numpy(self, keep_dims, dtype)
   1319 arr = np.zeros(self.shape + shape_ext, dtype=dtype)
   1320 from taichi._kernels import matrix_to_ext_arr  # pylint: disable=C0415
-> 1321 matrix_to_ext_arr(self, arr, as_vector)
   1322 runtime_ops.sync()
   1323 return arr

File ~\Anaconda3\lib\site-packages\taichi\lang\kernel_impl.py:1023, in _kernel_impl.<locals>.wrapped(*args, **kwargs)
   1020 @functools.wraps(_func)
   1021 def wrapped(*args, **kwargs):
   1022     try:
-> 1023         return primal(*args, **kwargs)
   1024     except (TaichiCompilationError, TaichiRuntimeError) as e:
   1025         raise type(e)('\n' + str(e)) from None

File ~\Anaconda3\lib\site-packages\taichi\lang\shell.py:27, in _shell_pop_print.<locals>.new_call(*args, **kwargs)
     25 @functools.wraps(old_call)
     26 def new_call(*args, **kwargs):
---> 27     ret = old_call(*args, **kwargs)
     28     # print's in kernel won't take effect until ti.sync(), discussion:
     29     # https://github.com/taichi-dev/taichi/pull/1303#discussion_r444897102
     30     print(_ti_core.pop_python_print_buffer(), end='')

File ~\Anaconda3\lib\site-packages\taichi\lang\kernel_impl.py:950, in Kernel.__call__(self, *args, **kwargs)
    948 impl.current_cfg().opt_level = 1
    949 key = self.ensure_compiled(*args)
--> 950 return self.runtime.compiled_functions[key](*args)

File ~\Anaconda3\lib\site-packages\taichi\lang\kernel_impl.py:853, in Kernel.get_function_body.<locals>.func__(*args)
    851 except Exception as e:
    852     e = handle_exception_from_cpp(e)
--> 853     raise e from None
    855 ret = None
    856 ret_dt = self.return_type

File ~\Anaconda3\lib\site-packages\taichi\lang\kernel_impl.py:850, in Kernel.get_function_body.<locals>.func__(*args)
    845     raise TaichiRuntimeError(
    846         f"The number of elements in kernel arguments is too big! Do not exceed 64 on {_ti_core.arch_name(impl.current_cfg().arch)} backend."
    847     )
    849 try:
--> 850     t_kernel(launch_ctx)
    851 except Exception as e:
    852     e = handle_exception_from_cpp(e)

RuntimeError: [taichi/rhi/cuda/cuda_driver.h:taichi::lang::CUDADriverFunction<void *>::operator ()@90] CUDA Error CUDA_ERROR_ASSERT: device-side assert triggered while calling stream_synchronize (cuStreamSynchronize)
```
To recreate:
- Load `experiments/PolyPhy_3D_discrete_mask.ipynb` and use the mesh linked here: https://www.thingiverse.com/thing:115644/files
- Press Export Fit in the simulation UI
Tested on: NVIDIA RTX 2080 Ti, CUDA backend
The Taichi kernel appears to be given more data than it can handle in `to_numpy()`; using a smaller maximum trace resolution, such as 256 instead of 512, works fine.
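Since the failing resolution varies by GPU, one pragmatic workaround until the root cause is fixed is to probe for the largest cubic trace resolution the current setup can export. A minimal sketch: the `can_export` callback is a hypothetical stand-in for rebuilding the fields at that resolution and attempting `to_numpy()` inside a try/except.

```python
def max_working_resolution(can_export, lo=64, hi=1024):
    """Binary-search the largest cubic resolution in [lo, hi] for which
    can_export(res) returns True; returns None if none succeed.

    can_export is assumed to be monotonic: once a resolution fails,
    all larger ones fail too (consistent with running out of memory).
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if can_export(mid):
            best = mid
            lo = mid + 1   # succeeded, try larger
        else:
            hi = mid - 1   # failed, back off
    return best

# Example with a fake predicate mimicking the observed behaviour
# (exports succeed up to some device-dependent cutoff):
print(max_working_resolution(lambda res: res <= 300))  # → 300
```

In practice `can_export` would allocate the trace/deposit fields at the trial resolution and catch the `RuntimeError` from `to_numpy()`, so each probe is a full (and slow) round-trip; restricting the search to power-of-two or multiple-of-64 resolutions would cut the number of probes.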