Open deehrlic opened 1 year ago
Needs re-testing. Taichi has since made significant progress and, depending on the availability of GPU memory, it should now be able to run much larger resolutions for trace & deposit fields.
From my testing the new cap is closer to (800, 800, 800), but this may vary from GPU to GPU. Taichi seems to struggle to allocate large amounts of GPU memory, though I'm not sure whether that's a Taichi limitation or a GPU one.
Gotcha. It's likely a Taichi thing: using a lower-level API (e.g. Compute Shaders in DirectX on Windows) I'm able to fully utilize my GPU's memory pool, but not with Taichi.
Let's keep the issue open to keep track of this. Maybe other folks can contribute their experience so we get a more holistic view of the issue.
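To put the reported caps in perspective, here is a back-of-the-envelope memory calculation (assuming scalar float32 trace/deposit fields; the actual fields may store vectors, which multiplies the footprint by the channel count):

```python
import numpy as np

def field_bytes(shape, dtype=np.float32, channels=1):
    """Bytes needed for one dense field of the given shape."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize * channels

GIB = 1024 ** 3

# Resolutions mentioned in this thread
print(field_bytes((512, 512, 512)) / GIB)  # 0.5 GiB per scalar field
print(field_bytes((800, 800, 800)) / GIB)  # ~1.91 GiB per scalar field

# Note: to_numpy() needs the field resident on the GPU *and* a staging
# numpy array on the host, so peak host memory during export is at least
# the field size again, on top of whatever the GPU already holds.
```

With both a deposit and a trace field at (800, 800, 800), the GPU-side footprint alone approaches 4 GiB, which is consistent with allocation pressure varying from GPU to GPU.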
When running `experiments/PolyPhy_3D_discrete_mask.ipynb`, if you load a mesh that results in trace dimensions of (512, 512, 512) and then press Export Fit in the UI, Taichi's `to_numpy()` function raises the error below:
```
RuntimeError                              Traceback (most recent call last)
Cell In[17], line 204
    202 window.show()
    203 if do_export:
--> 204     current_stamp, deposit, trace = store_fit()
    205 if do_quit:
    206     break

Cell In[17], line 38, in store_fit()
     36 current_stamp = stamp()
     37 print(deposit_field.shape)
---> 38 deposit = deposit_field.to_numpy(dtype = np.float32)
     39 np.save(ROOT + 'data/fits/deposit' + current_stamp + '.npy', deposit)
     40 trace = trace_field.to_numpy()

File ~\Anaconda3\lib\site-packages\taichi\lang\util.py:311, in python_scope.<locals>.wrapped(*args, **kwargs)
    307 @functools.wraps(func)
    308 def wrapped(*args, **kwargs):
    309     assert in_python_scope(), \
    310         f'{func.__name__} cannot be called in Taichi-scope'
--> 311     return func(*args, **kwargs)

File ~\Anaconda3\lib\site-packages\taichi\lang\matrix.py:1321, in MatrixField.to_numpy(self, keep_dims, dtype)
   1319 arr = np.zeros(self.shape + shape_ext, dtype=dtype)
   1320 from taichi._kernels import matrix_to_ext_arr  # pylint: disable=C0415
-> 1321 matrix_to_ext_arr(self, arr, as_vector)
   1322 runtime_ops.sync()
   1323 return arr

File ~\Anaconda3\lib\site-packages\taichi\lang\kernel_impl.py:1023, in _kernel_impl.<locals>.wrapped(*args, **kwargs)
   1020 @functools.wraps(_func)
   1021 def wrapped(*args, **kwargs):
   1022     try:
-> 1023         return primal(*args, **kwargs)
   1024     except (TaichiCompilationError, TaichiRuntimeError) as e:
   1025         raise type(e)('\n' + str(e)) from None

File ~\Anaconda3\lib\site-packages\taichi\lang\shell.py:27, in _shell_pop_print.<locals>.new_call(*args, **kwargs)
     25 @functools.wraps(old_call)
     26 def new_call(*args, **kwargs):
---> 27     ret = old_call(*args, **kwargs)
     28     # print's in kernel won't take effect until ti.sync(), discussion:
     29     # https://github.com/taichi-dev/taichi/pull/1303#discussion_r444897102
     30     print(_ti_core.pop_python_print_buffer(), end='')

File ~\Anaconda3\lib\site-packages\taichi\lang\kernel_impl.py:950, in Kernel.__call__(self, *args, **kwargs)
    948 impl.current_cfg().opt_level = 1
    949 key = self.ensure_compiled(*args)
--> 950 return self.runtime.compiled_functions[key](*args)

File ~\Anaconda3\lib\site-packages\taichi\lang\kernel_impl.py:853, in Kernel.get_function_body.<locals>.func__(*args)
    851 except Exception as e:
    852     e = handle_exception_from_cpp(e)
--> 853     raise e from None
    855 ret = None
    856 ret_dt = self.return_type

File ~\Anaconda3\lib\site-packages\taichi\lang\kernel_impl.py:850, in Kernel.get_function_body.<locals>.func__(*args)
    845     raise TaichiRuntimeError(
    846         f"The number of elements in kernel arguments is too big! Do not exceed 64 on {_ti_core.arch_name(impl.current_cfg().arch)} backend."
    847     )
    849 try:
--> 850     t_kernel(launch_ctx)
    851 except Exception as e:
    852     e = handle_exception_from_cpp(e)

RuntimeError: [taichi/rhi/cuda/cuda_driver.h:taichi::lang::CUDADriverFunction<void *>::operator ()@90] CUDA Error CUDA_ERROR_ASSERT: device-side assert triggered while calling stream_synchronize (cuStreamSynchronize)
```
To recreate:
- Load `experiments/PolyPhy_3D_discrete_mask.ipynb` and use the mesh linked here: https://www.thingiverse.com/thing:115644/files
- Press Export Fit in the simulation UI
Tested on: NVIDIA RTX 2080 Ti, CUDA backend
The Taichi kernel appears to be given more data than it can handle in `to_numpy()`; using a smaller maximum trace resolution, such as 256 instead of 512, works fine.
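Since the failing resolution varies by GPU, one pragmatic workaround until the root cause is fixed is to probe for the largest cubic trace resolution the current setup can export. A minimal sketch: the `can_export` callback is a hypothetical stand-in for rebuilding the fields at that resolution and attempting `to_numpy()` inside a try/except.

```python
def max_working_resolution(can_export, lo=64, hi=1024):
    """Binary-search the largest cubic resolution in [lo, hi] for which
    can_export(res) returns True; returns None if none succeed.

    can_export is assumed to be monotonic: once a resolution fails,
    all larger ones fail too (consistent with running out of memory).
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if can_export(mid):
            best = mid
            lo = mid + 1   # succeeded, try larger
        else:
            hi = mid - 1   # failed, back off
    return best

# Example with a fake predicate mimicking the observed behaviour
# (exports succeed up to some device-dependent cutoff):
print(max_working_resolution(lambda res: res <= 300))  # → 300
```

In practice `can_export` would allocate the trace/deposit fields at the trial resolution and catch the `RuntimeError` from `to_numpy()`, so each probe is a full (and slow) round-trip; restricting the search to power-of-two or multiple-of-64 resolutions would cut the number of probes.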