Closed denghuilu closed 3 years ago
This bug can still be reproduced with the C++ interface on the devel branch. Inspecting the inner structure of the graph shows that many layers require float64 input/output, even though DP is compiled in float precision. I'm not sure whether this is the cause of the bug or an intentional design choice. I hope this information helps.
The graph was trained from the example case examples/water/se_e2_a without any modification. DP was compiled without the HIGH_PREC flag. The frozen graph file and a script that prints the inner operators are attached.
Not really.
The FLOAT_PREC compile flag only controls the floating-point precision used in the interfaces of deepmd-kit.
To set the precision inside the model, use the "precision" option in the descriptors and fitting nets.
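To make the distinction concrete, here is a minimal sketch of a build-time precision switch. It is not taken from the deepmd-kit source: the alias InterfaceReal and the helper to_model_precision are hypothetical names, and the macro is assumed to behave like the HIGH_PREC flag mentioned earlier in this thread. The point is only that the macro fixes the type used by the interface code, while a model that stores float64 tensors still needs its inputs converted.

```cpp
#include <vector>

// Hypothetical interface-side alias, selected at build time.
// HIGH_PREC is assumed to act like the flag named in this thread.
#ifdef HIGH_PREC
using InterfaceReal = double;
#else
using InterfaceReal = float;
#endif

// Hypothetical helper: convert interface-precision values to float64,
// for a model whose layers were saved with float64 inputs.
std::vector<double> to_model_precision(const std::vector<InterfaceReal>& in) {
  // The range constructor converts each element to double.
  return std::vector<double>(in.begin(), in.end());
}
```

Calling to_model_precision on a buffer of interface values returns a float64 copy of the same length, regardless of how the interface was compiled.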
@amcadmus The reason for this error is that the _prepare_coord_nlist_gpu
function in $deepmd_source_dir/source/op/prod_env_mat_multi_device.cc
has a bug in its support for single precision.
Here is how I located it, using print statements around the call:
std::cout << "I'm in prod_env_mat_a 5!" << std::endl;
// prepare coord and nlist
_prepare_coord_nlist_gpu<FPTYPE>(
context, &tensor_list[0], &coord, coord_cpy, &type, type_cpy, idx_mapping,
gpu_inlist, ilist, numneigh, firstneigh, jlist, nbor_list_dev,
frame_nall, mem_cpy, mem_nnei, max_nbor_size,
box, mesh_tensor.flat<int>().data(), mesh_tensor_size, nloc, nei_mode, rcut_r, max_cpy_trial, max_nnei_trial);
std::cout << "I'm in prod_env_mat_a 6!" << std::endl;
The result output is:
2021-05-11 12:11:57.966802: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 1700000000 Hz
I'm in prod_env_mat_a 1!
I'm in prod_env_mat_a 2!
I'm in prod_env_mat_a 3!
I'm in prod_env_mat_a 4!
I'm in prod_env_mat_a 5!
2021-05-11 12:11:58.288648: F tensorflow/core/framework/tensor.cc:665] Check failed: dtype() == expected_dtype (2 vs. 1) float expected, got double
Aborted (core dumped)
So the program fails inside the function _prepare_coord_nlist_gpu.
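For readers unfamiliar with this kind of failure, the sketch below mimics the check that produced the "float expected, got double" message. It does not use TensorFlow: MockTensor, DType, dtype_of and check_dtype are hypothetical stand-ins, and the enum values 1 and 2 are chosen only to mirror the "(2 vs. 1)" in the log above. It shows the general pattern: a tensor carries a runtime dtype tag, and a typed accessor rejects a request whose template type does not match that tag.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Illustrative dtype tags; the numeric values mirror "(2 vs. 1)" in the log.
enum class DType { Float = 1, Double = 2 };

// Map a C++ type to its tag (only float and double are handled here).
template <typename T> DType dtype_of();
template <> inline DType dtype_of<float>() { return DType::Float; }
template <> inline DType dtype_of<double>() { return DType::Double; }

inline std::string name_of(DType d) {
  return d == DType::Float ? "float" : "double";
}

// Hypothetical stand-in for a framework tensor: only the tag and size matter.
struct MockTensor {
  DType dtype;
  std::size_t size;

  // Throws if the requested element type differs from the stored dtype.
  template <typename T>
  void check_dtype() const {
    if (dtype_of<T>() != dtype) {
      throw std::runtime_error(name_of(dtype_of<T>()) + " expected, got " +
                               name_of(dtype));
    }
  }
};
```

With a tensor tagged DType::Double, check_dtype<float>() throws while check_dtype<double>() passes. This matches the situation described in this thread: code instantiated for float touching a tensor that the graph holds as double.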
Summary
Deepmd-kit version, installation way, input file, running commands, error log, etc.
Version: latest version of the devel branch.
Installation way: Python interface with single precision, set cmake_args:
Installation works fine.
Steps to Reproduce
An error occurs:
Further Information, Files, and Links