Closed njzjz closed 3 years ago
How large is the PBC system?
There’s a collapsed data point in the input coords which contains a very large number:
s1.data["coords"][0][274] array([1.88329067e+01, 1.10405912e+17, 1.80823460e+01])
v2.0 uses a while loop to normalize the coord, subtracting one box-scale per iteration, so with a number this large it gets stuck (on both CPU and GPU). (v1.x subtracted one box-scale only once, which is not the expected behavior but let the run pass.)
A simple solution is to replace the while loop:
while(ri[dd] >= 1.) ri[dd] -= 1.; while(ri[dd] < 0.) ri[dd] += 1.;
to:
ri[dd]=ri[dd]-(long long int)ri[dd]; if (ri[dd] < 0.) ri[dd] += 1.;
Alternatively, we could add an assert that exits when a very large number is encountered.
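For illustration, the assert alternative could look roughly like the sketch below. The helper names `is_reasonable_fractional` and `wrap_checked` and the threshold `max_abs` are hypothetical and not taken from the DeePMD-kit sources; this only shows the idea of rejecting an out-of-range box-scaled coordinate before entering the loop.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from DeePMD-kit): true if a box-scaled coordinate
// is finite and small enough for the while loop to finish quickly.
// The default threshold is an arbitrary illustrative choice.
inline bool is_reasonable_fractional(double r, double max_abs = 1.0e3) {
  return std::isfinite(r) && std::fabs(r) <= max_abs;
}

// Same wrapping loop as in the thread, guarded by an assert.
// Note: assert is compiled out when NDEBUG is defined, so a release build
// would need an explicit error path instead.
inline double wrap_checked(double r) {
  assert(is_reasonable_fractional(r) && "box-scaled coordinate out of range");
  while (r >= 1.) r -= 1.;
  while (r < 0.) r += 1.;
  return r;
}
```

With this guard, a value like 1.10405912e+17 fails the check immediately instead of spinning in the loop.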
I just realized that I missed a dot when I copied the numbers! (So 1.10405912e+17 should be 1.10405912e+01.)
My input was incorrect, but for the current program, why not use something like
ri[dd] = fmod(ri[dd], 1.);
if (ri[dd] < 0.) ri[dd] += 1.;
See https://www.cplusplus.com/reference/cmath/fmod/ https://developer.download.nvidia.com/cg/fmod.html
We used to assume that coords in box-scale would be very small (around 1-2 or less), so the while-loop style should have been enough. But this issue reminds us to use something like the fmod you suggested, in case of unexpected numbers.
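As a rough sketch of the fmod idea discussed above (the function name `wrap_fmod` is hypothetical, and this is not the code that was merged into DeePMD-kit), the wrapping can be written so that its cost does not depend on the magnitude of the input:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: wrap a box-scaled coordinate into [0, 1) in constant time.
inline double wrap_fmod(double r) {
  // fmod of an infinite value is NaN, so require a finite input.
  assert(std::isfinite(r));
  r = std::fmod(r, 1.);    // result lies in (-1, 1) and keeps the sign of r
  if (r < 0.) r += 1.;     // shift negative remainders into (0, 1]
  // For a tiny negative r, r + 1. rounds to exactly 1.; fold that back to 0.
  if (r >= 1.) r = 0.;
  return r;
}
```

The extra `r >= 1.` check is an addition beyond the two lines proposed in the thread: without it, an input such as -1e-20 would come back as exactly 1., which is outside the intended half-open interval.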
Summary
Using PBC to predict my system is very slow in both v2.0.0.b0 and v2.0.0.b1, for both Python and C++. Non-PBC works well. Models converted from v1.3.3 and models frozen in v2.0.0.b1 show the same behavior.
Deepmd-kit version, installation way, input file, running commands, error log, etc.
v2.0.0.b1 conda GPU cuda10.1
Steps to Reproduce
dpdata is https://github.com/deepmodeling/dpdata/pull/162
Model input is the same as in #658.
Further Information, Files, and Links