Closed: antmatyjajo closed this issue 5 years ago
Thanks a lot, well done, makes me happy :-)
Might avoiding the default template parameter by having two templates in links.cuh
do the trick for us? Something like:
```cuda
template<typename Pt = float3>
void link_forces(Links& links, const Pt* __restrict__ d_X, Pt* d_dX)
{
    link<Pt, linear_force<Pt>><<<(links.get_d_n() + 32 - 1) / 32, 32>>>(
        d_X, d_dX, links.d_link, links.get_d_n(), links.strength);
}

template<typename Pt = float3, Link_force<Pt> force>
void link_forces(Links& links, const Pt* __restrict__ d_X, Pt* d_dX)
{
    link<Pt, force><<<(links.get_d_n() + 32 - 1) / 32, 32>>>(
        d_X, d_dX, links.d_link, links.get_d_n(), links.strength);
}
```
The default float3 also really irritates me, can we do without? Could you have a look, @antmatyjajo, please?
Also, we should add some tests making sure those things work with a Pt that is not float3 and with custom forces. Maybe something like (sorry about my punch-card-style coding ;-)):
```cuda
template<typename Pt>
__device__ void custom_force(const Pt* __restrict__ d_X, const int a,
    const int b, const float strength, Pt* d_dX)
{
    atomicAdd(&d_dX[a].w, -1);
    atomicAdd(&d_dX[b].w, 1);
}

const char* test_custom_force()
{
    Solution<float4, Tile_solver> points{2};
    Links links{4};
    auto forces = [&links](const float4* __restrict__ d_X, float4* d_dX) {
        return link_forces(links, d_X, d_dX);
    };
    auto custom_forces = [&links](const float4* __restrict__ d_X, float4* d_dX) {
        return link_forces<custom_force>(links, d_X, d_dX);
    };
    // clang-format off
    points.h_X[0].x = 1;  points.h_X[0].y = 1;   points.h_X[0].z = 0;  points.h_X[0].w = 1;
    points.h_X[1].x = 1;  points.h_X[1].y = -1;  points.h_X[1].z = 0;  points.h_X[1].w = -1;
    links.h_link[0].a = 0;  links.h_link[0].b = 1;
    // clang-format on
    points.copy_to_device();
    links.copy_to_device();
    auto dt = 0.1;
    points.take_step<no_pw_int>(dt, forces);
    points.take_step<no_pw_int>(dt, custom_forces);
    points.copy_to_host();
    MU_ASSERT("Not close in x", isclose(points.h_X[0].x - points.h_X[1].x, 0));
    MU_ASSERT("Not close in y", isclose(points.h_X[1].y - points.h_X[2].y, 2 - 2 * dt * links.strength));
    MU_ASSERT("Not close in z", isclose(points.h_X[2].z - points.h_X[3].z, 0));
    MU_ASSERT("Not close in w", isclose(points.h_X[2].w - points.h_X[3].w, 2 - 2 * dt));
    return NULL;
}
```
Could you try to get that to work, @mmarinriera, please?
OK, thanks for your suggestions, all working now :D
> Might avoiding the default template parameter by having two templates in links.cuh do the trick for us? Something like: [...] The default float3 also really irritates me, can we do without?
Implemented these in a6d2a8c45ee6d7f568c7fbf94a57f69830153342. The default float3 has been removed; type inference seems to propagate correctly.
> Also, we should add some tests making sure those things work with a Pt that is not float3 and with custom forces.
Implemented your code with some minor changes (indexing in the assertions to account for only n=2 points, and changing the no_pw_int function to use float4) in fca6bd7adf8dc7375fc8fd189fac5e5a9ce19576.
Everything compiles and runs on Ubuntu 18.04 with CUDA 9.2.148 and gcc 7.3.0.
It also compiled fine with CUDA 8 through 10 on the cluster, though unfortunately I haven't had time to run everything in that environment.
When trying to build something that uses links.cuh (examples: intercalation, intercalation_w_gradient, sorting_prot; tests: test_links), nvcc fails at the first ptxas step with the following error:
The error always appears this way, and only happens for the linear_force function.
It seems this is related to the definition of the link_forces template, specifically the default template parameter Link_force<Pt> force = linear_force<Pt> (see here). Specifying the template parameters manually when binding link_forces (rather than relying on automatic instantiation through the default template parameter) works fine, e.g., in the case of test_links.cu (see here).
I think this is a problem with templates and nvcc, and not with Yalla. Here is a minimal-ish example showing the problem independent of Yalla.
For reference, I tried different CUDA versions and the code builds fine on CUDA 8.0.44, 9.0.176 and 9.1.85. It only fails for CUDA 9.2.88 and greater.