Closed dbaumgaertner closed 5 years ago
100th issue :tada:
I did some tests where I simply use the same operations we do in the gradient calculation, without computing a real stiffness matrix. Windows is indeed a little slower.
#############################################
#############################################
#############################################
# UBUNTU
#############################################
#############################################
#############################################
# test_RHS_operations | n 1000000 | size 20
# 1.37613e-263
# Time 0.7232062816619873
# test_RHS_operations_expensive | n 1000000 | size 20
# 2.57576e+07
# Time 0.36026930809020996
# test_RHS_operations | n 1000000 | size 20
# 2.57576e+07
# Time 0.08381533622741699
# test_RHS_operations_expensive | n 1000000 | size 20
# 2.57576e+07
# Time 0.37224411964416504
# test_RHS_inside_loop | n 1000000 | size 20
# 1e+06
# Time 0.015364885330200195
# test_RHS_outside_loop | n 1000000 | size 20
# 1e+06
# Time 0.005693912506103516
#############################################
#############################################
#############################################
# test_RHS_operations | n 1000000 | size 100
# -nan
# Time 0.4674644470214844
# test_RHS_operations_expensive | n 1000000 | size 100
# -nan
# Time 8.594015836715698
# test_RHS_operations | n 1000000 | size 100
# -nan
# Time 0.39067959785461426
# test_RHS_operations_expensive | n 1000000 | size 100
# -nan
# Time 8.604081869125366
# test_RHS_inside_loop | n 1000000 | size 100
# 1e+06
# Time 0.016034603118896484
# test_RHS_outside_loop | n 1000000 | size 100
# 1e+06
# Time 0.01217198371887207
#############################################
#############################################
#############################################
# WINDOWS
#############################################
#############################################
#############################################
# test_RHS_operations | n 1000000 | size 20
# -2.31527e+263
# Time 0.19014501571655273
# test_RHS_operations_expensive | n 1000000 | size 20
# -2.37987e+263
# Time 0.6268339157104492
# test_RHS_operations | n 1000000 | size 20
# 5.5777e+261
# Time 0.1262662410736084
# test_RHS_operations_expensive | n 1000000 | size 20
# 0
# Time 0.5714337825775146
# test_RHS_inside_loop | n 1000000 | size 20
# 1e+06
# Time 0.0663764476776123
# test_RHS_outside_loop | n 1000000 | size 20
# 1e+06
# Time 0.012975215911865234
#############################################
#############################################
#############################################
# test_RHS_operations | n 1000000 | size 100
# -nan
# Time 0.6218540668487549
# test_RHS_operations_expensive | n 1000000 | size 100
# 9.79798e+07
# Time 11.166205167770386
# test_RHS_operations | n 1000000 | size 100
# 4.89899e+07
# Time 0.451171875
# test_RHS_operations_expensive | n 1000000 | size 100
# 9.79798e+07
# Time 11.172195196151733
# test_RHS_inside_loop | n 1000000 | size 100
# 1e+06
# Time 0.06638073921203613
# test_RHS_outside_loop | n 1000000 | size 100
# 1e+06
# Time 0.02345442771911621
Thanks for that detailed study! I figured out that the recent performance drop has nothing to do with Windows or Linux. It comes from the new implementation of the gradient calculation using ElementFiniteDifferenceUtility::CalculateRightHandSideDerivative.
Adjusting StrainEnergyResponseFunctionUtility::CalculateStateDerivativePartByFiniteDifferencing back to the original implementation we had leads to a drastic increase in performance. See the following code and note the commented-out / exchanged lines in the first for loop over the nodes.
void CalculateStateDerivativePartByFiniteDifferencing()
{
    KRATOS_TRY;

    // Working variables
    ProcessInfo &CurrentProcessInfo = mrModelPart.GetProcessInfo();

    // Computation of: \frac{1}{2} u^T \cdot ( - \frac{\partial K}{\partial x} )
    for (auto& elem_i : mrModelPart.Elements())
    {
        Vector u;
        Vector lambda;
        Vector RHS;

        // Get state solution
        elem_i.GetValuesVector(u,0);

        // Get adjoint variables (corresponds to 1/2*u)
        lambda = 0.5*u;

        // Semi-analytic computation of partial derivative of state equation w.r.t. node coordinates
        elem_i.CalculateRightHandSide(RHS, CurrentProcessInfo);

        for (auto& node_i : elem_i.GetGeometry())
        {
            // array_3d gradient_contribution(3, 0.0);
            // Vector derived_RHS = Vector(0);

            // // x-direction
            // ElementFiniteDifferenceUtility::CalculateRightHandSideDerivative(elem_i, SHAPE_X, node_i, mDelta, derived_RHS, CurrentProcessInfo);
            // gradient_contribution[0] = inner_prod(lambda, derived_RHS);

            // // y-direction
            // ElementFiniteDifferenceUtility::CalculateRightHandSideDerivative(elem_i, SHAPE_Y, node_i, mDelta, derived_RHS, CurrentProcessInfo);
            // gradient_contribution[1] = inner_prod(lambda, derived_RHS);

            // // z-direction
            // ElementFiniteDifferenceUtility::CalculateRightHandSideDerivative(elem_i, SHAPE_Z, node_i, mDelta, derived_RHS, CurrentProcessInfo);
            // gradient_contribution[2] = inner_prod(lambda, derived_RHS);

            // // Assemble sensitivity to node
            // noalias(node_i.FastGetSolutionStepValue(SHAPE_SENSITIVITY)) += gradient_contribution;

            array_3d gradient_contribution(3, 0.0);
            Vector perturbed_RHS = Vector(0);

            // Perturbation, gradient analysis and recovery of x
            node_i.X0() += mDelta;
            elem_i.CalculateRightHandSide(perturbed_RHS, CurrentProcessInfo);
            gradient_contribution[0] = inner_prod(lambda, (perturbed_RHS - RHS) / mDelta);
            node_i.X0() -= mDelta;

            // Reset perturbed vector
            perturbed_RHS = Vector(0);

            // Perturbation, gradient analysis and recovery of y
            node_i.Y0() += mDelta;
            elem_i.CalculateRightHandSide(perturbed_RHS, CurrentProcessInfo);
            gradient_contribution[1] = inner_prod(lambda, (perturbed_RHS - RHS) / mDelta);
            node_i.Y0() -= mDelta;

            // Reset perturbed vector
            perturbed_RHS = Vector(0);

            // Perturbation, gradient analysis and recovery of z
            node_i.Z0() += mDelta;
            elem_i.CalculateRightHandSide(perturbed_RHS, CurrentProcessInfo);
            gradient_contribution[2] = inner_prod(lambda, (perturbed_RHS - RHS) / mDelta);
            node_i.Z0() -= mDelta;

            // Assemble sensitivity to node
            noalias(node_i.FastGetSolutionStepValue(SHAPE_SENSITIVITY)) += gradient_contribution;
        }
    }

    // Computation of \frac{1}{2} u^T \cdot ( \frac{\partial f_{ext}}{\partial x} )
    for (auto& cond_i : mrModelPart.Conditions())
    {
        ....
    }

    KRATOS_CATCH("");
}
@armingeiser As you implemented this new approach, can you please have a look into this? For sure we cannot sacrifice performance here for a better code structure. Nevertheless, I hope there is a way to fix this while maintaining your new structuring.
Another thing I noticed: whenever calling CalculateRHS or functions like it, the base_shell_element class internally computes both the stiffness matrix and the RHS. The corresponding flags are internally set to true, regardless of which specific function is called.
Changing that as follows leads to another 25% performance gain in the gradient calculation:
void BaseShellElement::CalculateLeftHandSide(MatrixType& rLeftHandSideMatrix,
                                             ProcessInfo& rCurrentProcessInfo)
{
    // Calculation flags
    const bool calculate_stiffness_matrix_flag = true;
    const bool calculate_residual_vector_flag = false; // TODO check if this can be false => see solids

    Vector dummy;
    CalculateAll(rLeftHandSideMatrix, dummy, rCurrentProcessInfo,
                 calculate_stiffness_matrix_flag, calculate_residual_vector_flag);
}

void BaseShellElement::CalculateRightHandSide(VectorType& rRightHandSideVector,
                                              ProcessInfo& rCurrentProcessInfo)
{
    // Calculation flags
    const bool calculate_stiffness_matrix_flag = false; // TODO check if this can be false => see solids
    const bool calculate_residual_vector_flag = true;

    Matrix dummy;
    CalculateAll(dummy, rRightHandSideVector, rCurrentProcessInfo,
                 calculate_stiffness_matrix_flag, calculate_residual_vector_flag);
}
I don't know if this has other implications, but I guess this inconsistent behavior should really be changed, because in the current master both functions above do the same as "CalculateLocalSystem".
@dbaumgaertner I see the problem with the shells, there is even a TODO in master :) I just haven't had the time yet to properly test it.
Could you try for your case if the results are different or if there are any other differences?
@dbaumgaertner wow, that's interesting, thanks for figuring it out.
Do you have numbers for the first issue with the ElementFiniteDifferenceUtility, meaning how much slower it is? I will have a closer look soon.
@philbucher Well, all I can say is that all optimization tests are passing, so the gradients I compute are the same. But I can't say if there is a difference elsewhere. I guess it makes sense for whoever programmed the shell to quickly check it; for them it should be easy to say.
@armingeiser Running the shell example (test example 02) for one optimization step currently takes around 4.5s. Changing the implementation to the old one without ElementFiniteDifferenceUtility reduces it to 2s. Considering the correct flag settings as specified above reduces it further to 1.5s. And using the Intel solver instead of SuperLU cuts it to 1.3s. And that's just a small example here.
@dbaumgaertner thanks, just so I know what I have to look for.
- The new implementation calls CalculateRightHandSide for the unperturbed and the perturbed node. This leads to 6 calls per shape derivative.
- It additionally makes an extra call to CalculateRightHandSide which seems not to be needed and increases the number of function calls to 9.
- The original implementation calls CalculateRightHandSide once for the unperturbed system, and once for each coordinate direction - 4 calls per node.

The time spent directly correlates with the number of calls to the CalculateRightHandSide function.
What makes it even slower is the calculation of the dummy LHS in the CalculateRightHandSide of the shell element. It needs to allocate an LHS matrix each time, which makes this call very slow. If the mentioned flag is set to false, this allocation is avoided.
End of the story:
But I cant say if elsewhere there is a difference. I guess it makes sense, if the one who programmed the shell, quickly checks it.
The thing is that they are no longer around. I'll add it to my list of TODOs for cleaning up the shell ...
@armingeiser This is just a reminder to make sure that unnecessary Calculate calls are also avoided in the other response functions (eigenfrequency)!
None of the currently available responses call "Calculate" more often than necessary.
As the gradient calculation involves a lot of matrix-vector operations, it is assumed that this problem is related to the different BLAS and LAPACK routines, or to the given ublas.
This might vanish as soon as AMatrix comes.
ToDo: