Although I would rate it unlikely, I could imagine that there is some difference between 2D and 3D. The snapshots I have sent to you are from a 3D run. We are running the problem in 2D. One thing I could do is to dig up my old code and see whether the Coriolis acceleration was visible in my 2D run. I am gathering some snapshots of the old code right now to check this.
Other than that, you could add a simple print when the Coriolis term is assembled, just to check that it is active...
> Other than that, you could add a simple print when the Coriolis term is assembled, just to check that it is active...
Yes, I had already checked this by adding

```cpp
if (angular_velocity_vector_ptr != nullptr)
{
  static unsigned int test = 0;
  if (test == 0 && time_stepping.get_step_number() == 1)
  {
    *pcout << std::endl
           << "beta[0] = " << beta[0] << std::endl
           << "beta[1] = " << beta[1] << std::endl
           << "parameters.C1 = " << parameters.C1 << std::endl
           << "scratch.phi[i] = " << scratch.phi[i] << std::endl
           << "scratch.old_angular_velocity_value[0] = "
           << scratch.old_angular_velocity_value[0] << std::endl
           << "scratch.old_old_angular_velocity_value[0] = "
           << scratch.old_old_angular_velocity_value[0] << std::endl
           << "scratch.old_velocity_values[q] = "
           << scratch.old_velocity_values[q] << std::endl
           << "scratch.old_old_velocity_values[q] = "
           << scratch.old_old_velocity_values[q] << std::endl << std::endl;
    test++;
  }

  if constexpr (dim == 2)
    // The minus sign in the argument of the cross_product_2d
    // method is due to how the method is defined.
    data.local_rhs(i) -=
      ....
```
which prints

```
beta[0] = 2.000000e+00
beta[1] = -1.000000e+00
parameters.C1 = 2.000000e+03
scratch.phi[i] = 4.723790e-01 0.000000e+00
scratch.old_angular_velocity_value[0] = 1.000000e+00
scratch.old_old_angular_velocity_value[0] = 1.000000e+00
scratch.old_velocity_values[q] = 2.935302e-03 9.304189e-06
scratch.old_old_velocity_values[q] = 0.000000e+00 0.000000e+00
```
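For what it's worth, the printed `beta` values (2 and -1) are consistent with the standard second-order extrapolation u* = 2u^n - u^(n-1) of a semi-implicit BDF2 scheme. Regarding the sign remark in the snippet: deal.II's `cross_product_2d(v)` returns the rotation (v[1], -v[0]), i.e. v × e_z, so e_z × v has to be written as `cross_product_2d(-v)`. A minimal self-contained sketch of the 2D Coriolis contribution under that convention (the names `omega` and `u`, and the scaling by 2, are illustrative, not the actual assembly code):

```cpp
#include <deal.II/base/tensor.h>

#include <iostream>

using namespace dealii;

// Sketch: the Coriolis acceleration 2 * omega x u in 2D. The angular
// velocity reduces to a scalar (its e_z component), and
// e_z x u = cross_product_2d(-u), since deal.II defines
// cross_product_2d(v) = (v[1], -v[0]) = v x e_z.
Tensor<1, 2> coriolis_term_2d(const double omega, const Tensor<1, 2> &u)
{
  return 2.0 * omega * cross_product_2d(-u);
}

int main()
{
  Tensor<1, 2> u;
  u[0] = 1.0; // unit velocity along e_x
  u[1] = 0.0;

  // For omega = 1 this prints "0 2", i.e. 2 * (e_z x e_x) = 2 e_y.
  std::cout << coriolis_term_2d(1.0, u) << std::endl;
}
```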
This tubcloud folder contains all 2D simulations which I did previously. `/Ekman_1e-3_recovered/explicit_dt_1e-5_longrun/` contains some results which are suitable for a comparison.
The fields in that folder, `/Ekman_1e-3_recovered/explicit_dt_1e-5_longrun/`, coincide in form and magnitude with my results. When you ran your code in 3D, did your results behave like those from the Rayleigh solver?
I will upload the results of my 3D runs as well, but this takes a while. In one of the runs the solution is rotating. Do you also see this rotation in your solutions?
After seeing your earlier 2D results, I ran a test which strongly indicates that there is indeed a difference between the 2D and the 3D case. The first row shows the results of my latest 3D run, while the second row shows the results in `VeryShortRun`. The first, second and third columns are the velocity magnitude, the pressure and the temperature, respectively.
Relevant parameters:

```
+------------------------------------------+----------------------+
| Problem parameters |
+------------------------------------------+----------------------+
| Problem type | rotating_boussinesq |
| Spatial dimension | 3 |
| Mapping | MappingQ<3>(2) |
| Mapping - Apply to interior cells | true |
| Finite Element - Velocity | FE_Q<3>(2)^3 |
| Finite Element - Pressure | FE_Q<3>(1) |
| Finite Element - Temperature | FE_Q<3>(2) |
| Verbose | false |
+------------------------------------------+----------------------+
| Dimensionless numbers |
+------------------------------------------+----------------------+
| Prandtl number | 1 |
| Rayleigh number | 100000 |
| Ekman number | 0.001 |
+------------------------------------------+----------------------+
| Timestepping parameters |
+------------------------------------------+----------------------+
| IMEX scheme | BDF2 |
| Maximum number of time steps | 10 |
| Adaptive timestepping | false |
| Initial time step | 0.0001 |
| Start time | 0 |
| Final time | 0.001 |
| Verbose | true |
+------------------------------------------+----------------------+
| Navier-Stokes discretization parameters |
+------------------------------------------+----------------------+
| Incremental pressure-correction scheme | rotational |
| Convective term weak form | skew-symmetric |
| Convective temporal form | semi-implicit |
| Preconditioner update frequency | 10 |
+------------------------------------------+----------------------+
| Linear solver parameters - Diffusion step |
+------------------------------------------+----------------------+
| Maximum number of iterations | 1000 |
| Relative tolerance | 1e-06 |
| Absolute tolerance | 1e-09 |
| Preconditioner | ILU |
| Fill-in level | 1 |
| Overlap | 1 |
| Relative tolerance | 1 |
| Absolute tolerance | 0 |
+------------------------------------------+----------------------+
| Linear solver parameters - Projection step |
+------------------------------------------+----------------------+
| Maximum number of iterations | 1000 |
| Relative tolerance | 1e-06 |
| Absolute tolerance | 1e-09 |
| Preconditioner | ILU |
| Fill-in level | 1 |
| Overlap | 1 |
| Relative tolerance | 1 |
| Absolute tolerance | 0 |
+------------------------------------------+----------------------+
| Linear solver parameters - Correction step |
+------------------------------------------+----------------------+
| Maximum number of iterations | 1000 |
| Relative tolerance | 1e-06 |
| Absolute tolerance | 1e-09 |
| Preconditioner | ILU |
| Fill-in level | 1 |
| Overlap | 1 |
| Relative tolerance | 1 |
| Absolute tolerance | 0 |
+------------------------------------------+----------------------+
| Linear solver parameters - Poisson pre-step |
+------------------------------------------+----------------------+
| Maximum number of iterations | 1000 |
| Relative tolerance | 1e-06 |
| Absolute tolerance | 1e-09 |
| Preconditioner | ILU |
| Fill-in level | 1 |
| Overlap | 1 |
| Relative tolerance | 1 |
| Absolute tolerance | 0 |
+------------------------------------------+----------------------+
+------------------------------------------+----------------------+
| Heat equation solver parameters |
+------------------------------------------+----------------------+
| Convective term weak form | skew-symmetric |
| Convective temporal form | semi-implicit |
| Preconditioner update frequency | 10 |
+------------------------------------------+----------------------+
| Linear solver parameters - Heat equation |
+------------------------------------------+----------------------+
| Maximum number of iterations | 1000 |
| Relative tolerance | 1e-06 |
| Absolute tolerance | 1e-09 |
| Preconditioner | ILU |
| Fill-in level | 1 |
| Overlap | 1 |
| Relative tolerance | 1 |
| Absolute tolerance | 0 |
+------------------------------------------+----------------------+
+----------+----------+----------+----------+----------+----------+
| C1 | C2 | C3 | C4 | C5 | C6 |
+----------+----------+----------+----------+----------+----------+
| 2.0e+03 | 1.0e+00 | 1.0e+05 | 1.0e+00 | 0.0e+00 | 1.0e+03 |
+----------+----------+----------+----------+----------+----------+
Triangulation:
Number of initial active cells = 132096
Spatial discretization:
Number of velocity degrees of freedom = 3880182
Number of pressure degrees of freedom = 179754
Number of temperature degrees of freedom = 1293394
Number of total degrees of freedom = 5353330
```
The verbose output of the last step:

```
Heat Equation: Assembling advection matrix... done!
Heat Equation: Assembling right hand side... done!
Right-hand side's L2-norm = 1.233147e+02
Heat Equation: Solving linear system... done!
Number of GMRES iterations: 735, Final residual: 1.226188e-04.
Navier Stokes: Assembling advection matrix... done!
Navier Stokes: Assembling diffusion step's right hand side... done!
Right-hand side's L2-norm = 6.555920e+02
Navier Stokes: Solving the diffusion step... done!
Number of GMRES iterations: 553, Final residual: 6.479068e-04.
Navier Stokes: Assembling the projection step's right hand side... done!
Right-hand side's L2-norm = 9.725863e-02
Navier Stokes: Solving the projection step... done!
Number of CG iterations: 113, Final residual: 8.129867e-08.
Navier Stokes: Pressure correction step... done!
Number of CG iterations: 18, Final residual: 4.764454e-09.
```
There are three fronts to tackle:

1. `VeryShortRun`: I know that the meshes are not really comparable, as they use different methods, but it is a reference still. Remember that the graphical output globally refines the mesh once; the actual mesh is one level coarser.
2. Runtime: with the current mesh each step takes 10 min, and with a time step of `1e-4` and an end time of `1.2`, i.e. 12,000 steps, we are talking about 2,000 hours. Hopefully a better tuning of the preconditioners reduces this considerably, but I think we need more firepower if test cases are to be done in 3D.
3. The `WorkStream` approach: the last output file was created at 14:30 and the benchmark computations are still running (it is 17:11 as I am writing this).

The results look great! To be honest I didn't expect this. This is promising.

If you need 10 minutes for one time step, 10,000 time steps would take approx. 70 days, which is too much. There are several options to reduce the runtime:

1. Compile in release mode (`make release`).
2. Switch to AMG preconditioning. With AMG preconditioning the number of iterations stays constant for an increasing number of degrees of freedom, whereas it increases with ILU preconditioning.
3. Possibly switch to a purely explicit scheme, because the advection matrix takes most of the time, right?

I don't understand your comment regarding the `WorkStream` approach? We are using this approach already.

Could you make a short 3D run with 10 time steps such that we can see the runtime of the different sections?
> The results look great! To be honest I didn't expect this. This is promising.
>
> If you need 10 minutes for one time step, 10,000 time steps would take approx. 70 days, which is too much. There are several options to reduce the runtime:
>
> 1. Compile in release mode (`make release`).

I keep forgetting that release mode exists...

> 2. Switch to AMG preconditioning. With AMG preconditioning the number of iterations stays constant for an increasing number of degrees of freedom, whereas it increases with ILU preconditioning.

Will do.

> 3. Possibly switch to a purely explicit scheme, because the advection matrix takes most of the time, right?

Most likely, yeah. The benchmark data is still computing. I will update once I have the terminal output with the times spent in each section.

> I don't understand your comment regarding the `WorkStream` approach? We are using this approach already.

The benchmark's class. The methods computing the mean velocity and such are not coded with the `WorkStream` approach. It had not been needed until now.

> Could you make a short 3D run with 10 time steps such that we can see the runtime of the different sections?

The `prm` file of this run was set for 10 steps. I'm currently uploading the results to my tubcloud.

Edit: Link updated.
Could you also provide the parameter file? Are you running on `coriolis_plus_master` or `coriolis_acceleration`?
I started a production run on `coriolis_acceleration` with the following input file:
```
# ---------------------
set FE's polynomial degree - Pressure (Taylor-Hood) = 1
set FE's polynomial degree - Temperature = 2
set Mapping - Apply to interior cells = true
set Mapping - Polynomial degree = 2
set Problem type = rotating_boussinesq
set Spatial dimension = 3
set Verbose = false
subsection Dimensionless numbers
set Ekman number = 1e-3
set Prandtl number = 1.0
set Rayleigh number = 1e5
end
subsection Heat equation solver parameters
set Convective term time discretization = semi-implicit
set Convective term weak form = skew-symmetric
set Preconditioner update frequency = 10
set Verbose = false
subsection Linear solver parameters
set Absolute tolerance = 1e-9
set Maximum number of iterations = 1000
set Relative tolerance = 1e-6
subsection Preconditioner parameters
set Preconditioner type = AMG
set Aggregation threshold = 1.0e-4
set Elliptic = true
set Number of cycles = 1
end
end
end
subsection Navier-Stokes solver parameters
set Convective term time discretization = semi-implicit
set Convective term weak form = skew-symmetric
set Incremental pressure-correction scheme = rotational
set Preconditioner update frequency = 10
set Verbose = false
subsection Linear solver parameters - Correction step
set Absolute tolerance = 1e-9
set Maximum number of iterations = 1000
set Relative tolerance = 1e-6
subsection Preconditioner parameters
set Preconditioner type = Jacobi
set Number of sweeps = 10
set Overlap = 1
set Relaxation parameter = 0.5
end
end
subsection Linear solver parameters - Diffusion step
set Absolute tolerance = 1e-9
set Maximum number of iterations = 1000
set Relative tolerance = 1e-6
subsection Preconditioner parameters
set Preconditioner type = AMG
set Aggregation threshold = 1.0e-4
set Elliptic = true
set Number of cycles = 1
end
end
subsection Linear solver parameters - Poisson pre-step
set Absolute tolerance = 1e-9
set Maximum number of iterations = 1000
set Relative tolerance = 1e-6
subsection Preconditioner parameters
set Preconditioner type = AMG
set Aggregation threshold = 1.0e-4
set Elliptic = true
set Number of cycles = 1
end
end
subsection Linear solver parameters - Projection step
set Absolute tolerance = 1e-9
set Maximum number of iterations = 1000
set Relative tolerance = 1e-6
subsection Preconditioner parameters
set Preconditioner type = AMG
set Aggregation threshold = 1.0e-4
set Elliptic = true
set Number of cycles = 1
end
end
end
subsection Output control parameters
set Graphical output directory = ChristensenResults/
set Graphical output frequency = 100
set Terminal output frequency = 10
end
subsection Refinement control parameters
set Adaptive mesh refinement = false
set Adaptive mesh refinement frequency = 100
set Fraction of cells set to coarsen = 0.3
set Fraction of cells set to refine = 0.03
set Maximum number of levels = 12
set Minimum number of levels = 0
set Number of initial adaptive refinements = 0
set Number of initial boundary refinements = 1
set Number of initial global refinements = 8
end
subsection Time stepping parameters
set Adaptive time stepping = false
set Adaptive timestepping barrier = 2
set Final time = 1.0
set Initial time step = 1e-4
set Maximum number of time steps = 10000
set Maximum time step = 7e-2
set Minimum time step = 2e-2
set Start time = 0.0
set Time stepping scheme = BDF2
set Verbose = true
end
```
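As an aside, the `Preconditioner type = AMG` blocks above presumably map onto deal.II's Trilinos AMG preconditioner roughly as in the following sketch; this is an assumption about the wiring, and the actual solver classes in this code base may differ:

```cpp
#include <deal.II/lac/trilinos_precondition.h>
#include <deal.II/lac/trilinos_sparse_matrix.h>

using namespace dealii;

// Sketch: initialize a Trilinos AMG preconditioner with settings that
// mirror the prm entries above (Elliptic = true, Number of cycles = 1,
// Aggregation threshold = 1.0e-4). The matrix is assumed to be an
// already assembled TrilinosWrappers::SparseMatrix.
void setup_amg(TrilinosWrappers::PreconditionAMG &preconditioner,
               const TrilinosWrappers::SparseMatrix &matrix)
{
  TrilinosWrappers::PreconditionAMG::AdditionalData data;
  data.elliptic              = true;   // prm: Elliptic
  data.n_cycles              = 1;      // prm: Number of cycles
  data.aggregation_threshold = 1.0e-4; // prm: Aggregation threshold

  preconditioner.initialize(matrix, data);
}
```

As mentioned earlier in the thread, the point of switching from ILU to AMG is that the iteration counts should stay roughly mesh-independent as the number of degrees of freedom grows.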
> Could you also provide the parameter file? Are you running on `coriolis_plus_master` or `coriolis_acceleration`?
I did not save that specific parameter file. Nonetheless, one should be able to reproduce it from the terminal output in my previous comment.
I already deleted `coriolis_plus_master`. Only `coriolis_acceleration` is valid from here on.
> I started a production run on `coriolis_acceleration` with the following input file:
@sebglane Do notice: there is an infinite loop in the benchmark request class when running in 3D! I am finishing up a couple of commits and will look for the loop afterwards.
I just did a test with a smaller 3D problem

```
Number of initial active cells = 8448
Spatial discretization:
Number of velocity degrees of freedom = 242046
Number of pressure degrees of freedom = 11158
Number of temperature degrees of freedom = 80682
Number of total degrees of freedom = 333886
```
and the benchmark request was computed with no trouble. I assumed there was an endless loop because, on the run I referred to in my previous comment, hours went by without the benchmark finishing its computation. And I mean the benchmark was taking way longer than the actual time loop took.
I am still trying to decipher what is wrong with the benchmark request. I ran the same problem size on the Benz server with 10 cores and it does not finish computing the benchmark data, but when run with 5 cores the code runs perfectly. Either the assembly process scales really badly or a command is not thread-safe? I will code the `WorkStream` approach and hope it fixes this.
Update: The problem starts with 6 cores. I coded a terminal output

```cpp
static unsigned int counting = 0;
std::cout << "Testing (" << Utilities::MPI::this_mpi_process(mpi_communicator)
          << ") :" << counting << std::endl;
counting++;
```

inside the cell iterator of the `compute_global_data` method. All processors reach a count of 1407, i.e. `Testing (1) :1407`, and then they just stop. Nothing else gets printed and the program gets stuck.
Update: With the `WorkStream` approach implemented in 641da09 the benchmark does not seem to get stuck anymore. I tested it with 10 processors on the Benz server.
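For reference, the usual deal.II `WorkStream` pattern for this kind of per-cell reduction looks roughly like the sketch below. The names and the reduced quantity (volume of the locally owned cells, as a stand-in for the mean-velocity computations mentioned above) are illustrative, not the code of 641da09. Only the copier touches shared state, and ghost/artificial cells are skipped explicitly:

```cpp
#include <deal.II/base/quadrature.h>
#include <deal.II/base/work_stream.h>
#include <deal.II/dofs/dof_handler.h>
#include <deal.II/fe/fe_values.h>

using namespace dealii;

template <int dim>
struct ScratchData
{
  ScratchData(const FiniteElement<dim> &fe, const Quadrature<dim> &quadrature)
    : fe_values(fe, quadrature, update_JxW_values)
  {}

  // WorkStream copies the scratch object once per thread, so the copy
  // constructor has to rebuild the (non-copyable) FEValues object.
  ScratchData(const ScratchData &other)
    : fe_values(other.fe_values.get_fe(),
                other.fe_values.get_quadrature(),
                other.fe_values.get_update_flags())
  {}

  FEValues<dim> fe_values;
};

struct CopyData
{
  double cell_volume = 0.0;
};

template <int dim>
double compute_local_volume(const DoFHandler<dim> &dof_handler,
                            const Quadrature<dim> &quadrature)
{
  double volume = 0.0;

  // Worker: runs in parallel, writes only into its own CopyData.
  auto worker = [](const typename DoFHandler<dim>::active_cell_iterator &cell,
                   ScratchData<dim> &scratch,
                   CopyData         &copy)
  {
    copy.cell_volume = 0.0;
    if (!cell->is_locally_owned()) // skip ghost and artificial cells
      return;
    scratch.fe_values.reinit(cell);
    for (unsigned int q = 0; q < scratch.fe_values.n_quadrature_points; ++q)
      copy.cell_volume += scratch.fe_values.JxW(q);
  };

  // Copier: WorkStream runs it sequentially, so accumulating is safe.
  auto copier = [&volume](const CopyData &copy)
  { volume += copy.cell_volume; };

  WorkStream::run(dof_handler.begin_active(),
                  dof_handler.end(),
                  worker,
                  copier,
                  ScratchData<dim>(dof_handler.get_fe(), quadrature),
                  CopyData());

  // Each MPI rank now holds the volume of its own cells; a final
  // Utilities::MPI::sum over the communicator gives the global value.
  return volume;
}
```

One would call this with, e.g., `QGauss<dim>(fe.degree + 1)` and then reduce the per-rank results with `Utilities::MPI::sum`.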
> I started a production run on `coriolis_acceleration` with the following input file:
What were the results of this run? I doubt the benchmark data was computed, but the graphical output would be a first indication that everything is alright with the implementation.
I have deactivated all post-processing and only write vtk output. The run has finished and produced approx. 45 GB of data. A few snapshots are available here.
I get a file-not-found error from the link.
I moved the data to `tubcloud/Shared/ChristensenResults/LongRun`. I hope you can access them.
They look good. The overall physics seems to be correctly reproduced. We need to verify the accuracy now. How long did the simulation take?
I think we can close this issue now.
It ran from Sunday to Wednesday on 16 MPI processes.
Please close this issue.
I will upload more snapshots and the log files to the cloud.
The `Christensen.cc` file does not reproduce the results of the benchmark. The fields computed during the first step differ somewhat from the snapshots you provided. The first row belongs to your `VeryShortRun` snapshots, while the second row corresponds to my latest run on my laptop. The columns are the velocity, the pressure and the temperature field, respectively.

At the 9th step one can already see the angular motion in the velocity field of `VeryShortRun`, while in mine it is nowhere to be seen.

On the 5608th step of my latest simulation they look like this, although the distribution of the field does not really change after the 200th time step. It looks as if the Coriolis term is indeed present, as one can (barely) appreciate in the pressure field, but it is apparently too weak.