etmc / tmLQCD

tmLQCD is a freely available software suite providing a set of tools to be used in lattice QCD simulations. It is mainly an HMC implementation (including PHMC and RHMC) for Wilson, Wilson Clover and Wilson twisted mass fermions, together with inverters for different versions of the Dirac operator. The code is fully parallelised and ships with optimisations for various modern architectures, such as commodity PC clusters and the Blue Gene family.
http://www.itkp.uni-bonn.de/~urbach/software.html
GNU General Public License v3.0

qbig: Blocking error #574

Closed simone-romiti closed 10 months ago

simone-romiti commented 1 year ago

@kostrzewa I am testing the ndd correlators omeas implementation and run into the error recorded in the following logs:

/hiskp4/romiti/tmLQCD_runs/ndd_correlators/jobscript/logs/test-nd_correlators.173765.*

kostrzewa commented 1 year ago

Can you reproduce some of the error message here?

simone-romiti commented 1 year ago

FATAL ERROR Within _setQudaMultigridParam (reported by node 3): Blocking error.

FATAL ERROR Within _setQudaMultigridParam (reported by node 1): Blocking error.


MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1.

kostrzewa commented 1 year ago

Your title, "Blocking error", suggests that you have a configuration for an MG setup in there. The MG setup of course needs to be such that the block sizes and the volume per MPI task fit together, i.e. the block sizes have to divide the local lattice extent in each direction.

kostrzewa commented 1 year ago

If you're testing with a 24c48 lattice, for example, then a

MGBlockSizesX = 4,3
MGBlockSizesY = 4,3
MGBlockSizesZ = 2,3
MGBlockSizesT = 3,2

blocking might be an option which works, assuming you're on a single node and have partitioned in T and Z only in a 4x2 grid (4 in T, 2 in Z), such that you have 12 fine grid points in the T and Z directions per MPI task.

The number of lattice points in each dimension at each level needs to be even and the size of the 4D blocks needs to be even as well.
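
The rules above can be checked by hand before submitting a job. The following is a minimal sketch in Python, not the check that tmLQCD or QUDA actually performs: the function `check_blocking` and its argument layout are hypothetical and only encode the constraints stated in this thread (block sizes divide the local extent, every extent at every level is even, every 4D block volume is even).

```python
DIRS = ("X", "Y", "Z", "T")


def check_blocking(global_dims, proc_grid, block_sizes):
    """Return a list of problems found; an empty list means the blocking passes.

    global_dims: global lattice extent per direction, e.g. {"X": 24, ..., "T": 48}
    proc_grid:   number of MPI tasks per direction
    block_sizes: per direction, one block size per MG level (fine to coarse),
                 mirroring the MGBlockSizes{X,Y,Z,T} input lines
    """
    problems = []

    # Local (per MPI task) extent on the fine grid.
    local = {}
    for d in DIRS:
        if global_dims[d] % proc_grid[d] != 0:
            problems.append(f"{d}: {global_dims[d]} points not divisible by {proc_grid[d]} tasks")
            return problems
        local[d] = global_dims[d] // proc_grid[d]

    n_levels = len(block_sizes["X"])
    for level in range(n_levels):
        # Every extent on the level being blocked must be even.
        for d in DIRS:
            if local[d] % 2 != 0:
                problems.append(f"level {level}: odd extent {local[d]} in {d}")

        block_volume = 1
        for d in DIRS:
            b = block_sizes[d][level]
            block_volume *= b
            # The block size must divide the local extent exactly.
            if local[d] % b != 0:
                problems.append(f"level {level}: block {b} does not divide extent {local[d]} in {d}")
                return problems

        # The 4D block volume must be even.
        if block_volume % 2 != 0:
            problems.append(f"level {level}: odd 4D block volume {block_volume}")

        # Extents of the next, coarser level.
        local = {d: local[d] // block_sizes[d][level] for d in DIRS}

    # The coarsest level must have even extents too.
    for d in DIRS:
        if local[d] % 2 != 0:
            problems.append(f"coarsest level: odd extent {local[d]} in {d}")

    return problems


# The 24c48 example from this thread: 4 tasks in T, 2 in Z.
example = check_blocking(
    {"X": 24, "Y": 24, "Z": 24, "T": 48},
    {"X": 1, "Y": 1, "Z": 2, "T": 4},
    {"X": [4, 3], "Y": [4, 3], "Z": [2, 3], "T": [3, 2]},
)
print(example)
```

For the example, the local fine grid is 24x24x12x12, the first coarse level is 6x6x6x4 and the second is 2x2x2x2, with 4D block volumes of 96 and 54.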