SeisSol / Training

SeisSol crashes with `Illegal instruction` #24

Closed Thomas-Ulrich closed 1 year ago

Thomas-Ulrich commented 1 year ago

We tried to run the training Docker image with Manel Prada in Barcelona and got unexpected errors when running SeisSol. We used: `docker pull seissol/training:pr-23`

cpuinfo:

processor    : 0
vendor_id    : GenuineIntel
cpu family    : 6
model        : 42
model name    : Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
stepping    : 7
microcode    : 0x2f
cpu MHz        : 3210.242
cache size    : 6144 KB
physical id    : 0
siblings    : 4
core id        : 0
cpu cores    : 4
apicid        : 0
initial apicid    : 0
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
bugs        : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips    : 6219.84
clflush size    : 64
cache_alignment    : 64
address sizes    : 36 bits physical, 48 bits virtual
power management:

Error: Illegal instruction (core dumped)

krenzland commented 1 year ago

The training image is compiled for Haswell: https://github.com/SeisSol/Training/blob/main/Dockerfile#L111 The CPU is Sandy Bridge: https://www.intel.de/content/www/de/de/products/sku/52207/intel-core-i52400-processor-6m-cache-up-to-3-40-ghz/specifications.html Sandy Bridge came before Haswell, so the CPU is too old for the compile settings. The snb compile setting should work, but it is a lot slower than hsw on non-ancient CPUs.
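To make the mismatch concrete, here is a minimal sketch that reads the `flags` line of `/proc/cpuinfo`-style text and reports which Haswell-era features are missing. Treating `avx2` and `fma` as a proxy for "can run an hsw build" is an assumption made for illustration; the actual instruction set the build requires depends on the compiler flags. The function names are hypothetical.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Assumption: AVX2 and FMA stand in for "Haswell or newer".
HASWELL_FEATURES = {"avx2", "fma"}


def missing_haswell_features(flags):
    """List the assumed Haswell features that are absent from `flags`."""
    return sorted(HASWELL_FEATURES - flags)


# Shortened excerpt of the flags line posted above (i5-2400, Sandy Bridge).
sample = "flags : fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt aes xsave avx\n"
print(missing_haswell_features(parse_flags(sample)))  # ['avx2', 'fma']
```

On a real machine, the same check can be run on the contents of `/proc/cpuinfo` (Linux only). The posted flags include `avx` but neither `avx2` nor `fma`, which is consistent with the `Illegal instruction` crash.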

Thomas-Ulrich commented 1 year ago

I see.

  1. Would it make sense to have an executable compiled for Sandy Bridge in the Docker image?
  2. Or, let's say, a set of architectures?
  3. Could we then detect the architecture inside the container and use the correct executable?

krenzland commented 1 year ago

The easiest fix is to compile with ARCH=snb. This would lead to a significant performance loss on more modern architectures (everything released since 2013...). The libxsmm_JIT backend compiles some kernels at runtime, but of course not all of them. I'm not sure how well it performs on Sandy Bridge.

Your idea is possible, but it is a lot of work for a very small minority of users. Even the Haswell architecture (for which we are compiling) is ~9 years old. Most people in computational science update their computers more often than that...
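For reference, the detection step itself could look roughly like the sketch below, assuming the image shipped one executable per architecture. The executable naming scheme (`SeisSol_<arch>`) is hypothetical, and using `avx2`/`fma` versus `avx` to pick between hsw and snb is a simplifying assumption. Building and testing several variants is the costly part, not this dispatch.

```python
def read_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags of the first processor entry (Linux only)."""
    with open(path) as f:
        for line in f:
            key, sep, value = line.partition(":")
            if sep and key.strip() == "flags":
                return set(value.split())
    return set()


def select_arch(flags):
    """Pick the newest supported build target, or None if none fits.

    Assumption: avx2 + fma implies an hsw build runs; avx alone implies snb.
    """
    if {"avx2", "fma"} <= flags:
        return "hsw"
    if "avx" in flags:
        return "snb"
    return None


def executable_for(flags, prefix="SeisSol_"):
    """Build the (hypothetical) executable name for the detected target."""
    arch = select_arch(flags)
    if arch is None:
        raise RuntimeError("CPU supports neither the snb nor the hsw build")
    return prefix + arch


# Flags similar to the i5-2400 from this issue: AVX but no AVX2/FMA.
print(executable_for({"sse2", "sse4_2", "avx"}))  # SeisSol_snb
```

An entrypoint script could call `executable_for(read_flags())` and then start the chosen binary.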

Thomas-Ulrich commented 1 year ago

OK, fair enough. But we will need to do that anyway for the new Mac architecture at some point.

krenzland commented 1 year ago

The Mac architecture is far more complicated, as we would need a multi-arch Docker container (not only a multi-arch SeisSol).