OrderN / CONQUEST-release

Full public release of large scale and linear scaling DFT code CONQUEST
http://www.order-n.org/
MIT License

Problem with starting Conquest compiled with GCC v13 #289

Closed by davidbowler 4 months ago

davidbowler commented 5 months ago

If the module pseudo_tm_info.f90 is compiled with optimisation using GCC 13, the code seems to crash when ion files are read in. This can be worked around by compiling just that module without optimisation (mpif90 -c pseudo_tm_info.f90), but it suggests a problem with pointers or memory that should be investigated at some point.
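A minimal sketch of that per-file workaround, written as a shell helper that only prints the compile commands rather than running them. The helper names `flags_for` and `compile_cmd`, the default `-O2` level, and the explicit `-O0` (GCC's standard "no optimisation" flag, which is also what gfortran uses when no `-O` flag is given) are assumptions for illustration; the actual CONQUEST build system may organise its flags differently.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the CONQUEST build system.
FC="mpif90"        # MPI Fortran wrapper named in the issue
OPT_FLAGS="-O2"    # assumed optimisation level for all other modules

# Return the optimisation flag to use for a given source file.
flags_for() {
  case "$1" in
    pseudo_tm_info.f90) echo "-O0" ;;      # module that crashes when optimised with GCC 13
    *)                  echo "$OPT_FLAGS" ;;
  esac
}

# Print (but do not run) the compile command for a source file.
compile_cmd() {
  echo "$FC $(flags_for "$1") -c $1"
}

compile_cmd pseudo_tm_info.f90
compile_cmd atomic_density.f90
```

Printing the commands keeps the sketch safe to run on a machine without an MPI compiler; piping the output to `sh` in the source directory would perform the actual compilation.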

davidbowler commented 4 months ago

The same sort of problem was reported by @connoraird in #301, though the failure occurred slightly later there. I will open a branch to fix this.

tkoskela commented 4 months ago

On my Ubuntu 22.04 machine with GNU Fortran (Ubuntu 13.1.0-8ubuntu1~22.04) 13.1.0, the tests fail with:

test_001_bulk_Si_1proc_Diag (develop)]$ mpirun -np 1 ../../bin/Conquest
hwloc/linux: Ignoring PCI device with non-16bit domain.
Pass --enable-32bits-pci-domain to configure to support such devices
(warning: it would break the library ABI, don't enable unless really needed).

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f7318389970 in ???
#1  0x7f7318388ad5 in ???
#2  0x7f731702c51f in ???
    at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3  0x5557d9079271 in __atomic_density_MOD_make_atomic_density_from_paos
    at /home/tkoskela/git_repos/CONQUEST-release/src/atomic_density.f90:221
#4  0x5557d9079ae2 in __ionic_data_MOD_get_ionic_data
    at /home/tkoskela/git_repos/CONQUEST-release/src/ionic_data.f90:118
#5  0x5557d901c4f5 in __initialisation_MOD_initialise
    at /home/tkoskela/git_repos/CONQUEST-release/src/initialisation_module.f90:189
#6  0x5557d8ec5aa7 in conquest
    at /home/tkoskela/git_repos/CONQUEST-release/src/main.f90:132
#7  0x5557d8ec595e in main
    at /home/tkoskela/git_repos/CONQUEST-release/src/main.f90:68
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node rat exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

This seems like a different symptom of the same issue as #301. It is resolved by #302.

davidbowler commented 4 months ago

Fixed by #302