Cantera / cantera

Chemical kinetics, thermodynamics, and transport tool suite
https://cantera.org
Other
624 stars 350 forks source link

Build fails with LTO due to type mismatch/ODR violations #1783

Open eli-schwartz opened 3 months ago

eli-schwartz commented 3 months ago

Problem description

I tried to build with the following *FLAGS to optimize the build: -flto=4 -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing

Link-Time Optimization is a massively global compiler optimization pass which is pretty handy for producing faster executables. It also has the interesting property that because the compiler does whole-program analysis using bytecode, it can save type information and perform error checks that it normally doesn’t have enough insight for. In particular, checking for ODR issues and checking function type signature mismatches.

Note that all the -Werror=* flags are used to help detect cases where the compiler tries to optimize by assuming UB cannot exist in the source code -- if it does exist, ordinarily the code would be miscompiled, and this says to make the miscompilation a fatal error.

I got this error:

x86_64-pc-linux-gnu-g++ -o build/python/cantera/_cantera.cpython-312-x86_64-linux-gnu.so -Wl,-O1 -Wl,--as-needed -Wl,-z,pack-relative-relocs -flto=4 -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing -Wl,--defsym=__gentoo_check_ldflags__=0 -pthread -shared -Wl,-rpath=/usr/lib64/cantera -Wl,-rpath=/usr/lib build/temp-py/_cantera.os build/temp-py/_onedim.os build/temp-py/_utils.os build/temp-py/constants.os build/temp-py/delegator.os build/temp-py/func1.os build/temp-py/kinetics.os build/temp-py/mixture.os build/temp-py/preconditioners.os build/temp-py/reaction.os build/temp-py/reactionpath.os build/temp-py/reactor.os build/temp-py/solutionbase.os build/temp-py/speciesthermo.os build/temp-py/thermo.os build/temp-py/transport.os build/temp-py/units.os build/temp-py/yamlwriter.os src/extensions/PythonExtensionManager.os -Lbuild/lib -L/usr/lib64/cantera -lcantera -lfmt -lgomp
src/fortran/cantera_thermo.f90:49:54: error: type of ‘th_newfromfile’ does not match original declaration [-Werror=lto-type-mismatch]
   49 |          self%thermo_id = th_newfromfile(filename, id)
      |                                                      ^
src/fortran/fct.cpp:368:13: note: type mismatch in parameter 3
  368 |     integer th_newfromfile_(char* filename, char* phasename, ftnlen lenf, ftnlen lenp)
      |             ^
src/fortran/fct.cpp:368:13: note: type ‘ftnlen’ should match type ‘long int’
src/fortran/fct.cpp:368:13: note: ‘th_newfromfile_’ was previously declared here
src/fortran/fct.cpp:368:13: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
src/fortran/cantera_thermo.f90:66:46: error: type of ‘phase_getname’ does not match original declaration [-Werror=lto-type-mismatch]
   66 |       call phase_getname(self%thermo_id, name)
      |                                              ^
src/fortran/fct.cpp:101:14: note: return value type mismatch
  101 |     status_t phase_getname_(const integer* n, char* nm,
      |              ^
src/fortran/fct.cpp:101:14: note: type ‘status_t’ should match type ‘void’
src/fortran/fct.cpp:101:14: note: ‘phase_getname_’ was previously declared here
src/fortran/fct.cpp:101:14: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
src/fortran/cantera_thermo.f90:123:68: error: type of ‘phase_elementindex’ does not match original declaration [-Werror=lto-type-mismatch]
  123 |       ctthermo_elementindex = phase_elementindex(self%thermo_id, nm)
      |                                                                    ^
src/fortran/fct.cpp:191:13: note: type mismatch in parameter 3
  191 |     integer phase_elementindex_(const integer* n, char* nm, ftnlen lennm)
      |             ^
src/fortran/fct.cpp:191:13: note: type ‘ftnlen’ should match type ‘long int’
src/fortran/fct.cpp:191:13: note: ‘phase_elementindex_’ was previously declared here
src/fortran/fct.cpp:191:13: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
src/fortran/cantera_thermo.f90:130:68: error: type of ‘phase_speciesindex’ does not match original declaration [-Werror=lto-type-mismatch]
  130 |       ctthermo_speciesindex = phase_speciesindex(self%thermo_id, nm)
      |                                                                    ^
src/fortran/fct.cpp:201:13: note: type mismatch in parameter 3
  201 |     integer phase_speciesindex_(const integer* n, char* nm, ftnlen lennm)
      |             ^
src/fortran/fct.cpp:201:13: note: type ‘ftnlen’ should match type ‘long int’
src/fortran/fct.cpp:201:13: note: ‘phase_speciesindex_’ was previously declared here
src/fortran/fct.cpp:201:13: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
src/fortran/cantera_thermo.f90:179:64: error: type of ‘phase_setmolefractionsbyname’ does not match original declaration [-Werror=lto-type-mismatch]
  179 |       self%err = phase_setmolefractionsbyname(self%thermo_id, x)
      |                                                                ^
src/fortran/fct.cpp:266:14: note: type mismatch in parameter 3
  266 |     status_t phase_setmolefractionsbyname_(const integer* n, char* x, ftnlen lx)
      |              ^
src/fortran/fct.cpp:266:14: note: type ‘ftnlen’ should match type ‘long int’
src/fortran/fct.cpp:266:14: note: ‘phase_setmolefractionsbyname_’ was previously declared here
src/fortran/fct.cpp:266:14: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
src/fortran/cantera_thermo.f90:200:64: error: type of ‘phase_setmassfractionsbyname’ does not match original declaration [-Werror=lto-type-mismatch]
  200 |       self%err = phase_setmassfractionsbyname(self%thermo_id, y)
      |                                                                ^
src/fortran/fct.cpp:292:14: note: type mismatch in parameter 3
  292 |     status_t phase_setmassfractionsbyname_(const integer* n, char* y, ftnlen leny)
      |              ^
src/fortran/fct.cpp:292:14: note: type ‘ftnlen’ should match type ‘long int’
src/fortran/fct.cpp:292:14: note: ‘phase_setmassfractionsbyname_’ was previously declared here
src/fortran/fct.cpp:292:14: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
src/fortran/cantera_thermo.f90:222:60: error: type of ‘phase_getspeciesname’ does not match original declaration [-Werror=lto-type-mismatch]
  222 |       self%err = phase_getspeciesname(self%thermo_id, k, nm)
      |                                                            ^
src/fortran/fct.cpp:327:14: note: type mismatch in parameter 4
  327 |     status_t phase_getspeciesname_(const integer* n, integer* k, char* nm, ftnlen lennm)
      |              ^
src/fortran/fct.cpp:327:14: note: type ‘ftnlen’ should match type ‘long int’
src/fortran/fct.cpp:327:14: note: ‘phase_getspeciesname_’ was previously declared here
src/fortran/fct.cpp:327:14: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
src/fortran/cantera_thermo.f90:230:60: error: type of ‘phase_getelementname’ does not match original declaration [-Werror=lto-type-mismatch]
  230 |       self%err = phase_getelementname(self%thermo_id, m, nm)
      |                                                            ^
src/fortran/fct.cpp:342:14: note: type mismatch in parameter 4
  342 |     status_t phase_getelementname_(const integer* n, integer* m, char* nm, ftnlen lennm)
      |              ^
src/fortran/fct.cpp:342:14: note: type ‘ftnlen’ should match type ‘long int’
src/fortran/fct.cpp:342:14: note: ‘phase_getelementname_’ was previously declared here
src/fortran/fct.cpp:342:14: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
src/fortran/cantera_thermo.f90:245:50: error: type of ‘th_geteostype’ does not match original declaration [-Werror=lto-type-mismatch]
  245 |       self%err = th_geteostype(self%thermo_id, nm)
      |                                                  ^
src/fortran/fct.cpp:379:13: note: type mismatch in parameter 3
  379 |     integer th_geteostype_(const integer* n, char* buf, ftnlen lenbuf)
      |             ^
src/fortran/fct.cpp:379:13: note: type ‘ftnlen’ should match type ‘long int’
src/fortran/fct.cpp:379:13: note: ‘th_geteostype_’ was previously declared here
src/fortran/fct.cpp:379:13: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
src/fortran/cantera_thermo.f90:501:119: error: type of ‘th_equil’ does not match original declaration [-Werror=lto-type-mismatch]
  501 |       self%err = th_equil(self%thermo_id, XY, trim(solver_), rtol_, max_steps_, max_iter_, estimate_equil_, log_level_)
      |                                                                                                                       ^
src/fortran/fct.cpp:566:14: note: type mismatch in parameter 9
  566 |     status_t th_equil_(const integer* n, char* XY, char* solver, double* rtol, int* max_steps, int* max_iter, int* estimate_equil, int* log_level, ftnlen lenxy, ftnlen lensolver)
      |              ^
src/fortran/fct.cpp:566:14: note: type ‘ftnlen’ should match type ‘long int’
src/fortran/fct.cpp:566:14: note: ‘th_equil_’ was previously declared here
src/fortran/fct.cpp:566:14: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
src/fortran/cantera_kinetics.f90:46:56: error: type of ‘kin_newfromfile’ does not match original declaration [-Werror=lto-type-mismatch]
   46 |                                          n1, n2, n3, n4)
      |                                                        ^
src/fortran/fct.cpp:679:13: note: type mismatch in parameter 8
  679 |     integer kin_newfromfile_(const char* filename, const char* phasename,
      |             ^
src/fortran/fct.cpp:679:13: note: type ‘ftnlen’ should match type ‘long int’
src/fortran/fct.cpp:679:13: note: ‘kin_newfromfile_’ was previously declared here
src/fortran/fct.cpp:679:13: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
src/fortran/cantera_kinetics.f90:58:45: error: type of ‘kin_gettype’ does not match original declaration [-Werror=lto-type-mismatch]
   58 |       self%err = kin_gettype(self%kin_id, nm)
      |                                             ^
src/fortran/fct.cpp:708:13: note: type mismatch in parameter 3
  708 |     integer kin_gettype_(const integer* n, char* buf, ftnlen buflen)
      |             ^
src/fortran/fct.cpp:708:13: note: type ‘ftnlen’ should match type ‘long int’
src/fortran/fct.cpp:708:13: note: ‘kin_gettype_’ was previously declared here
src/fortran/fct.cpp:708:13: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
src/fortran/cantera_kinetics.f90:73:77: error: type of ‘kin_speciesindex’ does not match original declaration [-Werror=lto-type-mismatch]
   73 |       ctkin_kineticsSpeciesIndex = kin_speciesindex(self%kin_id, name, phase)
      |                                                                             ^
src/fortran/fct.cpp:726:13: note: type mismatch in parameter 4
  726 |     integer kin_speciesindex_(const integer* n, const char* nm, const char* ph,
      |             ^
src/fortran/fct.cpp:726:13: note: type ‘ftnlen’ should match type ‘long int’
src/fortran/fct.cpp:726:13: note: ‘kin_speciesindex_’ was previously declared here
src/fortran/fct.cpp:726:13: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
src/fortran/cantera_kinetics.f90:98:58: error: type of ‘kin_phaseindex’ does not match original declaration [-Werror=lto-type-mismatch]
   98 |       ctkin_phaseindex = kin_phaseindex(self%kin_id, name)
      |                                                          ^
src/fortran/fct.cpp:766:13: note: type mismatch in parameter 3
  766 |     integer kin_phaseindex_(const integer* n, const char* ph, ftnlen lenph)
      |             ^
src/fortran/fct.cpp:766:13: note: type ‘ftnlen’ should match type ‘long int’
src/fortran/fct.cpp:766:13: note: ‘kin_phaseindex_’ was previously declared here
src/fortran/fct.cpp:766:13: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
src/fortran/cantera_kinetics.f90:122:57: error: type of ‘kin_getreactiontype’ does not match original declaration [-Werror=lto-type-mismatch]
  122 |       self%err = kin_getreactiontype(self%kin_id, i, buf)
      |                                                         ^
src/fortran/fct.cpp:793:14: note: type mismatch in parameter 4
  793 |     status_t kin_getreactiontype_(const integer* n, integer* i, char* buf, ftnlen lenbuf)
      |              ^
src/fortran/fct.cpp:793:14: note: type ‘ftnlen’ should match type ‘long int’
src/fortran/fct.cpp:793:14: note: ‘kin_getreactiontype_’ was previously declared here
src/fortran/fct.cpp:793:14: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
src/fortran/cantera_kinetics.f90:193:59: error: type of ‘kin_getreactionstring’ does not match original declaration [-Werror=lto-type-mismatch]
  193 |       self%err = kin_getreactionstring(self%kin_id, i, buf)
      |                                                           ^
src/fortran/fct.cpp:903:14: note: type mismatch in parameter 4
  903 |     status_t kin_getreactionstring_(const integer* n, integer* i, char* buf, ftnlen lenbuf)
      |              ^
src/fortran/fct.cpp:903:14: note: type ‘ftnlen’ should match type ‘long int’
src/fortran/fct.cpp:903:14: note: ‘kin_getreactionstring_’ was previously declared here
src/fortran/fct.cpp:903:14: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
src/fortran/cantera_funcs.f90:30:66: error: type of ‘trans_newdefault’ does not match original declaration [-Werror=lto-type-mismatch]
   30 |          self%tran_id = trans_newdefault(self%thermo_id, loglevel)
      |                                                                  ^
src/fortran/fct.cpp:960:13: note: type mismatch in parameter 3
  960 |     integer trans_newdefault_(integer* ith, integer* loglevel, ftnlen lenmodel)
      |             ^
src/fortran/fct.cpp:960:13: note: type ‘ftnlen’ should match type ‘void’
src/fortran/fct.cpp:960:13: note: ‘trans_newdefault_’ was previously declared here
src/fortran/cantera_funcs.f90:78:35: error: type of ‘ctgetcanteraerror’ does not match original declaration [-Werror=lto-type-mismatch]
   78 |       ierr = ctgetCanteraError(buf)
      |                                   ^
src/fortran/fct.cpp:1089:14: note: type mismatch in parameter 2
 1089 |     status_t ctgetcanteraerror_(char* buf, ftnlen buflen)
      |              ^
src/fortran/fct.cpp:1089:14: note: type ‘ftnlen’ should match type ‘long int’
src/fortran/fct.cpp:1089:14: note: ‘ctgetcanteraerror_’ was previously declared here
src/fortran/fct.cpp:1089:14: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
src/fortran/cantera_funcs.f90:90:59: error: type of ‘ctaddcanteradirectory’ does not match original declaration [-Werror=lto-type-mismatch]
   90 |       self%err = ctaddCanteraDirectory(self%thermo_id, buf)
      |                                                           ^
src/fortran/fct.cpp:1105:14: note: type mismatch in parameter 3
 1105 |     status_t ctaddcanteradirectory_(integer* buflen, char* buf)
      |              ^
src/fortran/fct.cpp:1105:14: note: type ‘void’ should match type ‘long int’
src/fortran/fct.cpp:1105:14: note: ‘ctaddcanteradirectory_’ was previously declared here
src/fortran/cantera_funcs.f90:68:68: error: type of ‘ctphase_report’ does not match original declaration [-Werror=lto-type-mismatch]
   68 |          self%err = ctphase_report(self%thermo_id, buf, show_thermo)
      |                                                                    ^
src/fortran/fct.cpp:1070:14: note: type mismatch in parameter 4
 1070 |     status_t ctphase_report_(const integer* nth,
      |              ^
src/fortran/fct.cpp:1070:14: note: type ‘ftnlen’ should match type ‘long int’
src/fortran/fct.cpp:1070:14: note: ‘ctphase_report_’ was previously declared here
src/fortran/fct.cpp:1070:14: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
lto1: some warnings being treated as errors
build/python/cantera/_cantera.cpp:2340:3: error: type ‘struct __pyx_mstate’ violates the C++ One Definition Rule [-Werror=odr]
 2340 | } __pyx_mstate;
      |   ^
lto-wrapper: fatal error: x86_64-pc-linux-gnu-gfortran returned 1 exit status
compilation terminated.
/usr/libexec/gcc/x86_64-pc-linux-gnu/ld: error: lto-wrapper failed
build/python/cantera/_onedim.cpp:5467:3: note: a different type is defined in another translation unit
 5467 | } __pyx_mstate;
      |   ^
build/python/cantera/_cantera.cpp:2265:13: note: the first difference of corresponding definitions is field ‘__pyx_n_s_’
 2265 |   PyObject *__pyx_n_s_;
      |             ^
build/python/cantera/_onedim.cpp:4614:17: note: a field with different name is defined in another translation unit
 4614 |   PyTypeObject *__pyx_GeneratorType;
      |                 ^
collect2: error: ld returned 1 exit status
scons: *** [build/lib/libcantera_fortran.so.3.0.1] Error 1
lto1: some warnings being treated as errors
lto-wrapper: fatal error: x86_64-pc-linux-gnu-g++ returned 1 exit status
compilation terminated.
/usr/libexec/gcc/x86_64-pc-linux-gnu/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
scons: *** [build/python/cantera/_cantera.cpython-312-x86_64-linux-gnu.so] Error 1
scons: done building targets (errors occurred during build).

Steps to reproduce

Compile with GCC.

Include the following CFLAGS / CXXFLAGS / LDFLAGS: -flto -Werror=odr -Werror=lto-type-mismatch

System information

Attachments

Full build log of the failure: build.log

ischoegl commented 3 months ago

Thanks for the issue report. All the examples above (which one exception for Python) appear to be from the Fortran interface. Cantera using mismatched types for integers in Fortran is something that is known (it may actually be a deliberate hack). Can you confirm that the

build/python/cantera/_onedim.cpp:5467:3: note: a different type is defined in another translation unit
 5467 | } __pyx_mstate;
      |   ^
build/python/cantera/_cantera.cpp:2265:13: note: the first difference of corresponding definitions is field ‘__pyx_n_s_’
 2265 |   PyObject *__pyx_n_s_;
      |             ^
build/python/cantera/_onedim.cpp:4614:17: note: a field with different name is defined in another translation unit
 4614 |   PyTypeObject *__pyx_GeneratorType;
      |       

block is the only one remaining if you compile without Fortran support? I.e. using the f90_interface=n option?

eli-schwartz commented 3 months ago

All the examples above (which one exception for Python) appear to be from the Fortran interface. Cantera using mismatched types for integers in Fortran is something that is known (it may actually be a deliberate hack).

Yeah, it does seem to show up in a lot of packages. :)

Sometimes they do get fixed after my report though -- in at least one case, that involved adopting the use of Fortran 2003 "BIND(C)". I don't really understand the details very well since I don't know Fortran. :( But e.g. the GCC docs talk about it as "the" facility for ensuring interoperability between C and Fortran interfaces when producing shared libraries. Maybe that's an option here?

eli-schwartz commented 3 months ago
$ USE=-fortran emerge -1 cantera

[...]

x86_64-pc-linux-gnu-g++ -o build/python/cantera/_cantera.cpython-312-x86_64-linux-gnu.so -Wl,-O1 -Wl,--as-needed -Wl,-z,pack-relative-relocs -flto=4 -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing -Wl,--defsym=__gentoo_check_ldflags__=0 -pthread -shared -Wl,-rpath=/usr/lib64/cantera -Wl,-rpath=/usr/lib build/temp-py/_cantera.os build/temp-py/_onedim.os build/temp-py/_utils.os build/temp-py/constants.os build/temp-py/delegator.os build/temp-py/func1.os build/temp-py/kinetics.os build/temp-py/mixture.os build/temp-py/preconditioners.os build/temp-py/reaction.os build/temp-py/reactionpath.os build/temp-py/reactor.os build/temp-py/solutionbase.os build/temp-py/speciesthermo.os build/temp-py/thermo.os build/temp-py/transport.os build/temp-py/units.os build/temp-py/yamlwriter.os src/extensions/PythonExtensionManager.os -Lbuild/lib -L/usr/lib64/cantera -lcantera -lfmt -lgomp
build/python/cantera/_cantera.cpp:2340:3: error: type ‘struct __pyx_mstate’ violates the C++ One Definition Rule [-Werror=odr]
 2340 | } __pyx_mstate;
      |   ^
build/python/cantera/_onedim.cpp:5467:3: note: a different type is defined in another translation unit
 5467 | } __pyx_mstate;
      |   ^
build/python/cantera/_cantera.cpp:2265:13: note: the first difference of corresponding definitions is field ‘__pyx_n_s_’
 2265 |   PyObject *__pyx_n_s_;
      |             ^
build/python/cantera/_onedim.cpp:4614:17: note: a field with different name is defined in another translation unit
 4614 |   PyTypeObject *__pyx_GeneratorType;
      |                 ^
lto1: some warnings being treated as errors
lto-wrapper: fatal error: x86_64-pc-linux-gnu-g++ returned 1 exit status
compilation terminated.
/usr/libexec/gcc/x86_64-pc-linux-gnu/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
scons: *** [build/python/cantera/_cantera.cpython-312-x86_64-linux-gnu.so] Error 1
scons: building terminated because of errors.

Can you confirm that the [...] block is the only one remaining if you compile without Fortran support? I.e. using the f90_interface=n option?

Yes, as long as the build is without fortran support it is just the one error.

ischoegl commented 3 months ago

Thanks for confirming!

speth commented 3 months ago

The warnings about __pyx_mstate are related to the somewhat unconventional way that we use Cython. To reduce compilation time, we compile multiple separate .pyx files, which Cython expects will end up as their own separate extension module .so, but we instead combine everything into a single extension module.

All usage of this struct seems to be guarded by the CYTHON_USE_MODULE_STATE macro, which is by default defined as 0, and is described in the documentation as an experimental feature. I don't know if this option will perhaps become a default at some point in the future, or what it would take to support our use case. Perhaps that's a question for the Cython devs.

eli-schwartz commented 3 months ago

The warnings about __pyx_mstate are related to the somewhat unconventional way that we use Cython. To reduce compilation time, we compile multiple separate .pyx files, which Cython expects will end up as their own separate extension module .so, but we instead combine everything into a single extension module.

That is a very interesting use case which does seem valuable. Another possibility from the cython side could be if they offered the ability to simply allow producing "_part1.cpp", "_part2.cpp" files when processing pyx files that produce especially large generated outputs. I would probably open a feature request if I were you. :)