firedrakeproject / firedrake

Firedrake is an automated system for the portable solution of partial differential equations using the finite element method (FEM)
https://firedrakeproject.org

Docker error: [0]PETSC ERROR: Caught signal number 11 SEGV #3276

Open Ig-dolci opened 10 months ago

Ig-dolci commented 10 months ago

Describe the bug: In the firedrake image, pulled with docker pull firedrakeproject/firedrake, I ran python3 -c "import firedrake" after activating the firedrake venv and got this PETSc error message:

[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
[unset]: PMIU_write error; fd=-1 buf=:cmd=abort exitcode=59 message=application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
:
system msg for write_line failure : Bad file descriptor

Environment:

Suggestions to sort this out are welcome!

connorjward commented 10 months ago

I think Docker on Mac (especially M1/M2) is supposed to be very fragile since the containers weren't strictly built for it. @JDBetteridge will know more.

danshapero commented 10 months ago

I have the same thing happening on an Intel desktop. There doesn't seem to be a problem with the firedrake-vanilla image, though.

JDBetteridge commented 10 months ago

Must be one of our friends (or their dependencies):

        --documentation-dependencies \
        --netgen \
        --slepc \
        --tinyasm \
        --install femlium \
        --install gusto \
        --install icepack \
        --install irksome \
        --install thetis"

Can you systematically remove these packages from firedrake, or systematically add them to firedrake-vanilla to narrow down the cause?
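One way to organise that search is to add a single extra at a time to the vanilla configuration. This is a minimal sketch, assuming each option quoted above can simply be appended to a base install command. `BASE_COMMAND` is a hypothetical placeholder, not the actual line used to build the Docker images.

```python
# Options quoted above, each kept as one unit so "--install X" stays together.
EXTRAS = [
    ["--documentation-dependencies"],
    ["--netgen"],
    ["--slepc"],
    ["--tinyasm"],
    ["--install", "femlium"],
    ["--install", "gusto"],
    ["--install", "icepack"],
    ["--install", "irksome"],
    ["--install", "thetis"],
]

# Hypothetical placeholder for whatever command builds firedrake-vanilla.
BASE_COMMAND = ["firedrake-install"]


def one_at_a_time(base, extras):
    """Return one candidate command per extra: the base plus that single extra."""
    return [base + extra for extra in extras]


if __name__ == "__main__":
    for command in one_at_a_time(BASE_COMMAND, EXTRAS):
        print(" ".join(command))
```

Each printed line is one build to try. The first one whose image fails on import firedrake points at the offending extra, or at one of its dependencies.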

Ig-dolci commented 10 months ago

Update: I reproduced this failure by adding --netgen to the firedrake-vanilla build.

francesco-ballarin commented 10 months ago

FYI, when packaging for Colab I have to patch ngsolve with https://github.com/fem-on-colab/fem-on-colab/blob/main/ngsolve/patches/01-petsc-external-libs because otherwise it would install its own umfpack, mumps, etc., rather than using those shipped with PETSc.

I don't think the issue you are seeing is fully related to this (because you only install netgen, and not ngsolve), but I would double check what extra files get installed with https://github.com/firedrakeproject/firedrake/blob/master/scripts/firedrake-install#L1428
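One quick way to do that check from Python is to list the shared libraries a pip-installed distribution recorded at install time. This is a minimal sketch using only the standard library's importlib.metadata. The function name is illustrative, and it assumes the package was installed by pip under the distribution name netgen-mesher, as used later in this thread.

```python
from importlib import metadata


def shipped_shared_libraries(distribution_name):
    """List shared-library files recorded for an installed distribution."""
    try:
        files = metadata.files(distribution_name)
    except metadata.PackageNotFoundError:
        return []
    if files is None:  # installed without a file listing (no RECORD)
        return []
    return sorted(
        str(path)
        for path in files
        if path.name.endswith((".so", ".dylib")) or ".so." in path.name
    )


if __name__ == "__main__":
    for name in shipped_shared_libraries("netgen-mesher"):
        print(name)
```

Any bundled solver or MPI libraries in that list would be candidates for clashing with the copies PETSc was built against.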

cc @UZerbinati

Ig-dolci commented 10 months ago

Update: the firedrakeproject/firedrake image works on Ubuntu; I did not hit this issue there.

UZerbinati commented 10 months ago

@Ig-dolci so on Mac, as soon as you install with the --netgen flag, firedrake breaks? This is quite interesting because netgen is installed via pip :thinking:

francesco-ballarin commented 10 months ago

From my (limited) understanding of netgen's CI, I think that pypi packages are generated from two different scripts:

so it may not be so surprising that the behavior on mac is different to the one on ubuntu.

Is anyone with more PyPI packaging experience able to see any relevant difference between the two files?

Ig-dolci commented 9 months ago

Update: I locally built a Docker image with Ubuntu and Python (without firedrake) and ran pip install netgen-mesher==6.2.2304.post142.dev0. I then got the message Illegal instruction when running python3 -c "import netgen". This appears to be a Docker issue with the Rosetta binary translator (see this report, section Instruction sets, ARM, and x86 compatibility).
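To tell which import crashes, and with which signal, without losing the calling interpreter, the import can be run in a child process. This is a minimal sketch using only the standard library. The helper names are illustrative, and it relies on the POSIX convention that a negative return code means the child was killed by that signal.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short description."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a return code of -N means "killed by signal N".
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exit code {returncode}"


def probe_import(module_name):
    """Import a module in a child interpreter so a crash cannot kill the caller."""
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return describe_exit(result.returncode)


if __name__ == "__main__":
    for name in ("netgen", "firedrake"):
        print(name, "->", probe_import(name))
```

An Illegal instruction crash should be reported as a kill by SIGILL. In the firedrake log above, PETSc's handler calls MPI_Abort with code 59, so that case would likely appear as a plain exit code rather than a signal.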

UZerbinati commented 9 months ago

This is quite interesting! Can you try building ngsolve from source, or installing with pip install netgen-mesher==6.2.2305? If this works, maybe we can get in contact with the NGSolve dev team to see what is going wrong with the pip installer :)

Ig-dolci commented 9 months ago

pip install netgen-mesher==6.2.2305

I got the same Illegal instruction message. FYI, running python -v -c "import netgen" returns:

import _frozen_importlib # frozen
import _imp # builtin
import '_thread' # <class '_frozen_importlib.BuiltinImporter'>
import '_warnings' # <class '_frozen_importlib.BuiltinImporter'>
import '_weakref' # <class '_frozen_importlib.BuiltinImporter'>
import '_io' # <class '_frozen_importlib.BuiltinImporter'>
import 'marshal' # <class '_frozen_importlib.BuiltinImporter'>
import 'posix' # <class '_frozen_importlib.BuiltinImporter'>
import '_frozen_importlib_external' # <class '_frozen_importlib.FrozenImporter'>
# installing zipimport hook
import 'time' # <class '_frozen_importlib.BuiltinImporter'>
import 'zipimport' # <class '_frozen_importlib.FrozenImporter'>
# installed zipimport hook
# /usr/lib/python3.10/encodings/__pycache__/__init__.cpython-310.pyc matches /usr/lib/python3.10/encodings/__init__.py
# code object from '/usr/lib/python3.10/encodings/__pycache__/__init__.cpython-310.pyc'
# /usr/lib/python3.10/__pycache__/codecs.cpython-310.pyc matches /usr/lib/python3.10/codecs.py
# code object from '/usr/lib/python3.10/__pycache__/codecs.cpython-310.pyc'
import '_codecs' # <class '_frozen_importlib.BuiltinImporter'>
import 'codecs' # <_frozen_importlib_external.SourceFileLoader object at 0x7fffffbd3040>
# /usr/lib/python3.10/encodings/__pycache__/aliases.cpython-310.pyc matches /usr/lib/python3.10/encodings/aliases.py
# code object from '/usr/lib/python3.10/encodings/__pycache__/aliases.cpython-310.pyc'
import 'encodings.aliases' # <_frozen_importlib_external.SourceFileLoader object at 0x7fffff9c8610>
import 'encodings' # <_frozen_importlib_external.SourceFileLoader object at 0x7fffffbd2e00>
# /usr/lib/python3.10/encodings/__pycache__/utf_8.cpython-310.pyc matches /usr/lib/python3.10/encodings/utf_8.py
# code object from '/usr/lib/python3.10/encodings/__pycache__/utf_8.cpython-310.pyc'
import 'encodings.utf_8' # <_frozen_importlib_external.SourceFileLoader object at 0x7fffffbd2da0>
import '_signal' # <class '_frozen_importlib.BuiltinImporter'>
# /usr/lib/python3.10/__pycache__/io.cpython-310.pyc matches /usr/lib/python3.10/io.py
# code object from '/usr/lib/python3.10/__pycache__/io.cpython-310.pyc'
# /usr/lib/python3.10/__pycache__/abc.cpython-310.pyc matches /usr/lib/python3.10/abc.py
# code object from '/usr/lib/python3.10/__pycache__/abc.cpython-310.pyc'
import '_abc' # <class '_frozen_importlib.BuiltinImporter'>
import 'abc' # <_frozen_importlib_external.SourceFileLoader object at 0x7fffff9c8970>
import 'io' # <_frozen_importlib_external.SourceFileLoader object at 0x7fffff9c8760>
# /usr/lib/python3.10/__pycache__/site.cpython-310.pyc matches /usr/lib/python3.10/site.py
# code object from '/usr/lib/python3.10/__pycache__/site.cpython-310.pyc'
# /usr/lib/python3.10/__pycache__/os.cpython-310.pyc matches /usr/lib/python3.10/os.py
# code object from '/usr/lib/python3.10/__pycache__/os.cpython-310.pyc'
# /usr/lib/python3.10/__pycache__/stat.cpython-310.pyc matches /usr/lib/python3.10/stat.py
# code object from '/usr/lib/python3.10/__pycache__/stat.cpython-310.pyc'
import '_stat' # <class '_frozen_importlib.BuiltinImporter'>
import 'stat' # <_frozen_importlib_external.SourceFileLoader object at 0x7fffff9cab90>
# /usr/lib/python3.10/__pycache__/_collections_abc.cpython-310.pyc matches /usr/lib/python3.10/_collections_abc.py
# code object from '/usr/lib/python3.10/__pycache__/_collections_abc.cpython-310.pyc'
import '_collections_abc' # <_frozen_importlib_external.SourceFileLoader object at 0x7fffff9caef0>
# /usr/lib/python3.10/__pycache__/posixpath.cpython-310.pyc matches /usr/lib/python3.10/posixpath.py
# code object from '/usr/lib/python3.10/__pycache__/posixpath.cpython-310.pyc'
# /usr/lib/python3.10/__pycache__/genericpath.cpython-310.pyc matches /usr/lib/python3.10/genericpath.py
# code object from '/usr/lib/python3.10/__pycache__/genericpath.cpython-310.pyc'
import 'genericpath' # <_frozen_importlib_external.SourceFileLoader object at 0x7fffffa05720>
import 'posixpath' # <_frozen_importlib_external.SourceFileLoader object at 0x7fffff9caf80>
import 'os' # <_frozen_importlib_external.SourceFileLoader object at 0x7fffff9c97b0>
# /usr/lib/python3.10/__pycache__/_sitebuiltins.cpython-310.pyc matches /usr/lib/python3.10/_sitebuiltins.py
# code object from '/usr/lib/python3.10/__pycache__/_sitebuiltins.cpython-310.pyc'
import '_sitebuiltins' # <_frozen_importlib_external.SourceFileLoader object at 0x7fffff9ca890>
Processing global site-packages
Adding directory: '/myenv/lib/python3.10/site-packages'
Processing .pth file: '/myenv/lib/python3.10/site-packages/distutils-precedence.pth'
Processing user site-packages
Processing global site-packages
Adding directory: '/myenv/lib/python3.10/site-packages'
Processing .pth file: '/myenv/lib/python3.10/site-packages/distutils-precedence.pth'
# /usr/lib/python3.10/__pycache__/sitecustomize.cpython-310.pyc matches /usr/lib/python3.10/sitecustomize.py
# code object from '/usr/lib/python3.10/__pycache__/sitecustomize.cpython-310.pyc'
import 'sitecustomize' # <_frozen_importlib_external.SourceFileLoader object at 0x7fffffa05c90>
import 'site' # <_frozen_importlib_external.SourceFileLoader object at 0x7fffff9c9180>
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
# /myenv/lib/python3.10/site-packages/netgen/__pycache__/__init__.cpython-310.pyc matches /myenv/lib/python3.10/site-packages/netgen/__init__.py
# code object from '/myenv/lib/python3.10/site-packages/netgen/__pycache__/__init__.cpython-310.pyc'
# /myenv/lib/python3.10/site-packages/netgen/__pycache__/config.cpython-310.pyc matches /myenv/lib/python3.10/site-packages/netgen/config.py
# code object from '/myenv/lib/python3.10/site-packages/netgen/__pycache__/config.cpython-310.pyc'
import 'netgen.config' # <_frozen_importlib_external.SourceFileLoader object at 0x7fffffa06770>
Illegal instruction

ludgerpaehler commented 6 months ago

I'm currently running into the same issue. The main problem is that Netgen does not provide a manylinux wheel for arm64. I think it needs to be installed specifically for that architecture to work properly.
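Since wheel selection depends on the architecture the interpreter reports, it can help to print that from inside the container. This is a minimal sketch using only the standard library, with illustrative function names. As an assumption worth verifying, an amd64 image run under emulation on an ARM host would typically still report x86_64 here.

```python
import platform
import sysconfig


def interpreter_platform():
    """Report the architecture the running interpreter believes it is on."""
    return {
        "machine": platform.machine(),         # e.g. x86_64, aarch64, arm64
        "platform": sysconfig.get_platform(),  # e.g. linux-x86_64
        "python": platform.python_version(),
    }


def is_arm64(machine):
    """True for the common spellings of 64-bit ARM."""
    return machine.lower() in {"aarch64", "arm64"}


if __name__ == "__main__":
    info = interpreter_platform()
    for key, value in info.items():
        print(f"{key}: {value}")
    print("arm64 userland:", is_arm64(info["machine"]))
```

If this reports an arm64 userland, pip can only use a prebuilt wheel that was published for that architecture.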

BTW, I hit the same error with VTK. Has anyone else hit that?

ludgerpaehler commented 6 months ago

Opened an issue for it in the Netgen forum:

https://forum.ngsolve.org/t/installation-on-linux-arm64/2718