cgg-bern / AlgoHex

GNU Affero General Public License v3.0

Issue while running AlgoHex #9

Open otaolafranc opened 11 months ago

otaolafranc commented 11 months ago

Hello, I am having issues while testing AlgoHex. I just meshed a simple toroid geometry and HexMeshing is crashing. Here you can find the log (logAlgoHex.txt) and the mesh file (test2.vtk.tar.gz). Best regards, Franco

mheistermann commented 11 months ago

Relevant excerpt for quick reference:

*** The MPI_Comm_f2c() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.

My guess is that this may be caused by a mix of libraries with "real" and "fake" MPI, leading us to call MPI_Init of the fake MPI, and the real MPI is then never initialized.

Could you please run ldd on the HexMeshing library and post the output?
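For example, something like this should show which MPI-related libraries are linked (a minimal sketch; the path to the HexMeshing binary is assumed from the default build layout):

    # list the dynamic libraries HexMeshing links against, filtered for MPI/MUMPS
    ldd ./build/Build/bin/HexMeshing | grep -iE 'mpi|mumps'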

otaolafranc commented 11 months ago

> Relevant excerpt for quick reference:
>
> *** The MPI_Comm_f2c() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
>
> My guess is that this may be caused by a mix of libraries with "real" and "fake" MPI, leading us to call MPI_Init of the fake MPI, and the real MPI is then never initialized.
>
> Could you please run ldd on the HexMeshing library and post the output?

Here you can find the output: lddOutput.txt. dbommes mentioned something about this in https://github.com/cgg-bern/AlgoHex/issues/7#issuecomment-1818797514, but I am not sure how to remove it from the CMake configuration. I imagine that the "real" MPI comes from other libraries on my PC that use it.

mheistermann commented 11 months ago

You have libdmumps_seq, which ships with the fake MPI package (cf. the description at https://packages.debian.org/bullseye/libmumps-seq-dev). (Note: Tuxedo OS is apparently based on Ubuntu, which in turn is based on Debian.)

I'm not sure where the real MPI comes from, maybe CoMISo directly links to it. There is an undocumented DISABLE_MPI CMake option in CoMISo which you can try: Run cmake -DDISABLE_MPI=TRUE . in the build folder, then compile again.

If this isn't enough, could you try to use lddtree or tldd on HexMeshing? It will show where MPI is actually pulled in (cf. https://stackoverflow.com/questions/1488527/hierarchical-ldd1). In Debian, lddtree is available in the pax-utils package; hope you have the same on your system!
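Roughly, the whole attempt could look like this (a sketch; it assumes the default build layout and that pax-utils is available via apt):

    # reconfigure without MPI and rebuild (run inside the build folder)
    cmake -DDISABLE_MPI=TRUE .
    cmake --build .
    # show the full dependency tree to see which library actually pulls in the real MPI
    sudo apt install pax-utils
    lddtree ./Build/bin/HexMeshing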

otaolafranc commented 11 months ago

> I'm not sure where the real MPI comes from, maybe CoMISo directly links to it. There is an undocumented DISABLE_MPI CMake option in CoMISo which you can try: Run cmake -DDISABLE_MPI=TRUE . in the build folder, then compile again.

Hello, with this option cmake fails during the build. Here is the log file: logAlgoHexBuild.txt

mheistermann commented 11 months ago

Could you try to comment out that MPI_Init call from HexMeshing/main.cc? Seems we need to make this conditional.

EDIT: I made a test branch for this: https://github.com/cgg-bern/AlgoHex/tree/dev/conditional-mpi-init (haven't properly tested it myself yet)
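If you want to try the branch directly, something like this should do it (a sketch, assuming the remote is the main AlgoHex repository and the same build folder as before):

    # switch to the test branch with the conditional MPI_Init, then rebuild
    git fetch origin
    git checkout dev/conditional-mpi-init
    cd build && cmake -DDISABLE_MPI=TRUE . && cmake --build .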

otaolafranc commented 11 months ago

> Could you try to comment out that MPI_Init call from HexMeshing/main.cc? Seems we need to make this conditional.
>
> EDIT: I made a test branch for this: https://github.com/cgg-bern/AlgoHex/tree/dev/conditional-mpi-init (haven't properly tested it myself yet)

Hello, I just finished compiling using cmake -DDISABLE_MPI=TRUE . and then cmake --build . (both run from the build folder). Still the same error:

*** The MPI_Comm_f2c() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[franco-precision7560:19456] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

PS: obviously I modified the main.cc file to match the one in the branch you linked.

mheistermann commented 11 months ago

This is a pretty terrible rabbithole :-(

I just looked into Debian's ipopt, which always uses the fake-mpi mumps version; it's really ancient and doesn't really appear maintained :( Someone has posted a patch here to solve a similar issue: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=929122

Independent of fixing this crash issue, I'm unsure how we should handle this. Maybe we could do some hacks on the build system side to ensure we use the real MPI_Init symbol instead of the fake one; on the other hand, that'd just be for this debian-patched 9-year old ipopt version, which we probably actually don't want to use at all.

Aside from asking users to compile their own ipopt&mumps version manually, or integrating it into the AlgoHex build process (probably not trivial), we could maybe build a static binary in a docker container that brings known-good and compatible versions of dependencies.

I'm sorry I don't have a good solution for now. How comfortable would you be with compiling your own IPOPT, or moving everything into a container (e.g. docker or podman)? It appears Arch Linux packages a recent ipopt version, though I haven't tested it.

EDIT: I should note that some of our team have run AlgoHex on Debian before, but I can't tell what's the difference that makes it work. I'll ask around.

otaolafranc commented 11 months ago

> I'm sorry I don't have a good solution for now. How comfortable would you be with compiling your own IPOPT, or moving everything into a container (e.g. docker or podman)? It appears Arch Linux packages a recent ipopt version, though I haven't tested it.

Hello, thanks for your help. I am sorry, but I am not so comfortable compiling IPOPT. I could give it a try when I have a little more time, but UNIX is not a platform I feel comfortable with in general (as you can see from my dumb questions).

> EDIT: I should note that some of our team have run AlgoHex on Debian before, but I can't tell what's the difference that makes it work. I'll ask around.

I will probably wait for this... It is a shame, as I was looking for new open-source hexa meshers... I will keep an eye on this issue and keep watching the repository itself.

> we could maybe build a static binary in a docker container that brings known-good and compatible versions of dependencies.

This could be nice for testing the algorithm itself... I have no experience with Docker or anything else of this style, but if it allows me to test, I can test :)

mheistermann commented 11 months ago

I started hacking together some container based build, but I'm running into some pretty annoying issues, with both GCC and Clang crashing 😱

I'll have to pause work on this for now unfortunately, but I pushed those work-in-progress changes just in case someone wants to fix/finish it :)

mheistermann commented 11 months ago

An update on the container build: the compiler crashes apparently were just due to too little RAM in the default config :-) I based the container on Debian, but used coinbrew to install a recent ipopt version. Maybe if you uninstall your OS version of ipopt and mumps, and use coinbrew for ipopt, it'll work for you just as well.
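If you want to try that, removing the distribution packages would look roughly like this (a sketch; the exact package names are an assumption and may differ on Tuxedo OS/Ubuntu):

    # remove the distribution's IPOPT and sequential-MUMPS (fake MPI) development packages
    sudo apt remove coinor-libipopt-dev libmumps-seq-dev
    # check which ipopt/mumps packages remain installed
    dpkg -l | grep -iE 'ipopt|mumps'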

Note: I made some changes to the main branch that are required for this to work.

mkdir -p /usr/src/coin-or && \
    mkdir -p /opt/coin-or && \
    cd /usr/src/coin-or && \
    wget https://raw.githubusercontent.com/coin-or/coinbrew/master/coinbrew \
    && chmod +x coinbrew
cd /usr/src/coin-or && ./coinbrew fetch Ipopt@3.14.13
cd /usr/src/coin-or && ./coinbrew build Ipopt@3.14.13 --prefix=/opt/coin-or --parallel-jobs 8
export IPOPT_HOME=/opt/coin-or

That last export sets the environment variable required for cmake, so you'll have to use the same shell (or always set IPOPT_HOME before running cmake).
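Putting it together, rebuilding AlgoHex against the coinbrew-built IPOPT could look like this (a sketch; run from the build folder inside the AlgoHex checkout):

    # point the build at the coinbrew install (repeat in every new shell)
    export IPOPT_HOME=/opt/coin-or
    # reconfigure and rebuild
    cmake ..
    cmake --build .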

Hope this helps :-)

otaolafranc commented 11 months ago

> Maybe if you uninstall your OS version of ipopt and mumps, and use coinbrew for ipopt, it'll work for you just as well.

Hmm, this might affect the system itself, no? I mean, I am scared that something else will stop working if I do this...

@mheistermann hello again Martin, I can confirm that, in principle, with the steps you mentioned AlgoHex 'works' (see the log of running ./build/Build/bin/HexMeshing -i ./demo/HexMeshing/cylinder.ovm -o test.ovm > cylinderLog.txt 2>&1 in the AlgoHex folder): cylinderLog.txt

But... my issue is how to open this file / convert it to other formats. I looked at meshio, but it does not support the ovm format, and I tried to open it with ParaView and Gmsh without luck... Furthermore, I tried opening it in hexaLab.net and I got the error "can't parse file". Thanks in advance.

mheistermann commented 11 months ago

Hi Franc, please excuse the late response; I was sick the last few days and am still recovering (but am now well enough again to answer some questions :).

I'm very happy that you could successfully run AlgoHex! The container work inspired by your troubles will hopefully make it easier for other users soon, so thanks for working through this!

Regarding the .ovm file format:

There is an (inactive) meshio fork I made a long time ago with incomplete work to support the ovm file format: the main limitation in this context is that properties, and thus feature tags, are not translated. You should however get the proper mesh geometry and topology. For further reference, here is a draft PR with some more detail on the limitations, both those imposed by the file format and those of the current implementation.

A more complete option is OpenFlipper: it is built on OpenVolumeMesh and natively supports the .ovm format. It can also export to .vtk.

Ideally we would extend OVM with more file format implementations (OpenFlipper uses its own implementation for .vtk, which could maybe be moved), so that AlgoHex or a standalone conversion helper tool could provide VTK/.mesh/.msh/... support without users having to go through these hoops. However, I unfortunately cannot give any time estimate for this to happen.

otaolafranc commented 11 months ago

Hello Martin, thanks again for your help (and patience).

> Please excuse the late response; I was sick the last few days and am still recovering (but am now well enough again to answer some questions :).

Please, you are taking time to help me; there is nothing to excuse.

> I'm very happy that you could successfully run AlgoHex! The container work inspired by your troubles will hopefully make it easier for other users soon, so thanks for working through this! Regarding the .ovm file format

Well, the thing is, I am not sure that it works :/ as I couldn't successfully open the mesh with anything... I thought that HexaLab was going to support it... Right now I am trying to compile OpenFlipper (let's see if I manage to compile it, lol) and then I will check. From AlgoHex it is not possible to export to vtk directly, no?

Furthermore, if I may: for the cases where the algorithm is not successful, wouldn't it be possible to make a small tetra/pyramid zone (something similar to what LoopyCuts does, https://github.com/mlivesu/LoopyCuts)? I see a big opportunity for AlgoHex (judging from the images of the meshes I could see in the publication) for CFD applications, and in CFD we 'prefer' hexas but it is OK to have cells of different kinds. For it to be adopted, the two main limitations of AlgoHex would be the Gurobi dependency (especially this one) and the lack of output when the meshing is not successful (where it would be better to at least get a hexa-dominant mesh with mixed elements). Is AlgoHex development stopped, or is it still ongoing? Thanks for your answers and your patience, best regards

EDIT/UPDATE: I have successfully compiled OpenFlipper and opened the mesh generated from the cylinder (./build/Build/bin/HexMeshing -i ./demo/HexMeshing/cylinder.ovm -o test.ovm), and the mesh is missing some elements (see photo); there is a 'hole' in the cylinder:

(screenshot: hex mesh of the cylinder with a hole)
mheistermann commented 11 months ago

Hi Franc, glad we're making steps in the right direction and that you already got something out and can open it. Can OpenFlipper export to a format that's usable for you?

The output however is still not the expected one for this example input, which should be a complete and clean hex mesh. Looking at the logs, it appears no quantization constraints were generated, but I can't tell why. Often this is due to some gurobi license issue, but I can't see any output indicating that. I asked a colleague who is more familiar with the code to look into it, we'll get back to you :)

More general points:

  • Absolutely not, AlgoHex development is very active, just unfortunately most of it cannot happen in public: As an academic endeavour we need to publish our research, so any major updates will come out together with the corresponding paper as not to jeopardise peer review anonymity. We're of course still happy to create and publish fixes etc in this public version, especially to make sure users can actually test everything that should work.

  • Regarding tet/pyramid filling: As far as I understand, this is currently not planned, instead focusing resources on making the pipeline reliable enough to always get full-hex results.

  • I do have good news however regarding the Gurobi dependency: There currently is work on the way to optionally use an open-source solver that should be available in the near future.

mheistermann commented 11 months ago

Apparently quite a few components don't log by default, including the one that is likely failing (our bet is on some Gurobi license issue). We will change the default log level, but until then, could you manually enable the following settings in cmake (e.g. via cmake-gui; there you'll have to click both Configure and Generate after changing the settings), compile and run again?

(screenshot: algohex-cmake-logging-options)

Another issue we noted is that this same failure is not handled, so the fallback gives you the bad result. This we'll of course also fix :)
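If a GUI is inconvenient, ccmake is a terminal-based way to flip the same options (a sketch, not part of the original instructions; run inside the build folder):

    # interactive curses front end to CMake: toggle the logging options,
    # then press 'c' to configure and 'g' to generate, and rebuild afterwards
    ccmake .
    cmake --build .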

otaolafranc commented 11 months ago

Hello Martin, thanks again for your help. Sorry that I didn't answer you before. I am pretty sure I found the issue, and yes, it is the Gurobi license; I actually simply extracted Gurobi and did nothing else. I have tried to compile AlgoHex on my work PC and did not succeed. Nevertheless, when I get home I have the 'working' AlgoHex on my other PC; I will try to build with Gurobi and rerun it (add the license). Regarding your answers:

> • Absolutely not, AlgoHex development is very active, just unfortunately most of it cannot happen in public: As an academic endeavour we need to publish our research, so any major updates will come out together with the corresponding paper as not to jeopardise peer review anonymity. We're of course still happy to create and publish fixes etc in this public version, especially to make sure users can actually test everything that should work.

I completely understand, I am also a researcher, so no worries (and yes, if we make publications using it I will cite it :) ). I was asking because I have found several promising tools that stopped their development with missing features...

> • Regarding tet/pyramid filling: As far as I understand, this is currently not planned, instead focusing resources on making the pipeline reliable enough to always get full-hex results.

It might not be of interest to the research team, and I can understand that, but just to keep in mind in case it becomes possible at some point: it would be absolutely awesome to have this feature from an application point of view. Most CFD solvers today accept mixed polyhedral meshes, so it would bring the reliability of the mesher up to that of octree algorithms, with much more interesting meshes.

> • I do have good news however regarding the Gurobi dependency: There currently is work on the way to optionally use an open-source solver that should be available in the near future.

That is actually amazing news. As I mentioned elsewhere, I am part of the OpenFOAM community, which is one of the largest open-source CFD toolkits, and a mesher like this would be an amazing tool to add to it.

EDIT: I just tested on my PC, and the Gurobi academic license only gives you access on 1 PC. I imagine that is the reason, but in any case it didn't make any difference :/ Best regards

mheistermann commented 11 months ago

> EDIT: I just tested on my PC, and the Gurobi academic license only gives you access on 1 PC. I imagine that is the reason, but in any case it didn't make any difference :/

You should be able to request more licenses on their website, one per computer.
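For reference, activating an academic license on an additional machine is typically just (a sketch; the key below is a placeholder for the value shown on the Gurobi license page):

    # download and install the license file for this machine (written to ~/gurobi.lic by default)
    grbgetkey xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx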

otaolafranc commented 11 months ago

> > EDIT: I just tested on my PC, and the Gurobi academic license only gives you access on 1 PC. I imagine that is the reason, but in any case it didn't make any difference :/
>
> You should be able to request more licenses on their website, one per computer.

Hello Martin, so from the university I successfully generated a second key for Gurobi and did what you mentioned here:

> We will change the default log level, but until then, could you manually enable the following settings in cmake (e.g. via cmake-gui; there you'll have to click both Configure and Generate after changing the settings), compile and run again?

Then I used cmake -DGUROBI_HOME=../../gurobi1003/linux64/ .. in the build folder, followed by make. Here you can find the two logs: logcMake.txt logmake.txt

But now HexMeshing does not run... I am getting a segmentation fault at the end (here is its output): logHex.txt. Best regards

mheistermann commented 11 months ago

@HendrikBrueckler this crash looks to be in your code, do you maybe have a chance to look into it? It's a null-pointer dereference in qgp3d::ConstraintExtractor::determineEquivalentEndpoints().

mheistermann commented 9 months ago

@otaolafranc have you seen that @HendrikBrueckler has added some non-Gurobi solver support here? https://github.com/HendrikBrueckler/QGP3D/commit/e218c98463a596cfccec4f2689ef0d781a1f6da7

(Note: I haven't tried it and haven't checked how to use it)

HendrikBrueckler commented 9 months ago

Sorry for taking so long to get around to fixing this. I think I've found the cause of the issue and fixed it in https://github.com/HendrikBrueckler/QGP3D/commit/ace72a30ed21672235f9868b42c858db273445d2 . Feel free to verify whether that fixes the issue for you too @otaolafranc :)

otaolafranc commented 8 months ago

Hello, I just 'successfully' compiled Gurobi-less AlgoHex; nevertheless, it is not passing the test with the cylinder. For the installation of the Gurobi-less AlgoHex I followed the instructions in the Dockerfile of the dev/dockerfile-without-gurobi branch, using the following script (I needed to add some extra dependencies): Support Gurobi-less quantization. This managed to build the application; nevertheless, if I run the test on the cylinder, ./build/Build/bin/HexMeshing -i ./demo/HexMeshing/cylinder.ovm -o test.ovm, I get a segmentation fault. Here is the complete log: testCylinder.log. In the hope that it helps, here are the logs of the cmake and ninja commands from the compilation of AlgoHex: cmakeLog.txt ninjaLog.txt

regards

HendrikBrueckler commented 8 months ago

Hi, I tried to reproduce that issue but could not. On my machine using the dev/dockerfile-without-gurobi branch, the program finishes without issues and I get the following log: log.txt

Maybe you could try adding the flags -DTS3D_ENABLE_LOGGING=On -DMC3D_ENABLE_LOGGING=On -DQGP3D_ENABLE_LOGGING=On to cmake when building, as this will add more log messages from those parts of the algorithm.
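For example, from the build folder (a sketch; paths assumed from the default build layout used earlier in this thread):

    # reconfigure with extra logging enabled in those components, then rebuild
    cmake -DTS3D_ENABLE_LOGGING=On -DMC3D_ENABLE_LOGGING=On -DQGP3D_ENABLE_LOGGING=On .
    cmake --build .
    # rerun the failing example and capture the log
    ./Build/bin/HexMeshing -i ../demo/HexMeshing/cylinder.ovm -o test.ovm > testCylinder.log 2>&1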

otaolafranc commented 8 months ago

Hello @HendrikBrueckler, thanks for your answer, and really sorry for being such a bother to you guys... I just really want to use AlgoHex :/. I just recompiled everything with the flags you named for cmake on my other PC and it worked. I will test tomorrow at work on my work PC. The issue is that I copy-pasted the same script that I was also copy-pasting before... so no idea why it would work on one and not on the other... Next step is to figure out how I can create my own meshes and read the .ovm file to write it in another format. Thanks!

EDIT: I have tested on my other computer (the one where it was not working; I cleaned the various created folders and changed the library linking from -s to -sf to overwrite them, in case it was an issue of leftover versions from when I tried the Gurobi version, etc.). It still does not manage to run on cylinder.ovm. Here is the log:
log.test.txt

chiefenne commented 8 months ago

Hi all,

just wanted to let you know that I was able to successfully run the "dockerfile-without-gurobi" branch. My operating system is macOS Sonoma 14.2.1.

Download command: git clone --single-branch --branch dev/dockerfile-without-gurobi https://github.com/cgg-bern/AlgoHex.git

Build command (first cd into the AlgoHex folder): docker build -t algohex .

Run the cylinder example (for this to work the cylinder.ovm needs to be copied to the AlgoHex folder): docker run -v .:/app/data -w /app/data --name algohex-container algohex HexMeshing -i cylinder.ovm -o test.ovm

Result (test.ovm mesh loaded in OpenFlipper): (screenshot: cylinder_HEX)

Log file from running HexMeshing: cylinder.log.gz

Andi