OpenMDAO / testflo

A simple python testing framework that can run unit tests under MPI (or not).

OpenMPI / ORTE Errors with Serial mpi4py Test #88

Open bernardopacini opened 1 year ago

bernardopacini commented 1 year ago

When running testflo on a file that uses mpi4py, I intermittently get the following errors:

```
ORTE_ERROR_LOG: Data unpack would read past end of buffer in file util/show_help.c at line 501
```

and/or

```
ORTE_ERROR_LOG: Out of resource in file util/show_help.c at line 501
```

Sometimes both pop up, sometimes neither, sometimes one.

I first thought this was a problem in my code, but after debugging I was able to reduce it to a minimal example that reproduces the error on my machine (see below). Interestingly, the test is serial and does no communication: it imports mpi4py but never uses it, and it still produces the error. Running `testflo -v -n 16 .` gives me:

```
❯ testflo -v -n 16 .

[[34212,0],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file util/show_help.c at line 501
./test_model_python.py:Test_Model.test_initialize_run ... OK (00:00:0.00, 41 MB)

OK

Passed:  1
Failed:  0
Skipped: 0

Ran 1 test using 16 processes
Wall clock time:   00:00:0.24
```

Unfortunately, the behavior is not deterministic: the messages appear roughly once every 20 runs. They have never caused any of my tests to terminate or fail, but they seem strange regardless. Have you run into this before? Is there a known reason why it happens?
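The messages appear only about once in 20 runs, so counting them over many runs makes the frequency easier to pin down. The same count also shows whether a workaround changes anything. Below is a minimal sketch. It assumes the messages appear somewhere in testflo's stdout or stderr. The helper names `find_orte_errors` and `count_failing_runs` are hypothetical, the regex is written only against the two messages quoted above, and the default of 20 runs is arbitrary.

```python
import re
import subprocess

# Matches lines such as:
# "ORTE_ERROR_LOG: Out of resource in file util/show_help.c at line 501"
ORTE_RE = re.compile(r"ORTE_ERROR_LOG: (.+?) in file (\S+) at line (\d+)")


def find_orte_errors(output):
    """Return (message, file, line) tuples for every ORTE_ERROR_LOG match."""
    return [(m.group(1), m.group(2), int(m.group(3)))
            for m in ORTE_RE.finditer(output)]


def count_failing_runs(cmd, runs=20):
    """Run `cmd` `runs` times and count the runs that printed an ORTE error."""
    hits = 0
    for _ in range(runs):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        # ORTE usually writes to stderr, so check both streams.
        if find_orte_errors(proc.stdout + proc.stderr):
            hits += 1
    return hits


if __name__ == "__main__":
    n = count_failing_runs(["testflo", "-v", "-n", "16", "."], runs=20)
    print(f"{n}/20 runs printed ORTE_ERROR_LOG")
```

Run it from the directory containing the test file. Set `runs` higher to get a steadier estimate.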

For reference, this is with:

- Ubuntu 22.04
- Python 3.10.12
- testflo 1.4.12
- mpi4py 3.1.3
- Open MPI 3.1.6
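The same version details can be collected on another machine with a short standard-library sketch using `platform` and `importlib.metadata`. The helper name `environment_report` is hypothetical, and a missing package is reported as `None`. Open MPI is not a Python package, so its version has to come from the shell, for example via `mpirun --version`.

```python
import platform
from importlib import metadata


def environment_report(packages=("testflo", "mpi4py", "numpy")):
    """Collect OS, Python and package versions; None marks a missing package."""
    report = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```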

Test file (`test_model_python.py`):

```python
import unittest
import os
import sys

# package.py (below) imports mpi4py; nothing in this test calls MPI.
import package as py_model


class Test_Model(unittest.TestCase):
    def setUp(self):
        pass

    def tearDown(self):
        pass

    def test_initialize_run(self):
        # Write data file (closed automatically by the with-block)
        with open("test.dat", "w") as f:
            f.write("3\n")
            f.write("0.0000000 0.0000000\n")
            f.write("0.5000000 1.0000000\n")
            f.write("1.0000000 0.0000000\n")
            f.write("0.0000000 0.0000000\n")
            f.write("0.5000000 -1.0000000\n")
            f.write("1.0000000 0.0000000\n")


if __name__ == "__main__":
    unittest.main()
```

Imported `package.py`:

```python
# By default, importing mpi4py.MPI calls MPI_Init, even though nothing here uses it.
from mpi4py import MPI
import numpy as np
```

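The test never calls MPI. Even so, `from mpi4py import MPI` in `package.py` initializes MPI, because by default mpi4py calls `MPI_Init` on import (controlled by `mpi4py.rc.initialize`). Open MPI's runtime therefore starts up even in this "serial" run. Whether that start-up produces the ORTE messages is an unconfirmed hypothesis. One way to test it is to defer initialization so that the import alone does not call `MPI_Init`. The sketch below assumes mpi4py 3.x, and the helper name `import_mpi_without_init` is hypothetical.

```python
import importlib.util


def import_mpi_without_init():
    """Import mpi4py.MPI without the automatic MPI_Init on import.

    Hypothetical helper for checking whether import-time initialization is
    what triggers the ORTE messages. Returns the MPI module, or None when
    mpi4py (or the MPI library it links against) is unavailable.
    """
    if importlib.util.find_spec("mpi4py") is None:
        return None
    import mpi4py

    # rc options are read when mpi4py.MPI is first imported, so this must run
    # before any `from mpi4py import MPI` in the same process (including the
    # one in package.py); otherwise it has no effect.
    mpi4py.rc.initialize = False
    try:
        from mpi4py import MPI
    except ImportError:
        return None
    return MPI


if __name__ == "__main__":
    MPI = import_mpi_without_init()
    if MPI is None:
        print("mpi4py is not available")
    else:
        # Expected to print False: the import did not call MPI_Init.
        print("MPI initialized:", MPI.Is_initialized())
```

If the messages disappear with this change, that points at import-time initialization. Code that later needs MPI would then have to call `MPI.Init()` explicitly.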
bernardopacini commented 1 year ago

I am not certain, but this Open MPI mailing-list thread may be related: it reports the same error message and involves spawned subprocesses:

https://users.open-mpi.narkive.com/OntQX3As/ompi-mpi-spawn-error-data-unpack-would-read-past-end-of-buffer-26-instead-of-success