BYU-PRISM / GEKKO

GEKKO Python for Machine Learning and Dynamic Optimization
https://machinelearning.byu.edu

linux crash when using remote=False #87

Closed talsaiag closed 3 years ago

talsaiag commented 4 years ago

Description:

When using Gekko on my Mac, it works perfectly. When using Gekko deployed inside a container on a Linux host (with the same model, of course), the spawned executable crashes and Python can't find results.json.

Output

Error: free(): invalid pointer

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x6c278f
#1  0x6aacd0
#2  0x7f51ab57683f
#3  0x7f51ab5767bb
#4  0x7f51ab561534
#5  0x7f51ab5b8507
#6  0x7f51ab5bec19
#7  0x7f51ab5c042b
#8  0x477b95
#9  0x448f7f
#10  0x64627d
#11  0x40e4df
#12  0x41e84e
#13  0x42a916
#14  0x64807b
#15  0x43582a
#16  0x6715bc
#17  0x4026ec
#18  0x7f51ab56309a
#19  0x40275c
#20  0xffffffffffffffff

Error: 'results.json' not found. Check above for additional error details
# <stack trace of my program accessing a variable>
  File "/usr/local/lib/python3.7/site-packages/gekko/gk_operators.py", line 147, in __getitem__
    return self.value[key]
TypeError: 'int' object is not subscriptable

Appreciate the help, Tal

APMonitor commented 4 years ago

Are you able to post the problem that produces this error? It isn't a problem with Gekko but the underlying apm executable that produces the results.json file. There are still some outstanding issues with using the Chemicals library in Linux: https://github.com/BYU-PRISM/GEKKO/issues/75

talsaiag commented 4 years ago

I found the issue + workaround: I am using Gekko to solve a minimization problem using several constraints. I have an edge case which caused this constraint to be used:

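from gekko import GEKKO

m = GEKKO(remote=False)
pieces = []  # edge case: nothing to sum
target_value = 0
# with an empty list this constraint reduces to 0 == 0
m.Equation(m.sum(pieces) == target_value)
# a couple of other less relevant constraints and target function are set.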

This in turn crashed the apm executable (and led me to believe there was a problem). It happened once on Mac as well (it is not consistent); most of the time it works as expected.

The expected behavior is to simply ignore this constraint automatically (as it sometimes does on Mac). As a workaround, I check whether len(pieces) > 0 or target_value != 0 before applying this constraint (though I think this should be dealt with within the apm executable itself).
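The workaround above can be sketched as a small guard. This is only an illustration: `needs_sum_constraint` is a hypothetical helper name, not part of Gekko, and the commented-out lines assume an existing Gekko model `m`.

```python
def needs_sum_constraint(pieces, target_value):
    """Return False only when the constraint would reduce to the trivial 0 == 0.

    Hypothetical helper, not part of Gekko. It mirrors the check described
    in the comment above: len(pieces) > 0 or target_value != 0.
    """
    return len(pieces) > 0 or target_value != 0


# Intended use with an existing Gekko model `m` (not executed here):
# if needs_sum_constraint(pieces, target_value):
#     m.Equation(m.sum(pieces) == target_value)
```

With an empty list and a nonzero target the constraint is still added, so the solver gets the chance to report the problem as infeasible instead of the constraint being dropped silently.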

Where is the issue tracker for the executable itself? I can move this issue over there as it is not related to Gekko (correct me if I'm wrong).

Thanks, Tal

APMonitor commented 4 years ago

Thanks for posting a minimal example that reproduces the error. This is a good place to track the apm executable errors. I'll mark it as a bug.

APMonitor commented 3 years ago

Here is a complete script from your example:

from gekko import GEKKO
m = GEKKO(remote=True)
pieces = []
target_value = 0
m.Equation(m.sum(pieces) == target_value)
# a couple of other less relevant constraints and target function are set.
m.solve()

The executable no longer crashes. The solver now exits cleanly with the following message:

Exception of type: TOO_FEW_DOF in file "IpIpoptApplication.cpp" at line 891:
 Exception message: status != TOO_FEW_DEGREES_OF_FREEDOM evaluated false: Too few degrees of freedom (rethrown)!

EXIT: Problem has too few degrees of freedom.

 An error occured.
 The error code is          -10

 ---------------------------------------------------
 Solver         :  IPOPT (v3.12)
 Solution time  :   3.299999982118607E-003 sec
 Objective      :   0.000000000000000E+000
 Unsuccessful with error code            0
 ---------------------------------------------------

 Creating file: infeasibilities.txt
 Use command apm_get(server,app,'infeasibilities.txt') to retrieve file
 @error: Solution Not Found
Traceback (most recent call last):
  File "C:\Users\johnh\Desktop\test.py", line 7, in <module>
    m.solve()
  File "C:\Python38\lib\site-packages\gekko\gekko.py", line 2179, in solve
    raise Exception(response)
Exception:  @error: Solution Not Found

Please reopen the issue if there are still other things that need to be addressed.

talsaiag commented 3 years ago

Thanks! Looks good :) It would be nice, if not too much trouble, to raise a more specific exception such as TooFewDOF in this case (so that it could be handled programmatically).

In general:

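class SolutionNotFound(Exception):
    """General case: the solver did not return a solution."""


class TooFewDOF(SolutionNotFound):
    """Special case: the problem has too few degrees of freedom."""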

That way one could catch the general case, SolutionNotFound, as well as any special case.
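A minimal sketch of how such a hierarchy would be caught, assuming the two proposed classes existed. They are a suggestion from this thread and are not part of Gekko; `classify` is a hypothetical helper used only to show which handler runs.

```python
class SolutionNotFound(Exception):
    """General case: the solver returned no solution (proposed, not in Gekko)."""


class TooFewDOF(SolutionNotFound):
    """Special case: too few degrees of freedom (proposed, not in Gekko)."""


def classify(exc):
    """Raise the given exception and report which handler catches it."""
    try:
        raise exc
    except TooFewDOF:
        # The most specific handler must come first.
        return "too few degrees of freedom"
    except SolutionNotFound:
        # Catches the general case and any subclass not handled above.
        return "solution not found"
```

Because TooFewDOF derives from SolutionNotFound, code that only catches SolutionNotFound would still handle the special case.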

APMonitor commented 3 years ago

Each solver has many output messages. Below is the list for IPOPT. Gekko supports 5 solvers, with plans to integrate more. The exception list would be very long, but it is possible. Let me know if you'd like to help with that development. There is a return code from the solver that is reported in m.options.APPINFO (see the documentation).

IPOPT Output Messages

Solve_Succeeded:

Console Message: EXIT: Optimal Solution Found.

This message indicates that Ipopt found a (locally) optimal point within the desired tolerances.

Solved_To_Acceptable_Level:

Console Message: EXIT: Solved To Acceptable Level.

This indicates that the algorithm did not converge to the "desired" tolerances, but that it was able to obtain a point satisfying the "acceptable" tolerance level as specified by the acceptable_tol options. This may happen if the desired tolerances are too small for the current problem.

Feasible_Point_Found:

Console Message: EXIT: Feasible point for square problem found.

This message is printed if the problem is "square" (i.e., it has as many equality constraints as free variables) and Ipopt found a feasible point.

Infeasible_Problem_Detected:

Console Message: EXIT: Converged to a point of local infeasibility. Problem may be infeasible.

The restoration phase converged to a point that is a minimizer for the constraint violation (in the ℓ1-norm), but is not feasible for the original problem. This indicates that the problem may be infeasible (or at least that the algorithm is stuck at a locally infeasible point). The returned point (the minimizer of the constraint violation) might help you to find which constraint is causing the problem. If you believe that the NLP is feasible, it might help to start the optimization from a different point.

Search_Direction_Becomes_Too_Small:

Console Message: EXIT: Search Direction is becoming Too Small.

This indicates that Ipopt is calculating very small step sizes and is making very little progress. This could happen if the problem has been solved to the best numerical accuracy possible given the current scaling.

Diverging_Iterates:

Console Message: EXIT: Iterates divering; problem might be unbounded.

This message is printed if the max-norm of the iterates becomes larger than the value of the option diverging_iterates_tol. This can happen if the problem is unbounded below and the iterates are diverging.

User_Requested_Stop:

Console Message: EXIT: Stopping optimization at current point as requested by user.

This message is printed if the user call-back method Ipopt::TNLP::intermediate_callback returned false.

Maximum_Iterations_Exceeded:

Console Message: EXIT: Maximum Number of Iterations Exceeded.

This indicates that Ipopt has exceeded the maximum number of iterations as specified by the option max_iter.

Maximum_CpuTime_Exceeded:

Console Message: EXIT: Maximum CPU time exceeded.

This indicates that Ipopt has exceeded the maximum number of CPU seconds as specified by the option max_cpu_time.

Restoration_Failed:

Console Message: EXIT: Restoration Failed!

This indicates that the restoration phase failed to find a feasible point that was acceptable to the filter line search for the original problem. This could happen if the problem is highly degenerate, does not satisfy the constraint qualification, or if your NLP code provides incorrect derivative information.

Error_In_Step_Computation:

Console Message: EXIT: Error in step computation (regularization becomes too large?)!

This message is printed if Ipopt is unable to compute a search direction, despite several attempts to modify the iteration matrix. Usually, the value of the regularization parameter then becomes too large. One situation where this can happen is when values in the Hessian are invalid (NaN or Inf). You can check whether this is true by using the option check_derivatives_for_naninf.

Invalid_Option:

Console Message: (details about the particular error will be output to the console)

This indicates that there was some problem specifying the options. See the specific message for details.

Not_Enough_Degrees_Of_Freedom:

Console Message: EXIT: Problem has too few degrees of freedom.

This indicates that your problem, as specified, has too few degrees of freedom. This can happen if you have too many equality constraints, or if you fix too many variables (Ipopt removes fixed variables by default, see also the option fixed_variable_treatment).

Invalid_Problem_Definition:

Console Message: (no console message, this is a return code for the C and Fortran interfaces only.)

This indicates that there was an exception of some sort when building the IpoptProblem structure in the C or Fortran interface. Likely there is an error in your model or the main routine.

Unrecoverable_Exception:

Console Message: (details about the particular error will be output to the console)

This indicates that Ipopt has thrown an exception that does not have an internal return code. See the specific message for details.

NonIpopt_Exception_Thrown:

Console Message: Unknown Exception caught in Ipopt

An unknown exception was caught in Ipopt. This exception could have originated from your model or any linked in third party code. See also Ipopt::IpoptApplication::RethrowNonIpoptException.

Insufficient_Memory:

Console Message: EXIT: Not enough memory.

An error occurred while trying to allocate memory. The problem may be too large for your current memory and swap configuration.

Internal_Error:

Console Message: EXIT: INTERNAL ERROR: Unknown SolverReturn value - Notify IPOPT Authors.

An unknown internal error has occurred. Please notify the authors of Ipopt via the mailing list.

talsaiag commented 3 years ago

Cool. A generic approach, I would imagine, is to pass the string TOO_FEW_DOF (or something similar) to a generic Python exception, SolutionNotFound. What do you think? Is there a standard way of parsing the output and extracting this error-type string into the Python exception?
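One way this could look, as a rough sketch only: the patterns below are guessed from the IPOPT output posted earlier in this thread, and both `SolutionNotFound` and `solution_not_found_from_output` are hypothetical names that do not exist in Gekko. Other solvers would likely need different patterns.

```python
import re


class SolutionNotFound(Exception):
    """Hypothetical generic exception; not part of Gekko."""

    def __init__(self, message, error_type=None):
        super().__init__(message)
        # Error-type string such as "TOO_FEW_DOF", or None if not found.
        self.error_type = error_type


# Matches lines such as: Exception of type: TOO_FEW_DOF in file "..." at line 891:
_EXCEPTION_TYPE = re.compile(r"Exception of type:\s*(\w+)")
# Matches lines such as: EXIT: Problem has too few degrees of freedom.
_EXIT_LINE = re.compile(r"^[ \t]*EXIT:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def solution_not_found_from_output(output):
    """Build a SolutionNotFound from captured solver text (IPOPT-style)."""
    type_match = _EXCEPTION_TYPE.search(output)
    exit_match = _EXIT_LINE.search(output)
    error_type = type_match.group(1) if type_match else None
    # Fall back to the generic message when no EXIT line is present.
    message = exit_match.group(1) if exit_match else "Solution Not Found"
    return SolutionNotFound(message, error_type)
```

A caller could then inspect `err.error_type` to branch on the specific failure while still catching a single exception class.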

APMonitor commented 3 years ago

Each error code has a specific integer that is returned by the solver. We'd likely need to create a dictionary of error codes such as:

{0:'Success',1:'Maximum Iterations',2:'Unbounded Solution',3:'Too Few DOF',...}

This dictionary is different for each solver.
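A per-solver lookup along these lines might be sketched as follows. The integers and messages are placeholders copied from the example dictionary above, not real return codes of any solver, and `ERROR_CODES`, `EXAMPLE_SOLVER`, and `describe_error` are hypothetical names.

```python
# Placeholder codes only: copied from the example dictionary in this thread,
# not the actual return codes of IPOPT or any other solver.
ERROR_CODES = {
    "EXAMPLE_SOLVER": {
        0: "Success",
        1: "Maximum Iterations",
        2: "Unbounded Solution",
        3: "Too Few DOF",
    },
}


def describe_error(solver, code):
    """Return a readable message for a solver return code, with a fallback."""
    codes = ERROR_CODES.get(solver, {})
    return codes.get(code, f"Unknown error code {code} for solver {solver}")
```

Keeping one inner dictionary per solver means a new solver can be added without touching the lookup function, and unknown codes still produce a usable message.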

APMonitor commented 3 years ago

It would take some work for each solver but then we'd have a more meaningful error message rather than just "Solution Not Found".

talsaiag commented 3 years ago

That would be amazing. You may close the issue (at least from my side).

APMonitor commented 3 years ago

Solver-dependent error codes can be included in a future release.