TimefoldAI / timefold-solver

The open source Solver AI for Java, Python and Kotlin to optimize scheduling and routing. Solve the vehicle routing problem, employee rostering, task assignment, maintenance scheduling and other planning problems.
https://timefold.ai
Apache License 2.0

Feat: produce stack trace from python code called from JVM #960

Open Alex-K37 opened 3 months ago

Alex-K37 commented 3 months ago

Is your feature request related to a problem? Please describe. I have a generate_problem function that updates certain attributes of the domain objects. It seems that this is an interaction issue with Python code executed from the JVM.

Traceback (most recent call last):
  File "/home/***/git/optapy-pplan/optapy-test/src/main2.py", line 426, in <module>
    solution = solver.solve(generate_problem(sservice.context._inputfile))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/***/.virtualenvs/timefold/lib/python3.11/site-packages/timefold/solver/_solver.py", line 109, in solve
    java_solution = self._delegate.solve(java_problem)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected string or bytes-like object, got 'ai.timefold.jpyinterpreter.types.PythonString'

In my opinion, such an error could surface at any point during development, leaving the developer completely clueless about where to look for it, unless they follow an add-line/solve/add-line/solve/... approach, which I consider the most tedious.

Describe the solution you'd like

Produce an error message which includes a stack trace from python. This should include information that relates to the original code somehow, even if mangled by jpype or jpyinterpreter.

Describe alternatives you've considered

1) Follow an approach of add-line/solve/add-line/solve/.... which I consider the most tedious.

2) Perform type checking after generate_problem() (does that help at all with a possibly dynamic issue?)
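Alternative 2 can be approximated with a small helper that compares each attribute of a freshly generated domain object against its type annotation. This is a minimal sketch that assumes plain annotated classes; the Exam dataclass and find_type_mismatches are hypothetical names, not Timefold API. It only inspects the objects as they are before solve() is called, so it cannot catch a type that only appears after the solver has translated the code, which is exactly the caveat raised in the question.

```python
import types
import typing
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Optional[X] has origin typing.Union; "X | None" (Python 3.10+) has types.UnionType.
_UNION_ORIGINS = {typing.Union, getattr(types, "UnionType", typing.Union)}


@dataclass
class Exam:
    # Hypothetical stand-in for the reporter's Exam class.
    teacher: str
    start: Optional[datetime] = None


def find_type_mismatches(obj) -> list:
    """Report annotated attributes whose runtime value does not match a simple annotation."""
    problems = []
    # get_type_hints strips Annotated[...] wrappers by default.
    for name, hint in typing.get_type_hints(type(obj)).items():
        if typing.get_origin(hint) in _UNION_ORIGINS:
            allowed = tuple(a for a in typing.get_args(hint) if isinstance(a, type))
        elif isinstance(hint, type):
            allowed = (hint,)
        else:
            continue  # generics such as list[str] are out of scope for this sketch
        value = getattr(obj, name, None)
        if not isinstance(value, allowed):
            problems.append(
                f"{type(obj).__name__}.{name}: expected {hint}, "
                f"got {type(value).__module__}.{type(value).__qualname__}"
            )
    return problems


print(find_type_mismatches(Exam(teacher="Hans Mueller")))
print(find_type_mismatches(Exam(teacher=42)))
```

Running it over every entity right after generate_problem() would flag values that are not the annotated Python type before solving starts.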

Additional remarks

I cannot find any hints as to why this PythonString type is necessary and which components have to use it. JPype already translates between Java strings and Python strings; why isn't this sufficient? Or, why isn't this PythonString automatically translated to a Python string while executing in CPython or in jpy*-translated Python code? I do not want to deviate too much from my original feature request, however.

triceo commented 3 months ago

Thanks for reporting, @Alex-K37. We have quite a lot of work to do regarding error reporting, this included.

Christopher-Chianelli commented 3 months ago

Essentially, the cause of this issue is https://github.com/jpype-project/jpype/issues/1047 ; in short:

  1. Python throws an exception
  2. Java throws an exception with that as the cause
  3. When it gets back to Python, the cause is lost
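The three steps above can be modelled in plain Python, with no JVM involved, to show what "the cause is lost" means. In this toy model, JavaLikeException and the two bridge functions are hypothetical stand-ins, not JPype or Timefold code; the original ValueError stays reachable through __cause__ only when the re-raise uses `raise ... from error`.

```python
class JavaLikeException(Exception):
    """Hypothetical stand-in for the Java exception that wraps the Python error."""


def python_callback():
    # Step 1: Python code (e.g. a constraint lambda) raises.
    raise ValueError("example")


def java_side_call():
    # Step 2: the "Java" layer wraps the Python exception as its cause.
    try:
        python_callback()
    except ValueError as error:
        raise JavaLikeException("Java Exception") from error


def bridge_losing_cause():
    # Step 3 as described: the exception returns to Python without its cause.
    try:
        java_side_call()
    except JavaLikeException as error:
        raise RuntimeError(str(error)) from None


def bridge_keeping_cause():
    # What the feature request asks for: the chain stays intact.
    try:
        java_side_call()
    except JavaLikeException as error:
        raise RuntimeError(str(error)) from error


def root_cause(exc: BaseException) -> BaseException:
    # Follow explicit "raise ... from" links down to the original error.
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


for bridge in (bridge_losing_cause, bridge_keeping_cause):
    try:
        bridge()
    except RuntimeError as exc:
        print(bridge.__name__, type(root_cause(exc)).__name__)
```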

As for how I know it comes from Python: ai.timefold.jpyinterpreter.types.PythonString would never show up as a type name if the code had been executed in Java (it would instead show up under its Python name, str, and the stack trace would be preserved; you would see something like this:

Traceback (most recent call last):
  File "DefaultSolver.java", line 200, in ai.timefold.solver.core.impl.solver.DefaultSolver.solve
Exception: Java Exception

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".../test/.venv/lib64/python3.12/site-packages/timefold/solver/_solver.py", line 109, in solve
    java_solution = self._delegate.solve(java_problem)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ai.timefold.jpyinterpreter.types.errors.ai.timefold.jpyinterpreter.types.errors.ValueError: ai.timefold.jpyinterpreter.types.errors.ValueError: hi

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".../test/test.py", line 232, in <module>
    solver.solve(problem)
  File ".../test/.venv/lib64/python3.12/site-packages/timefold/solver/_solver.py", line 111, in solve
    raise unwrap_python_like_object(e)
_jpyinterpreter.conversions.unwrap_python_like_object.<locals>.WrappedException: Traceback (most recent call last):
  File "DefaultSolver.java", line 200, in solve
  File "AbstractSolver.java", line 82, in runPhases
  File "DefaultConstructionHeuristicPhase.java", line 62, in solve
  File "ConstructionHeuristicDecider.java", line 107, in decideNextStep
  File "ConstructionHeuristicDecider.java", line 133, in doMove
  File "AbstractScoreDirector.java", line 253, in doAndProcessMove
  File "BavetConstraintStreamScoreDirector.java", line 49, in calculateScore
  File "BavetConstraintSession.java", line 83, in calculateScore
  File "BavetConstraintSession.java", line 92, in calculateScoreInLayer
  File "Propagator.java", line 66, in propagateEverything
  File "StaticPropagationQueue.java", line 123, in propagateInserts
  File "StaticPropagationQueue.java", line 116, in processAndClear
  File "StaticPropagationQueue.java", line 93, in propagate
  File "AbstractConditionalTupleLifecycle.java", line 16, in insert
  File "ConditionalUniTupleLifecycle.java", line 9, in test
  File "ConditionalUniTupleLifecycle.java", line 19, in test
  File ".../test.py", line -1, in a
  File ".../test.py", line 210, in a
    raise ValueError('example')
ValueError: example

).

I tried the domain from your Stack Overflow question with a dataset that is valid according to your type hints and could not reproduce your exception. You would need to provide an actual dataset and constraints so that we can reproduce it.

My best guess is that your old OptaPy code copied the Java field into the Python object.

Alex-K37 commented 3 months ago

@triceo I made this a feature request instead of a bug for that reason. Feel free to close/postpone as you see fit.

I know that multi-language interoperation can cause all sorts of hidden trouble, which is often not easily solvable. Assuming there are users with limited programming experience who intend to use the Python API, such a feature would of course offer a much better user experience in the face of modelling errors.

As far as I am concerned: this is my first "serious" go at a constraint solver and I am using Python, because dealing with the data is much more flexible and interactive than doing this in Java.

@Christopher-Chianelli Thank you for pointing out that this is in fact a porting issue coming from OptaPy. I think you are right about the copying. However, it is unclear to me when the Java classes are first created. I will write another comment on SO so as not to pollute this issue.

Alex-K37 commented 3 months ago

FYI: in my case, the error seems to be triggered from define_constraints(), as I have just found out. When I simply empty this function, the error vanishes.

IMHO this supports my request for somewhat better debug information. The first "timefold.solver" log output is generated when solving starts, but nothing is logged before the error occurs.
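One way to narrow such an error down without the add-line/solve loop is to turn each filter lambda into a named function and call it directly on a hand-built object in plain CPython, where any failure gets an ordinary traceback. The sketch assumes minimal stand-in Timeslot and Exam dataclasses and a hypothetical predicate name; it mirrors a filter like the teacher_soft one in this thread, minus its elided `or ...` branch. It will not reproduce errors that only occur after the solver translates the function (such as the PythonString one), but if the predicate passes here and still fails inside solve(), that points at the translation rather than the modelling logic.

```python
import datetime
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Timeslot:
    # Hypothetical minimal stand-in for the reporter's Timeslot class.
    start_time: datetime.datetime


@dataclass
class Exam:
    # Hypothetical minimal stand-in for the reporter's Exam entity.
    teacher: str
    timeslot: Optional[Timeslot] = None


AVOIDED_DAY = datetime.date(2024, 7, 8)


def mueller_on_avoided_day(exam: Exam) -> bool:
    # Named version of the filter lambda; assumes exam.timeslot is assigned,
    # which for_each(Exam) guarantees inside the solver.
    return (re.search("Mueller", exam.teacher) is not None
            and exam.timeslot.start_time.date() == AVOIDED_DAY)


# Called directly in CPython, any exception here comes with a normal traceback.
slot = Timeslot(datetime.datetime(2024, 7, 8, 9, 0))
print(mueller_on_avoided_day(Exam("Hans Mueller", slot)))
```

The same named function would then be passed to `.filter(...)` in the constraint instead of the inline lambda.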

Christopher-Chianelli commented 3 months ago

If it is from define_constraints, then it is a duplicate of https://github.com/TimefoldAI/timefold-solver/issues/967.

Alex-K37 commented 3 months ago

It is similar. Maybe also related to TimefoldAI/timefold-solver#969.

In my case the PythonString error comes from

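import datetime
import re

from timefold.solver.score import ConstraintFactory, HardSoftScore

from domain import Exam  # Exam is defined in the reporter's domain.py


def teacher_soft(constraint_factory: ConstraintFactory):
    return constraint_factory.for_each(Exam) \
            .filter(
                lambda exam:
                    exam.timeslot and (
                        re.search("Mueller", exam.teacher) is not None and
                            exam.timeslot.start_time.date() in (
                                datetime.date(2024, 7, 8),
                                )
                        or ...
                     )
                ) \
            .penalize(HardSoftScore.ONE_SOFT) \
            .as_constraint("Teacher soft issue")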

re.search is being passed exam.teacher as a PythonString, even though Exam.teacher is annotated as str in domain.py.

Christopher-Chianelli commented 3 months ago

What Python version are you using? It is working for me locally on Python 3.12. Side notes:

Christopher-Chianelli commented 3 months ago

re is actually a mix of Python and C; it calls the _sre module, which is written in C, and thus incurs the massive overhead of an FFI call (an overhead large enough that it warranted building a bytecode translator).
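Since "Mueller" contains no regex metacharacters, the re.search call in teacher_soft could be replaced by a plain substring test, which avoids calling into re (and its C `_sre` module) at all. A minimal sketch: mentions_teacher is a hypothetical helper, and we assume, without having measured it, that the translated `in` operator on str stays on the Java side. The loop checks the equivalence against re.search with re.escape, which holds for any literal surname.

```python
import re


def mentions_teacher(teacher_field: str, surname: str) -> bool:
    # Plain substring test; for a literal surname (no regex metacharacters) this
    # answers the same question as re.search(surname, teacher_field).
    return surname in teacher_field


# Sanity check: agrees with re.search on an escaped (literal) pattern.
for sample in ["Hans Mueller", "Mueller, Schmidt", "Anna Schmidt", ""]:
    assert mentions_teacher(sample, "Mueller") == (
        re.search(re.escape("Mueller"), sample) is not None
    )
```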

Alex-K37 commented 3 months ago

The intention of this particular constraint is to avoid a particular day for this particular teacher. If it were the other way round, pinning would also be counterproductive, IMHO, because there are multiple timeslots per day that can be chosen.

The dataset is far from perfect in its current state, and we do not have much influence on the main database, from which we have to import every couple of months. Sometimes several teachers' surnames are assigned to an exam, and when only one teacher is assigned, both forename and surname are included. We could, of course, pre-process the data and map it to teacher IDs.
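The pre-processing mentioned above can run in plain CPython before the problem is handed to the solver, so the constraint only has to do a set membership test. The input formats are assumptions based on the description (comma-separated surnames for several teachers, "Forename Surname" for a single teacher), and teacher_surnames is a hypothetical helper.

```python
def teacher_surnames(raw: str) -> frozenset[str]:
    """Normalise the raw teacher field into a set of surnames.

    Assumed formats (hypothetical, adjust to the real export):
      "Mueller, Schmidt" -> several surnames separated by commas
      "Hans Mueller"     -> a single teacher given as forename + surname
    """
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    if len(parts) == 1:
        # Single teacher: keep only the last word as the surname.
        return frozenset({parts[0].split()[-1]})
    return frozenset(parts)


print(sorted(teacher_surnames("Mueller, Schmidt")))
print(sorted(teacher_surnames("Hans Mueller")))
```

generate_problem() could then store the result on a hypothetical `teacher_surnames` attribute of Exam, and the filter would test `"Mueller" in exam.teacher_surnames`; we have not checked how the solver handles a frozenset attribute on a planning entity.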

Using re was a quick solution. We might translate to Java eventually, anyway. As far as I know, re.search applies caching in regular CPython, so at least the regex compilation is skipped on repeated evaluations. I have to admit I have no idea whether this caching works with JPype/jpyinterpreter. From your answer I gather that this state is probably not kept and the cache is lost.

What do you mean by "for_each only considers fully assigned entities, so exam.timeslot in the filter is pointless."? The constraint seems to have been effective - whether it was doing that efficiently is another issue. Or am I completely mistaken, here?

triceo commented 3 months ago

@Alex-K37 @Christopher-Chianelli Folks, please: may I ask you to continue your conversation, which is no longer related to this issue, in another space? Perhaps our GitHub discussions?

Christopher-Chianelli commented 3 months ago

The intention of this particular constraint is to avoid a particular day for this particular teacher. If it were the other way round, pinning would also be counterproductive, IMHO, because there are multiple timeslots per day that can be chosen.

The undesired_day_for_employee constraint might be of interest to you: https://github.com/TimefoldAI/timefold-quickstarts/blob/daeec0c8865fde6055c8015dba3515ce15c76139/python/employee-scheduling/src/employee_scheduling/constraints.py#L102-L110
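The linked quickstart constraint is not reproduced here. As a rough plain-Python sketch of the same idea, the avoided dates can live in the problem data instead of being hard-coded in the filter lambda. The Teacher and Exam dataclasses (with exam.start_time standing in for exam.timeslot.start_time) and falls_on_undesired_day are all hypothetical.

```python
import datetime
from dataclasses import dataclass, field


@dataclass
class Teacher:
    # Hypothetical: each teacher carries the dates they would rather avoid.
    name: str
    undesired_dates: set[datetime.date] = field(default_factory=set)


@dataclass
class Exam:
    # Hypothetical: start_time stands in for exam.timeslot.start_time.
    teacher: Teacher
    start_time: datetime.datetime


def falls_on_undesired_day(exam: Exam) -> bool:
    # Same question as the hard-coded lambda, but the dates come from the data.
    return exam.start_time.date() in exam.teacher.undesired_dates


mueller = Teacher("Mueller", {datetime.date(2024, 7, 8)})
print(falls_on_undesired_day(Exam(mueller, datetime.datetime(2024, 7, 8, 10, 0))))
```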

Using re was a quick solution. We might translate to Java eventually, anyway. As far as I know, re.search applies caching in regular CPython, so at least the regex compilation is skipped on repeated evaluations. I have to admit I have no idea whether this caching works with JPype/jpyinterpreter. From your answer I gather that this state is probably not kept and the cache is lost.

The cache is kept (compiling and evaluating a regex are two different things).
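To make the compile/evaluate split concrete: re.search(pattern, s) first obtains the compiled pattern (from re's cache after the first call) and then runs the match. Compiling once at module level performs the first step explicitly; the match itself still runs on every call and, per the earlier comment about `_sre`, presumably still pays the FFI cost. MUELLER and mentions_mueller are hypothetical names.

```python
import re

# Compiled once, when the module is imported.
MUELLER = re.compile("Mueller")


def mentions_mueller(teacher_field: str) -> bool:
    # Only the evaluation (the actual search) happens on each call.
    return MUELLER.search(teacher_field) is not None


print(mentions_mueller("Hans Mueller"))
```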

What do you mean by "for_each only considers fully assigned entities, so exam.timeslot in the filter is pointless."? The constraint seems to have been effective - whether it was doing that efficiently is another issue. Or am I completely mistaken, here?

exam.room and exam.timeslot are the genuine planning variables (PlanningVariable) on Exam, so for_each(Exam) automatically applies a filter that checks exam.room is not None and exam.timeslot is not None. Thus, exam.timeslot can never be None in the filter (and so it is always truthy).
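As a toy model of that implicit filter (plain Python, not Timefold's implementation; Exam here is a hypothetical stand-in with room and timeslot as its genuine planning variables), the extra `exam.timeslot` test can be seen to never change which exams pass:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Exam:
    # Hypothetical stand-in: room and timeslot play the role of genuine planning variables.
    teacher: str
    room: Optional[str] = None
    timeslot: Optional[str] = None


def for_each_assigned(exams: Iterable[Exam]) -> Iterator[Exam]:
    # Models the implicit filter: only exams with every genuine variable assigned pass.
    return (e for e in exams if e.room is not None and e.timeslot is not None)


exams = [
    Exam("Hans Mueller", room="A1", timeslot="Mon 09:00"),
    Exam("Hans Mueller", room=None, timeslot="Mon 09:00"),
    Exam("Anna Schmidt", room="B2", timeslot=None),
]

# Filter with and without the redundant "exam.timeslot and ..." guard.
with_guard = [e for e in for_each_assigned(exams) if e.timeslot and "Mueller" in e.teacher]
without_guard = [e for e in for_each_assigned(exams) if "Mueller" in e.teacher]
print(with_guard == without_guard)
```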

If you have questions related to modelling (or confusion about anything I said here), feel free to either create a question on Stack Overflow or create a discussion.

If you have another issue (unrelated to this one), feel free to open a new issue.

If you are able to supply a reproducer for this issue (that is, code that we can run that reproduces the exception in the issue), please put it here.