NASA-AMMOS / aerie

A software framework for modeling spacecraft.
https://nasa-ammos.github.io/aerie-docs/
MIT License
73 stars 19 forks source link

Refactor RecurrenceGoal conflict production and solving #1510

Closed adrienmaillard closed 3 months ago

adrienmaillard commented 4 months ago

Description

See the ticket for the original issue. The cause for this issue is that

Because of this independence between conflicts, we had to find a way to specify temporal constraints for each conflict. What is done is that if the recurrence interval is 1 hour, each conflict is being given a 1-hour period to insert an activity. But you can see how this can result in 2 contiguous activities being separated by more than 1 hour (the first is scheduled at the start of its interval and the other at the end of its interval (because of a mutex possibly)).

I tried 2 approaches:

  1. going back to an iterative conflict production that was ditched less than a year ago (I'd say) in which for each goal we call getConflicts not only before but also after solving to see if new conflicts are discovered. I combined this with a change to RecurrenceGoal so that it produces only one conflict at a time. Because conflicts would be produced one at a time, it would take into account the last activity insertion. Unfortunately, it required adding a specific block for RecurrenceGoal in the solving loop and I had to update the definition of RecurrenceGoal at each iteration so it would not stay stuck in case one of the activity insertion fails. All of this just looked like a workaround for a bigger issue.
  2. creating a new type of conflict that takes relationships between subsequent activities into account. Instead of multiple 1-activity template conflicts, we have one big MissingNetwork conflict that contains a list of activities and their temporal constraints encoded as an STN. To solve this conflict, we go through each activity in the list, build a MissingActivityTemplateConflict with the most up-to-date STN temporal constraints and give that to the regular solver for that type of conflict. Then, we update the STN bounds with the start/end times of the inserted activity, we propagate and we find the bounds for the next activity. And so on and so on.

I ended up with 2 as this is more elegant and also will be useful to instantiate task networks in #1073 #870.

I am also updating how the API of this goal to include an explicit temporal flexibility definition. Instead of just the every(Duration), I am adding separatedByNoMoreThan(Duration) and separatedByNoLessThan(Duration). The first is the one leading to conflict production. Which means that there is no penalty by having smaller separations in existing plans but... when we create new activities we use the second operator to determine the minimum space between activities.

I have refactored the satisfyGoalGeneral method of PrioritySolver to put the business logic of solving each type of conflict in its own method. It's just easier to read. And I needed to use one for this new work.

I have also refactored the satisfaction concept in the scheduler so it's not just a ill-defined number.

Verification

Documentation

Future work

None.

adrienmaillard commented 4 months ago

I faced some bugs that made me reconsider the semantics of the goal. @srschaffJPL helped me reflect on this subject, here is a summary.

Alternative (easier) way of solving this issue

When the user writes .every(1hour), we don't only allow activities to start at times 0, 1, 2, 3 but we allow a default/implicit flexibility (to account for mutex/state constraints) of 1 hour. In other words we allow the first activity to start in [0, 1[, the second in [1, 2[ and so on. Which can cause activities to be separated by more than 1 hour.

We could be explicit about this flexibility with additional operators which would have the user write something like

.every(1 hour)
.plus(30 minutes)
.minus(10 minutes)

We would not have to depart from the current approach of independent conflict and activities could still be separated by more than 1 hour but it would be the user's problem, not ours. Note that the scheduler now schedules activities at their earliest start time.

Important point: in Clipper recurrence goals are mostly concerned by enforcing that 2 activities are not separated by more than some duration. So the strictness is important which would go against this alternative way of solving the issue.

Making MissingNetwork conflicts

Now let's come back to the MissingNetwork solution. One issue I was struggling with were the difference between activity and absolute timepoint boundaries.

There is one basic use case that is recurrence(activity1, activity2) that arises when there is too much time passing between the starts of activity1 and activity2. In this case, we can put constraints between missing template activities in the STN and existing activities. In other words we can bound the space between all activities. This is the easiest case.

The other basic use case that I’ll note recurrence(timepoint1, timepoint2) is the one when the plan is empty and the timepoints correspond to the start and end of the planning horizon. In this case,

What got out of the conversation is that we should let the users tell us where the activity (it’s fake though, there is no activity) before timepoint1 is, which would say when to actually start the serie. It would look like that

.separatedByNoLessThan(1)
.separatedByNoMoreThan(3)
.lastActOccurred(1)

The default value for this parameter could be the negated value of separatedByNoMoreThan, here -3 then. This would allow to conserve the usual behavior that have activities start at timepoint1 (0 if planning horizon start).

As for the timepoint2, we would not put a specific distance constraint but more of a separation constraint between the last template activity and this absolute boundary. The reason for that is that we care about what happened before the plan but not about what happens after.

What happens when one of the template in a MissingNetwork cannot be scheduled ?

Sometimes, a constraint or the mission model will make it impossible to schedule one of the activities in the “chain”, let’s say it’s activity i. That is a problem because the start bounds of activities i+1…k depend on the actual start times of activity i (they are being refined with propagation). So it seems we have to fix the STN to continue solving if we do not want to have a degraded solution. Note that this does not happen currently because each activity template is independent from others (which contributes to create the issue we are trying to solve…).

Some ways of fixing it :

  1. transitive closures: we get the windows in which the activity has been scheduled. We act like activity has been scheduled at its earliest start time, propagate and log that it has failed for later report
  2. relax constraint for activity i and try again
  3. insert absolute timepoints and reconnect following activities in the graph

From the point of view of the user, what is the best behavior is something fails in the middle ?

adrienmaillard commented 4 months ago

Screenshot 2024-07-25 at 2 37 39 PM

dandelany commented 3 months ago

Discussed this with @adrienmaillard and @mattdailis earlier this week - looking good except it needs a rebase & potential conflict resolution.

Also @adrienmaillard when you get a chance - please write a couple sentences for the Upgrade Guide when this is released, to advise users on the impact of the change. Thanks!

adrienmaillard commented 3 months ago

I have made the changes you mentioned.

JoelCourtney commented 3 months ago

Is there any check to make sure separatedByAtLeast > separatedByAtMost?

adrienmaillard commented 3 months ago

I have just added a check in RecurrenceGoal. And I think it's separatedByAtLeast < separatedByAtMost :)

adrienmaillard commented 3 months ago

Upgrade note for @dandelany

RecurrenceGoal changes

The behavior of the goal has been changed when setting its interval parameter When setting the interval parameter to 1 hour, the old behavior would create 1-hour bins for the duration of the horizon (or the more restrictive applyWhen period) and place 1 activity in each of these bins, whenever it is possible considering global scheduling conditions.
With the new behavior, the interval parameter means a strict 1-hour distance between consecutive activities.

New parameters have been introduced to give more control over the behavior of the goal and how it is solved:

The solver will now try several combinations to solve this goal, starting with the least number of activities possible (separated by a greater distance), and increase the number of activities until either all the temporal constraints are satisfied or it reaches a dead-end.