anacostiaAI / anacostia-pipeline

Anacostia is a framework for creating machine learning operations (MLOps) pipelines
Apache License 2.0

deadlock in interaction between resource node and action node #5

Closed mdo6180 closed 7 months ago

mdo6180 commented 7 months ago

sequence of events:

  1. collection store detects two new files (thus matching trigger condition)
  2. collection store triggers data prep
  3. data prep executes and then signals collection store
  4. collection store updates its state, but only partially (current -> old)
    • marks the two current files as old; since no new files have arrived, nothing is labeled "new"
  5. collection store signals data prep to execute
  6. data prep waits for collection store
  7. collection store goes into waiting state to wait for data prep
  8. deadlock: collection store cannot proceed to update its state because:
    • collection store is waiting for data prep to finish executing before updating its state, but data prep cannot finish executing because it is waiting on collection store to update its state
    • this happens because the resource nodes call update_state() after check_successors()
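The circular wait in step 8 can be reproduced in isolation. The following is a minimal sketch, not Anacostia code: `simulate_circular_wait`, the two `threading.Event` objects, and the result keys are hypothetical stand-ins, and the timeouts exist only so the example terminates instead of hanging.

```python
import threading


def simulate_circular_wait(timeout=0.2):
    """Run both nodes once and report whether either one made progress."""
    state_updated = threading.Event()  # set once the resource node updates its state
    action_done = threading.Event()    # set once the action node finishes executing
    results = {}

    def resource_node():
        # Ordering from the issue: wait on the successor first, update state after.
        finished = action_done.wait(timeout)  # stand-in for check_successors()
        if finished:
            state_updated.set()               # stand-in for update_state()
        results["resource_updated_state"] = finished

    def action_node():
        # The action node cannot finish until the resource node has updated its state.
        ready = state_updated.wait(timeout)
        if ready:
            action_done.set()
        results["action_finished"] = ready

    threads = [threading.Thread(target=resource_node),
               threading.Thread(target=action_node)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each thread waits on an event that only the other thread can set, so neither makes progress and both waits time out. Without the timeouts, both threads would block indefinitely.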

bug descriptions:

possible fixes:

  1. make resource nodes signal action nodes before the check_predecessors call
    • make sure the resource node doesn't signal the action node unless there's a change of state
  2. move update_state() function call to monitoring thread so it isn't blocked by execution of data prep
    • resource nodes
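The guard described under fix 1 (only signal the action node when the state has changed) could look roughly like this. It is a sketch under assumptions: `ResourceNodeSketch`, `add_file`, `has_new_files`, and `maybe_signal_successors` are hypothetical names, and "change of state" is interpreted here as "at least one file is labeled new".

```python
class ResourceNodeSketch:
    """Hypothetical resource node that tracks a label per file."""

    def __init__(self):
        self.files = {}        # filename -> "new" | "current" | "old"
        self.signals_sent = 0  # counts signals sent to successor action nodes

    def add_file(self, name):
        # A newly detected file starts out labeled "new".
        self.files[name] = "new"

    def has_new_files(self):
        return any(label == "new" for label in self.files.values())

    def maybe_signal_successors(self):
        # Guard: skip the signal entirely when nothing has changed, so the
        # action node is never asked to run against an unchanged state.
        if not self.has_new_files():
            return False
        self.signals_sent += 1
        return True
```

This only illustrates the guard. It does not relabel files after signaling, so a real node would still need a state update step before the next check.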

final resolution:

  1. split update_state() function into two separate functions:
    • new_to_current(): relabels 'new' files to 'current'
    • current_to_old(): relabels 'current' files to 'old'
  2. redesigned node lifecycle diagram
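A rough sketch of the split in item 1 follows. Only the method names `new_to_current()` and `current_to_old()` come from the issue; the `FileStates` class, the `add_new` helper, and the dictionary layout are assumptions made for illustration, as is the call order shown in the usage note.

```python
class FileStates:
    """Hypothetical label store: filename -> "new" | "current" | "old"."""

    def __init__(self):
        self.states = {}

    def add_new(self, name):
        # A newly detected file is labeled "new".
        self.states[name] = "new"

    def new_to_current(self):
        # Relabel every "new" file as "current"; other labels are untouched.
        for name, label in self.states.items():
            if label == "new":
                self.states[name] = "current"

    def current_to_old(self):
        # Relabel every "current" file as "old"; other labels are untouched.
        for name, label in self.states.items():
            if label == "current":
                self.states[name] = "old"
```

One plausible usage, assuming the node calls `new_to_current()` before triggering the action node and `current_to_old()` after the action node finishes: a file that arrives in between stays labeled "new" and is picked up on the next cycle, because each transition only touches its own label.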