Hyperfoil / qDup

Lab automation and queuing scripting
Apache License 2.0
12 stars 12 forks source link

nanny task ending run prematurely #202

Open johnaohara opened 1 year ago

johnaohara commented 1 year ago

the nannytask will terminate a run if it finds scripts that are not suspended at a wait-for or a sh command that has not returned.

if the command is something else, e.g. sleep, the trun is terminated prematurely.

e.g. for a script that does;

...
script:
...
  capture-stats: 
    ...
            then:
            - set-state: RUN.${{SCOMBO}}.warm ${{=[...${{RUN.${{SCOMBO}}.warm}} , ${{WARM}}]}}
            - sleep: 2000
          - signal: ${{DCOMBO}}-DONE
...
    start-docker: 
    - for-each: APP ${{QUICKSTART_APPS}}
      then:
      - for-each: VARIANT ${{VARIANTS}}
        then:
        - set-state: VALUES ''
        - set-state: DCOMBO ${{APP.name}}-${{VARIANT.name}}-${{CPU.cores}}
        - for-each: CPU ${{CPUS}}
          then:
          - for-each: ITERATION ${{=[...Array(${{APP_STARTS}}).keys()]}}
            then:
            ...
            - wait-for: ${{DCOMBO}}-DONE

qDup will terminate the run if the only non wait-for script running is executing sleep

05:24:12.033 ending phase with 3 active idle waiting scripts
[
  {
    "uid": 1045,
    "lastUpdate": 1687411450329,
    "name": "wait-for: config-quickstart-native-1-WRK-START",
    "host": "jenkins@example.com:22",
    "contextId": "run-wrk:1030@jenkins@example.com:22",
    "startTime": 1687411450329,
    "runTime": 1703,
    "idleTime": 1703,
    "script": "1030:script-cmd: run-wrk"
  },
  {
    "uid": 1124,
    "lastUpdate": 1687411450364,
    "name": "sleep: 2000",
    "host": "jenkins@2.example.com:22",
    "contextId": "capture-stats:1032@jenkins@2.example.com:22",
    "startTime": 1687411450364,
    "runTime": 1668,
    "idleTime": 1668,
    "script": "1032:script-cmd: capture-stats"
  },
  {
    "uid": 1111,
    "lastUpdate": 1687411405662,
    "name": "wait-for: config-quickstart-JVM-16-DONE",
    "host": "jenkins@2.example.com:22",
    "contextId": "start-docker:1034@jenkins@2.example.com:22",
    "startTime": 1687411405662,
    "runTime": 46370,
    "idleTime": 46370,
    "script": "1034:script-cmd: start-docker"
  }
]
johnaohara commented 1 year ago

Digging some more into this, the capture-stats script sleep is actually in a reapeat-until loop;

          - repeat-until: ${{DCOMBO}}-WRK-DONE
            then:
            - sh: sudo pmap -x ${{PID}} | grep total | awk '{print $4}' | sed 's/[^0-9]*//g'
              then:
              - set-state: WARM
            - set-state: RUN.${{SCOMBO}}.warm ${{=[...${{RUN.${{SCOMBO}}.warm}} , ${{WARM}}]}}
            - sleep: 2000
          - signal: ${{DCOMBO}}-DONE

the nannytask should walk the Cmd stack and find the repeat-until command, realise that the script is waiting for a signal (which may have already been fired) and not stop the run