Mellanox / ci-demo

VMA/XLIO CI is broken #57

Closed igor-ivanov closed 2 years ago

igor-ivanov commented 2 years ago

Starting with http://hpc-master.lab.mtl.com:8080/job/LibXLIO/308/, every Jenkins job takes 20s and fails:

[Pipeline] End of Pipeline
ERROR: config failed with msg: Step='Service' has both containerSelector and agentSelector configured, while it is mutual exclusive, setup global `step_allow_single_selector: false` to disable
Adding one-line test results to commit status...
Setting status of a6cb3b35a99e4e9b9a2b3a1e715849df3439e78b to FAILURE with url http://hpc-master.lab.mtl.com:8080/job/LibXLIO/308/ and message: '[FAIL]
 No test results found.'
Using context: Mellanox Lab
Finished: FAILURE

Suspicious change: https://github.com/Mellanox/ci-demo/commit/fae5bc562f16f2a167da1548fa6d24aea17538f5

igor-ivanov commented 2 years ago

After the changes in xlio (https://github.com/Mellanox-lab/libxlio/commit/c158e4c10d26c908bdad9b9e55d1874605677d72), and maybe in ci-demo, the behaviour changed to something unexpected: http://hpc-master.lab.mtl.com:8080/blue/organizations/jenkins/LibXLIO/detail/LibXLIO/326/pipeline/2611/

It should look like this: http://hpc-master.lab.mtl.com:8080/blue/organizations/jenkins/LibXLIO/detail/LibXLIO/324/pipeline

igor-ivanov commented 2 years ago

Description of the expected logic, with specific examples:

  1. A step is configured to be launched on a specific container:

    - name: Compiler
      enable: ${do_compiler}
      agentSelector:
        - "{name: 'compiler', category: 'tool'}"

    See matrix_job.yaml (https://github.com/Mellanox-lab/libxlio/blob/master/.ci/matrix_job.yaml#L175-L182). In fact, this step is executed on all containers and bare-metal nodes (http://hpc-master.lab.mtl.com:8080/blue/organizations/jenkins/LibXLIO/detail/LibXLIO/326/pipeline). The expected behaviour can be seen at http://hpc-master.lab.mtl.com:8080/blue/organizations/jenkins/LibXLIO/detail/LibXLIO/324/pipeline/

  2. A step is configured to be launched on bare metal in a single variant:

    - name: Test
      enable: ${do_test}
      agentSelector:
        - "{nodeLabel: 'r-aa-fatty09', variant:1}"

    See matrix_job.yaml (https://github.com/Mellanox-lab/libxlio/blob/master/.ci/matrix_job.yaml#L268-L275). In fact, this step is executed on all containers and bare-metal nodes (http://hpc-master.lab.mtl.com:8080/blue/organizations/jenkins/LibXLIO/detail/LibXLIO/326/pipeline). The expected behaviour can be seen at http://hpc-master.lab.mtl.com:8080/blue/organizations/jenkins/LibXLIO/detail/LibXLIO/324/pipeline/

vasily-v-ryabov commented 2 years ago

@mike-dubman part of the issue still exists. Please read the latest comment from Igor.

mike-dubman commented 2 years ago

You need to add the following to the YAML file to request the old behaviour:

    step_allow_single_selector: true

vasily-v-ryabov commented 2 years ago

But that will bring the job back to the first issue.

igor-ivanov commented 2 years ago

The issue can be resolved by using either

    containerSelector:
      - "{name: 'skip-container'}"
    agentSelector:
      - "{nodeLabel: 'r-aa-fatty09'}"

or

    containerSelector:
      - "{name: 'compiler', category: 'tool'}"
    agentSelector:
      - "{nodeLabel: 'skip-agent'}"
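
Applied to the two example steps quoted earlier in this thread, the workaround might look like the sketch below. It assumes that `skip-container` and `skip-agent` are placeholder names that match no container or node actually defined in the job, so each step is picked up only through its other selector. That reading is inferred from the snippets above and is not confirmed against the ci-demo implementation. The selector values are copied from the workaround as posted, so the `Test` step here carries no `variant:1`.

```yaml
# Sketch only: entries of the step list in matrix_job.yaml.
# 'skip-container' and 'skip-agent' are assumed to match nothing in the job.

# Bare-metal step: the container selector is a placeholder.
- name: Test
  enable: ${do_test}
  containerSelector:
    - "{name: 'skip-container'}"
  agentSelector:
    - "{nodeLabel: 'r-aa-fatty09'}"

# Container step: the agent selector is a placeholder.
- name: Compiler
  enable: ${do_compiler}
  containerSelector:
    - "{name: 'compiler', category: 'tool'}"
  agentSelector:
    - "{nodeLabel: 'skip-agent'}"
```

Both selectors are set on each step, so whether this passes the check that produced the original `Step='Service'` error presumably depends on the global `step_allow_single_selector` value.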