argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

Archive Logs error: `no such file or directory` #13622

Open zhangconan opened 3 weeks ago

zhangconan commented 3 weeks ago

What happened? What did you expect to happen?

I am using Argo's output artifacts feature to upload application logs to MinIO, but it fails with "no such file or directory". I expected the logs to be uploaded to MinIO normally.

Version(s)

V3.4.8

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

- name: main
  inputs: {}
  outputs: {}
  metadata: {}
  daemon: true
  steps:
    - - name: '1811663817060106241'
        template: '1811663817060106241'
        arguments: {}
- name: '1811663817060106241'
  inputs: {}
  outputs:
    artifacts:
      - name: service-log
        path: /tmp/service.log
        s3:
          key: /stream-job/mq/logs/1836605842368106496.log
        archive:
          none: {}
  metadata: {}
  container:
    name: ''
    image: >-
      registry.cn-zhangjiakou.aliyuncs.com/images
    command:
      - /bin/sh
      - '-c'
      - java org.springframework.boot.loader.JarLauncher
    resources:
      limits:
        memory: 512Mi
      requests:
        memory: 512Mi
    imagePullPolicy: IfNotPresent
- name: exit
  inputs: {}
  outputs: {}
  metadata: {}
  steps:
    - - name: exit-fail
        template: exit-fail
        arguments: {}
- name: exit-fail
  inputs: {}
  outputs: {}
  metadata: {}
  container:
    name: ''
    image: >-
      registry.cn-zhangjiakou.aliyuncs.com/images-failture
    command:
      - /bin/sh
      - '-c'
      - java org.springframework.boot.loader.JarLauncher
    resources: {}
    imagePullPolicy: IfNotPresent

Logs from the workflow controller

time="2024-09-19T03:11:28.243Z" level=info msg="capturing logs" argo=true 

  .   ____          _            __ _ _ 
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \ 
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \ 
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) ) 
  '  |____| .__|_| |_|_| |_\__, | / / / / 
 =========|_|==============|___/=/_/_/_/ 
 :: Spring Boot ::               (v2.7.10) 

2024-09-19 11:11:30.245  INFO 15 --- [           main] com.hexadb.luban.container.kafka.Main    : Starting Main v3.2.1-SNAPSHOT using Java 11.0.20 on kafka-consume-1836603978960474112-1811663817060106241-647481992 with PID 15 (/application/BOOT-INF/classes started by root in /application) 
2024-09-19 11:11:30.248  INFO 15 --- [           main] com.hexadb.luban.container.kafka.Main    : No active profile set, falling back to 1 default profile: "default" 
2024-09-19 11:11:32.486  INFO 15 --- [           main] c.h.l.c.k.service.KafkaConsumerService   : KafkaConsumerService  ======= 
【LUBAN】-【LOAD】————> kernel  service API! 
2024-09-19 11:11:34.773  INFO 15 --- [           main] com.hexadb.luban.container.kafka.Main    : Started Main in 5.986 seconds (JVM running for 6.524) 
2024-09-19 11:11:35.916  INFO 15 --- [ consumer-0-C-1] o.s.k.l.KafkaMessageListenerContainer    : LUBAN-EXE-1811663817060106241: partitions assigned: [default_dw_xindaoceshi-0] 
time="2024-09-19T03:13:11.287Z" level=info msg="sub-process exited" argo=true error="<nil>" 
time="2024-09-19T03:13:11.287Z" level=info msg="/tmp/service.log -> /var/run/argo/outputs/artifacts/tmp/service.log.tgz" argo=true 
time="2024-09-19T03:13:11.287Z" level=info msg="Taring /tmp/service.log" 
Error: exit status 143 

Logs from your workflow's wait container

time="2024-09-19T11:11:28.094Z" level=info msg="Starting Workflow Executor" version=v3.4.8 
time="2024-09-19T11:11:28.098Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5 
time="2024-09-19T11:11:28.098Z" level=info msg="Starting deadline monitor" 
time="2024-09-19T11:13:10.630Z" level=info msg="Deadline monitor stopped" 
time="2024-09-19T11:13:10.630Z" level=info msg="stopping progress monitor (context done)" error="context canceled" 
time="2024-09-19T11:13:11.146Z" level=warning msg="Non-transient error: context canceled" 
time="2024-09-19T11:13:11.146Z" level=info msg="Main container completed" error="context canceled" 
time="2024-09-19T11:13:11.146Z" level=info msg="No Script output reference in workflow. Capturing script output ignored" 
time="2024-09-19T11:13:11.146Z" level=info msg="No output parameters" 
time="2024-09-19T11:13:11.146Z" level=info msg="Saving output artifacts" 
time="2024-09-19T11:13:11.146Z" level=info msg="Staging artifact: service-log" 
time="2024-09-19T11:13:11.146Z" level=info msg="Copying /tmp/service.log from container base image layer to /tmp/argo/outputs/artifacts/service-log.tgz" 
time="2024-09-19T11:13:11.146Z" level=info msg="/var/run/argo/outputs/artifacts/tmp/service.log.tgz -> /tmp/argo/outputs/artifacts/service-log.tgz" 
time="2024-09-19T11:13:11.146Z" level=error msg="executor error: open /var/run/argo/outputs/artifacts/tmp/service.log.tgz: no such file or directory" 
time="2024-09-19T11:13:11.147Z" level=info msg="S3 Save path: /tmp/argo/outputs/logs/main.log, key: argo/2024/09/kafka-consume-1836603978960474112/kafka-consume-1836603978960474112-1811663817060106241-647481992/main.log" 
time="2024-09-19T11:13:11.147Z" level=info msg="Creating minio client using static credentials" endpoint="minio-headless:9000" 
time="2024-09-19T11:13:11.147Z" level=info msg="Saving file to s3" bucket=kernel endpoint="minio-headless:9000" key=argo/2024/09/kafka-consume-1836603978960474112/kafka-consume-1836603978960474112-1811663817060106241-647481992/main.log path=/tmp/argo/outputs/logs/main.log 
time="2024-09-19T11:13:11.151Z" level=info msg="Save artifact" artifactName=main-logs duration=4.93656ms error="<nil>" key=argo/2024/09/kafka-consume-1836603978960474112/kafka-consume-1836603978960474112-1811663817060106241-647481992/main.log 
time="2024-09-19T11:13:11.152Z" level=info msg="not deleting local artifact" localArtPath=/tmp/argo/outputs/logs/main.log 
time="2024-09-19T11:13:11.152Z" level=info msg="Successfully saved file: /tmp/argo/outputs/logs/main.log" 
time="2024-09-19T11:13:11.201Z" level=info msg="Create workflowtaskresults 201" 
time="2024-09-19T11:13:11.202Z" level=info msg="Alloc=8249 TotalAlloc=16822 Sys=28541 NumGC=5 Goroutines=7" 
time="2024-09-19T11:13:11.202Z" level=fatal msg="open /var/run/argo/outputs/artifacts/tmp/service.log.tgz: no such file or directory"

zhangconan commented 3 weeks ago

I noticed that in the main container, the following lines

time="2024-09-19T03:13:11.287Z" level=info msg="/tmp/service.log -> /var/run/argo/outputs/artifacts/tmp/service.log.tgz" argo=true
time="2024-09-19T03:13:11.287Z" level=info msg="Taring /tmp/service.log"

appear at 2024-09-19T03:13:11.287Z, but in the wait container, the following lines

time="2024-09-19T11:13:11.146Z" level=info msg="/var/run/argo/outputs/artifacts/tmp/service.log.tgz -> /tmp/argo/outputs/artifacts/service-log.tgz"
time="2024-09-19T11:13:11.146Z" level=error msg="executor error: open /var/run/argo/outputs/artifacts/tmp/service.log.tgz: no such file or directory"

appear at 2024-09-19T11:13:11.146Z. Did I miss something that causes the wait container to copy the log archive before the main container has created it?
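
The ordering question can be checked by subtracting the two timestamps. This is a minimal sketch, not part of the original report. It assumes the two containers stamp their logs roughly eight hours apart; both timestamps carry a `Z` suffix, so that offset is only a guess based on the hour fields. The names `parse`, `main_tar`, `wait_open`, and `assumed_offset` are illustrative.

```python
from datetime import datetime, timedelta


def parse(ts: str) -> datetime:
    """Parse a log timestamp of the form 2024-09-19T03:13:11.287Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


# Timestamps copied from the two log excerpts in this issue.
main_tar = parse("2024-09-19T03:13:11.287Z")   # main container: "Taring /tmp/service.log"
wait_open = parse("2024-09-19T11:13:11.146Z")  # wait container: open ... no such file

# Raw difference between the two log lines as printed.
raw_gap = wait_open - main_tar

# ASSUMPTION: the containers' timestamps differ by a fixed 8 hours.
assumed_offset = timedelta(hours=8)
adjusted_gap = raw_gap - assumed_offset

print("raw gap:", raw_gap)
print("adjusted gap (seconds):", adjusted_gap.total_seconds())
```

Under that assumption the adjusted gap is negative by about 141 ms, meaning the wait container tried to open the archive slightly before the main container logged that it was creating it. That would be consistent with the `no such file or directory` error. If the assumed offset is wrong, this conclusion does not hold.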