argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0
Argo Workflow sometimes not working with libreoffice #10303

Open spoonysnail opened 1 year ago

spoonysnail commented 1 year ago

What happened/what you expected to happen?

I want to use LibreOffice to convert a docx file to PDF. It works well in a Kubernetes Job, but sometimes does not work when run in an Argo workflow.

Content of Job Yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: libreoffice-test
  namespace: daily-job-workspace
spec:
  backoffLimit: 3
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: 'false'
    spec:
      containers:
        - command:
            - /bin/sh
            - '-c'
          args:
            - /_py_oss/test/gjl/libreoffice.sh
          image: linuxserver/libreoffice:latest
          imagePullPolicy: IfNotPresent
          name: libreoffice-worker
          volumeMounts:
            - mountPath: /_py_oss
              name: oss-pvc
      nodeSelector:
        schedule-workspace: daily-job
      restartPolicy: Never
      volumes:
        - name: oss-pvc
          persistentVolumeClaim:
            claimName: edu-data-pre

Content of libreoffice.sh

echo 'start';
/usr/bin/libreoffice --convert-to pdf --outdir /_py_oss/test/output/ /_py_oss/raw-data/PPTX/4675.pptx;
echo 'end';

The expected console output is:

start
convert /_py_oss/raw-data/PPTX/4675.pptx -> /_py_oss/test/output/4675.pdf using filter : impress_pdf_Export
end

When I run it as a Job, it also produces a PDF file.

But when I run it in Argo, the console output is:

start
end

There is no LibreOffice log line, and no PDF file is produced.

However, it sometimes works, and I don't know why.
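
One thing worth noting about libreoffice.sh as posted: its last command is an echo, so the script exits with status 0 whether or not the conversion happened, and both the Job and the workflow will report success either way. The sketch below is one possible diagnostic variant, not a confirmed fix. The names SOFFICE and convert_to_pdf are hypothetical, and the use of --headless with a throwaway profile directory (-env:UserInstallation) rests on the assumption that the silent failure is related to the environment the process starts in, which this thread has not confirmed.

```shell
#!/bin/sh
# Diagnostic sketch only. SOFFICE and convert_to_pdf are illustrative names
# that do not appear in the original libreoffice.sh.
SOFFICE="${SOFFICE:-/usr/bin/libreoffice}"

convert_to_pdf() {
    input="$1"
    outdir="$2"

    # Assumption: a fresh, writable profile directory removes any dependency
    # on $HOME or on a profile left over from another run.
    profile_dir="$(mktemp -d)"

    # Record the environment the converter starts in, so runs under the Job
    # and under the workflow can be compared.
    echo "start: HOME=${HOME:-<unset>} uid=$(id -u)"

    # Merge stderr into stdout so any LibreOffice message reaches the pod log.
    "$SOFFICE" --headless "-env:UserInstallation=file://${profile_dir}" \
        --convert-to pdf --outdir "$outdir" "$input" 2>&1
    status=$?
    rm -rf "$profile_dir"
    echo "end: converter exit status ${status}"

    # Fail loudly if the PDF is missing, even when the converter exited 0.
    expected="${outdir%/}/$(basename "${input%.*}").pdf"
    if [ "$status" -ne 0 ] || [ ! -f "$expected" ]; then
        echo "conversion failed: ${expected} was not produced" >&2
        return 1
    fi
    return 0
}
```

In the script this would be called as `convert_to_pdf /_py_oss/raw-data/PPTX/4675.pptx /_py_oss/test/output/ || exit 1`, so that a missing PDF marks the pod as failed instead of succeeded.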

Version

v3.4.4

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

metadata:
  name: libreoffice-test
  namespace: daily-argo-workspace
  labels:
    example: 'true'
spec:
  entrypoint: libreoffice-test
  templates:
    - name: libreoffice-test
      container:
        name: main
        image: linuxserver/libreoffice:latest
        command:
          - /bin/sh
          - '-c'
        args:
          - /_py_oss/test/gjl/libreoffice.sh
        volumeMounts:
          - mountPath: /_py_oss
            name: oss-pvc
  volumes:
    - name: oss-pvc
      persistentVolumeClaim:
        claimName: edu-data-pre
  ttlStrategy:
    secondsAfterCompletion: 1000
  podGC:
    strategy: OnPodCompletion

Logs from the workflow controller

time="2023-01-03T08:07:27.551Z" level=info msg="Processing workflow" namespace=daily-argo-workspace workflow=libreoffice-test
time="2023-01-03T08:07:27.561Z" level=info msg="Updated phase  -> Running" namespace=daily-argo-workspace workflow=libreoffice-test
time="2023-01-03T08:07:27.562Z" level=info msg="Pod node libreoffice-test initialized Pending" namespace=daily-argo-workspace workflow=libreoffice-test
time="2023-01-03T08:07:27.612Z" level=info msg="Created pod: libreoffice-test (libreoffice-test)" namespace=daily-argo-workspace workflow=libreoffice-test
time="2023-01-03T08:07:27.612Z" level=info msg="TaskSet Reconciliation" namespace=daily-argo-workspace workflow=libreoffice-test
time="2023-01-03T08:07:27.612Z" level=info msg=reconcileAgentPod namespace=daily-argo-workspace workflow=libreoffice-test
time="2023-01-03T08:07:27.632Z" level=info msg="Workflow update successful" namespace=daily-argo-workspace phase=Running resourceVersion=298577526 workflow=libreoffice-test
time="2023-01-03T08:07:37.551Z" level=info msg="Processing workflow" namespace=daily-argo-workspace workflow=libreoffice-test
time="2023-01-03T08:07:37.551Z" level=info msg="Task-result reconciliation" namespace=daily-argo-workspace numObjs=0 workflow=libreoffice-test
time="2023-01-03T08:07:37.551Z" level=info msg="node changed" namespace=daily-argo-workspace new.message= new.phase=Succeeded new.progress=0/1 nodeID=libreoffice-test old.message= old.phase=Pending old.progress=0/1 workflow=libreoffice-test
time="2023-01-03T08:07:37.551Z" level=info msg="TaskSet Reconciliation" namespace=daily-argo-workspace workflow=libreoffice-test
time="2023-01-03T08:07:37.551Z" level=info msg=reconcileAgentPod namespace=daily-argo-workspace workflow=libreoffice-test
time="2023-01-03T08:07:37.551Z" level=info msg="Updated phase Running -> Succeeded" namespace=daily-argo-workspace workflow=libreoffice-test
time="2023-01-03T08:07:37.551Z" level=info msg="Marking workflow completed" namespace=daily-argo-workspace workflow=libreoffice-test
time="2023-01-03T08:07:37.551Z" level=info msg="Marking workflow as pending archiving" namespace=daily-argo-workspace workflow=libreoffice-test
time="2023-01-03T08:07:37.551Z" level=info msg="Checking daemoned children of " namespace=daily-argo-workspace workflow=libreoffice-test
time="2023-01-03T08:07:37.557Z" level=info msg="cleaning up pod" action=deletePod key=daily-argo-workspace/libreoffice-test-1340600742-agent/deletePod
time="2023-01-03T08:07:37.565Z" level=info msg="Workflow update successful" namespace=daily-argo-workspace phase=Succeeded resourceVersion=298577644 workflow=libreoffice-test
time="2023-01-03T08:07:37.575Z" level=info msg="archiving workflow" namespace=daily-argo-workspace uid=587c5b9b-1d1d-4f69-afa8-e617abf5f9aa workflow=libreoffice-test
time="2023-01-03T08:07:37.610Z" level=info msg="Queueing Succeeded workflow daily-argo-workspace/libreoffice-test for delete in 16m40s due to TTL"
time="2023-01-03T08:07:42.575Z" level=info msg="cleaning up pod" action=deletePod key=daily-argo-workspace/libreoffice-test/deletePod

Logs from your workflow's wait container

time="2023-01-03T08:01:06.972Z" level=info msg="Starting Workflow Executor" version=v3.4.4
time="2023-01-03T08:01:06.975Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2023-01-03T08:01:06.975Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=daily-argo-workspace podName=libreoffice-test template="{\"name\":\"libreoffice-test\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"main\",\"image\":\"linuxserver/libreoffice:latest\",\"command\":[\"/bin/sh\",\"-c\"],\"args\":[\"/_py_oss/test/gjl/libreoffice.sh\"],\"resources\":{},\"volumeMounts\":[{\"name\":\"oss-pvc\",\"mountPath\":\"/_py_oss\"}]}}" version="&Version{Version:v3.4.4,BuildDate:2022-11-29T16:49:53Z,GitCommit:3b2626ff900aff2424c086a51af5929fb0b2d7e5,GitTag:v3.4.4,GitTreeState:clean,GoVersion:go1.18.8,Compiler:gc,Platform:linux/amd64,}"
time="2023-01-03T08:01:06.975Z" level=info msg="Starting deadline monitor"
time="2023-01-03T08:01:13.980Z" level=info msg="Main container completed" error="<nil>"
time="2023-01-03T08:01:13.980Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2023-01-03T08:01:13.980Z" level=info msg="No output parameters"
time="2023-01-03T08:01:13.980Z" level=info msg="No output artifacts"
time="2023-01-03T08:01:13.980Z" level=info msg="Deadline monitor stopped"
time="2023-01-03T08:01:13.980Z" level=info msg="Alloc=6684 TotalAlloc=12226 Sys=19410 NumGC=4 Goroutines=6"

sarabala1979 commented 1 year ago

@spoonysnail I don't see any errors in the logs. Do you see any error messages in the workflow status?

spoonysnail commented 1 year ago

No error. The workflow ended gracefully, but LibreOffice doesn't work in the workflow.

amolsr commented 1 year ago

If you are using Kubeflow, set the containerRuntime executor to Docker instead of Emissary, which is the default. That should solve the issue. See https://github.com/argoproj/argo-workflows/issues/9117

spoonysnail commented 1 year ago

We are using Argo on plain Kubernetes. I downgraded Argo to v3.3.10, since the containerRuntime settings were removed in v3.4, and then tried setting the containerRuntime to kubelet and to k8sapi. LibreOffice works well with both of these executors, but the wait container throws an error.

Here are the logs of the wait container.

logs with k8sapi executor

time="2023-01-17T08:44:51.754Z" level=info msg="Starting Workflow Executor" executorType=k8sapi version=v3.3.10
time="2023-01-17T08:44:51.757Z" level=info msg="Creating a k8sapi executor"
time="2023-01-17T08:44:51.757Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2023-01-17T08:44:51.757Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=daily-argo-workspace podName=libreoffice-test3 template="{\"name\":\"libreoffice-test3\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"main\",\"image\":\"linuxserver/libreoffice:latest\",\"command\":[\"/bin/sh\",\"-c\"],\"args\":[\"/_py_oss/test/gjl/libreoffice.sh\"],\"resources\":{},\"volumeMounts\":[{\"name\":\"oss-pvc\",\"mountPath\":\"/_py_oss\"}]}}" version="&Version{Version:v3.3.10,BuildDate:2022-11-29T18:18:30Z,GitCommit:b19870d737a14b21d86f6267642a63dd14e5acd5,GitTag:v3.3.10,GitTreeState:clean,GoVersion:go1.17.13,Compiler:gc,Platform:linux/amd64,}"
time="2023-01-17T08:44:51.758Z" level=info msg="Starting deadline monitor"
time="2023-01-17T08:49:51.758Z" level=info msg="Alloc=5446 TotalAlloc=11296 Sys=19410 NumGC=5 Goroutines=6"
time="2023-01-17T08:54:51.758Z" level=info msg="Alloc=5467 TotalAlloc=11402 Sys=19410 NumGC=7 Goroutines=6"

logs with kubelet executor

time="2023-01-17T08:38:49.883Z" level=info msg="Starting Workflow Executor" executorType=kubelet version=v3.3.10
time="2023-01-17T08:38:49.888Z" level=info msg="Creating a kubelet executor"
time="2023-01-17T08:38:49.888Z" level=info msg="Non configured envvar ARGO_KUBELET_PORT, defaulting the kubelet port to 10250"
time="2023-01-17T08:38:49.888Z" level=warning msg="Loading service account ca.crt as certificate authority to reach the kubelet api"
time="2023-01-17T08:38:49.888Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2023-01-17T08:38:49.888Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=daily-argo-workspace podName=libreoffice-test2-4chzk template="{\"name\":\"libreoffice-test2\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"main\",\"image\":\"linuxserver/libreoffice:latest\",\"command\":[\"/bin/sh\",\"-c\"],\"args\":[\"/_py_oss/test/gjl/libreoffice.sh\"],\"resources\":{},\"volumeMounts\":[{\"name\":\"oss-pvc\",\"mountPath\":\"/_py_oss\"}]}}" version="&Version{Version:v3.3.10,BuildDate:2022-11-29T18:18:30Z,GitCommit:b19870d737a14b21d86f6267642a63dd14e5acd5,GitTag:v3.3.10,GitTreeState:clean,GoVersion:go1.17.13,Compiler:gc,Platform:linux/amd64,}"
time="2023-01-17T08:38:49.888Z" level=info msg="Starting to wait completion of containers main..."
time="2023-01-17T08:38:49.888Z" level=info msg="Starting deadline monitor"
time="2023-01-17T08:38:56.061Z" level=info msg="Starting to wait completion of containers main..."
time="2023-01-17T08:39:02.709Z" level=info msg="Starting to wait completion of containers main..."
time="2023-01-17T08:39:10.586Z" level=info msg="Starting to wait completion of containers main..."
time="2023-01-17T08:39:21.116Z" level=info msg="Starting to wait completion of containers main..."
time="2023-01-17T08:39:26.120Z" level=error msg="executor error: failed to wait for main container to complete: timed out waiting for the condition: Get \"https://172.16.83.102:10250/pods\": x509: cannot validate certificate for 172.16.83.102 because it doesn't contain any IP SANs"
time="2023-01-17T08:39:26.120Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2023-01-17T08:39:26.120Z" level=info msg="No output parameters"
time="2023-01-17T08:39:26.120Z" level=info msg="No output artifacts"
time="2023-01-17T08:39:26.123Z" level=error msg="executor error: Get \"https://172.16.83.102:10250/pods\": x509: cannot validate certificate for 172.16.83.102 because it doesn't contain any IP SANs"
time="2023-01-17T08:39:26.123Z" level=info msg="Alloc=5787 TotalAlloc=11458 Sys=19666 NumGC=4 Goroutines=6"
time="2023-01-17T08:39:26.123Z" level=fatal msg="failed to wait for main container to complete: timed out waiting for the condition: Get \"https://172.16.83.102:10250/pods\": x509: cannot validate certificate for 172.16.83.102 because it doesn't contain any IP SANs"
stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

spoonysnail commented 1 year ago

Any solutions?