argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0
15.11k stars 3.21k forks source link

API query error: `unsupported Unicode escape sequence (SQLSTATE 22P05)` #13711

Open yonirab opened 1 month ago

yonirab commented 1 month ago

Pre-requisites

What happened? What did you expect to happen?

After upgrading from v3.5.6 to v3.5.11, we see the following error in the Argo UI whenever trying to list Succeeded or Failed workflows:

{"code":13,"message":"ERROR: unsupported Unicode escape sequence (SQLSTATE 22P05)"}: ERROR: unsupported Unicode escape sequence (SQLSTATE 22P05)

argo-server logs show the following:

ERROR 2024-10-06T09:08:57.605638209Z [resource.labels.containerName: argo-server] time="2024-10-06T09:08:57.605Z" level=info duration="135.318µs" method=GET path=index.html size=487 status=0
INFO 2024-10-06T09:08:57.993832698Z [resource.labels.containerName: argo-server] 2024/10/06 09:08:57 Session ID: 00001
INFO 2024-10-06T09:08:57.993891852Z [resource.labels.containerName: argo-server] Query: SELECT count(*) as total FROM "argo_archived_workflows" WHERE (("clustername" = $1 AND "instanceid" = $2) AND "namespace" = $3 AND not exists (select 1 from argo_archived_workflows_labels where clustername = argo_archived_workflows.clustername and uid = argo_archived_workflows.uid and name = 'workflows.argoproj.io/controller-instanceid') AND exists (select 1 from argo_archived_workflows_labels where clustername = argo_archived_workflows.clustername and uid = argo_archived_workflows.uid and name = 'workflows.argoproj.io/phase' and value in ('Succeeded')))
INFO 2024-10-06T09:08:57.993898457Z [resource.labels.containerName: argo-server] Arguments: []interface {}{"default", "", "default"}
INFO 2024-10-06T09:08:57.993901898Z [resource.labels.containerName: argo-server] Stack:
INFO 2024-10-06T09:08:57.993904741Z [resource.labels.containerName: argo-server] fmt.(*pp).handleMethods@/usr/local/go/src/fmt/print.go:673
INFO 2024-10-06T09:08:57.993911097Z [resource.labels.containerName: argo-server] fmt.(*pp).printArg@/usr/local/go/src/fmt/print.go:756
INFO 2024-10-06T09:08:57.993913723Z [resource.labels.containerName: argo-server] fmt.(*pp).doPrint@/usr/local/go/src/fmt/print.go:1211
INFO 2024-10-06T09:08:57.993916162Z [resource.labels.containerName: argo-server] fmt.Append@/usr/local/go/src/fmt/print.go:289
INFO 2024-10-06T09:08:57.993918601Z [resource.labels.containerName: argo-server] log.(*Logger).Print.func1@/usr/local/go/src/log/log.go:261
INFO 2024-10-06T09:08:57.993921542Z [resource.labels.containerName: argo-server] log.(*Logger).output@/usr/local/go/src/log/log.go:238
INFO 2024-10-06T09:08:57.993924791Z [resource.labels.containerName: argo-server] log.(*Logger).Print@/usr/local/go/src/log/log.go:260
INFO 2024-10-06T09:08:57.993928471Z [resource.labels.containerName: argo-server] github.com/argoproj/argo-workflows/v3/persist/sqldb.(*workflowArchive).CountWorkflows@/go/src/github.com/argoproj/argo-workflows/persist/sqldb/workflow_archive.go:264
INFO 2024-10-06T09:08:57.993931447Z [resource.labels.containerName: argo-server] github.com/argoproj/argo-workflows/v3/server/workflow.(*workflowServer).ListWorkflows@/go/src/github.com/argoproj/argo-workflows/server/workflow/workflow_server.go:193
INFO 2024-10-06T09:08:57.993934447Z [resource.labels.containerName: argo-server] github.com/argoproj/argo-workflows/v3/pkg/apiclient/workflow._WorkflowService_ListWorkflows_Handler.func1@/go/src/github.com/argoproj/argo-workflows/pkg/apiclient/workflow/workflow.pb.go:1826
INFO 2024-10-06T09:08:57.993936732Z [resource.labels.containerName: argo-server] github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.RatelimitUnaryServerInterceptor.func5@/go/src/github.com/argoproj/argo-workflows/util/grpc/interceptor.go:65
INFO 2024-10-06T09:08:57.993942795Z [resource.labels.containerName: argo-server] github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6.1.1@/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25
INFO 2024-10-06T09:08:57.993945407Z [resource.labels.containerName: argo-server] github.com/argoproj/argo-workflows/v3/server/auth.(*gatekeeper).UnaryServerInterceptor.func1@/go/src/github.com/argoproj/argo-workflows/server/auth/gatekeeper.go:98
INFO 2024-10-06T09:08:57.993948144Z [resource.labels.containerName: argo-server] github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6.1.1@/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25
INFO 2024-10-06T09:08:57.993951345Z [resource.labels.containerName: argo-server] github.com/argoproj/argo-workflows/v3/util/grpc.glob..func1@/go/src/github.com/argoproj/argo-workflows/util/grpc/interceptor.go:45
INFO 2024-10-06T09:08:57.993953807Z [resource.labels.containerName: argo-server] github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6.1.1@/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25
INFO 2024-10-06T09:08:57.993974087Z [resource.labels.containerName: argo-server] github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.PanicLoggerUnaryServerInterceptor.func4@/go/src/github.com/argoproj/argo-workflows/util/grpc/interceptor.go:26
INFO 2024-10-06T09:08:57.993976959Z [resource.labels.containerName: argo-server] github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6.1.1@/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25
INFO 2024-10-06T09:08:57.993985441Z [resource.labels.containerName: argo-server] github.com/grpc-ecosystem/go-grpc-middleware/logging/logrus.UnaryServerInterceptor.func1@/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/logging/logrus/server_interceptors.go:31
INFO 2024-10-06T09:08:57.993988269Z [resource.labels.containerName: argo-server] github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6.1.1@/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25
INFO 2024-10-06T09:08:57.993990595Z [resource.labels.containerName: argo-server] github.com/grpc-ecosystem/go-grpc-prometheus.init.(*ServerMetrics).UnaryServerInterceptor.func3@/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/server_metrics.go:107
INFO 2024-10-06T09:08:57.993992913Z [resource.labels.containerName: argo-server] Error: upper: slow query
INFO 2024-10-06T09:08:57.993995607Z [resource.labels.containerName: argo-server] Time taken: 0.65064s
INFO 2024-10-06T09:08:57.993997870Z [resource.labels.containerName: argo-server] Context: context.Background
INFO 2024-10-06T09:08:57.994000109Z [resource.labels.containerName: argo-server] {}
ERROR 2024-10-06T09:08:58.003874930Z [resource.labels.containerName: argo-server] time="2024-10-06T09:08:58.003Z" level=error msg="finished unary call with code Internal" error="rpc error: code = Internal desc = ERROR: unsupported Unicode escape sequence (SQLSTATE 22P05)" grpc.code=Internal grpc.method=ListWorkflows grpc.service=workflow.WorkflowService grpc.start_time="2024-10-06T09:08:57Z" grpc.time_ms=665.575 span.kind=server system=grpc
ERROR 2024-10-06T09:08:58.003922336Z [resource.labels.containerName: argo-server] time="2024-10-06T09:08:58.003Z" level=info duration=666.353369ms method=GET path=/api/v1/workflows/default size=83 status=500

Note also the following:

argo-production=> select count(*) from argo_archived_workflows where workflow::text LIKE '%\u0000%';
 count
-------
  1459
(1 row)

argo-production=> 

Our DB for archived workflows is Postgres 9.6.

We have the following defined in workflow-controller-configmap

  persistence: |
    archive: true
    archiveTTL: 180d
    connectionPool:
      maxIdleConns: 100
      maxOpenConns: 0
      connMaxLifetime: 0s
    nodeStatusOffLoad: true
    postgresql:
      database: argo-production
      host: <Our DB host>
      passwordSecret:
        key: password
        name: argo-postgres-config
      port: 5432
      tableName: argo_workflows
      userNameSecret:
        key: username
        name: argo-postgres-config

Version(s)

v3.5.11

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.

This is in our production environment immediately after updating to 3.5.11

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

Logs from in your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded
Joibel commented 1 month ago

My guess is that one of your workflows has non-ascii characters in it that are upsetting sqllite. It's unclear if they're actually an invalid unicode escape sequence.

Possibly in something like the description of the workflow.

Can you look at the workflows in the cluster and work out which one(s) it might be and attach (a redacted version) of them. Ideally in a zip or tar file so as not to lose any bad escape characters via copy and pasting into github.

tooptoop4 commented 2 weeks ago

any idea @jiachengxu @linzhengen @MasonM

https://www.postgresql.org/message-id/368156.1677514339%40sss.pgh.pa.us mentions JSONB is stricter than JSON

MasonM commented 1 week ago

@tooptoop4 Good find, but the JSONB migration (https://github.com/argoproj/argo-workflows/pull/13779) wasn't in 3.5.11, so unless @yonirab has done the migration manually, that's not the cause.

My first thought was that this is a regression from https://github.com/argoproj/argo-workflows/pull/12912, since that changed the SELECT queries to use JSON operators, which will fail with unsupported Unicode escape sequence regardless of whether the data is stored as text, json, or jsonb:

postgres=# insert into argo_archived_workflows (uid, instanceid, name, namespace, phase, clustername, workflow) values ('0b3ec561-bc78-400c-9772-ce11c69b3d65', '', 'dg92nw8qr4k', 'argo', 'Succeeded', 'default', '{ "key2": "\u0000", "key1": 1}');
INSERT 0 1
postgres=# select workflow::json from argo_archived_workflows;
            workflow            
--------------------------------
 { "key2": "\u0000", "key1": 1}
(1 row)

postgres=# select (workflow::json)->'key2' from argo_archived_workflows;
ERROR:  unsupported Unicode escape sequence
DETAIL:  \u0000 cannot be converted to text.
CONTEXT:  JSON data, line 1: { "key2": "\u0000...

However, it appears that was released as v3.5.6.

tooptoop4 commented 1 week ago

i was able to reproduce this on master with postgres db and below workflow (click submit after creating the cron):

              apiVersion: argoproj.io/v1alpha1
              kind: CronWorkflow
              metadata:
                name: test-cron-wfnull
              spec:
                schedule: "* * * * *"
                concurrencyPolicy: "Forbid"
                startingDeadlineSeconds: 0
                workflowSpec:
                  entrypoint: whalesay
                  templates:
                  - name: whalesay
                    container:
                      image: alpine:3.6
                      command: [sh, -c]
                      env: null
                      args: ["date; echo '\x00'"]

in my case, since its a new installation, the workflow would run but fail to get archived:

Query:          INSERT INTO "argo_archived_workflows" ("clustername", "finishedat", "instanceid", "name", "namespace", "phase", "startedat", "uid", "workflow") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9) RETURNING "clustername", "uid"
controller:     Arguments:      []interface {}{"default", time.Date(2024, time.November, 13, 5, 57, 14, 0, time.Local), "", "test-cron-wfnull-1731437820", "argo", "Failed", time.Date(2024, time.November, 13, 5, 57, 0, 0, time.Local), "c61e460c-47d3-46b2-aa16-b6c28c728567", "{\"kind\":\"Workflow\",\"apiVersion\":\"argoproj.io/v1alpha1\",\"metadata\":{\"name\":\"test-cron-wfnull-1731437820\",\"namespace\":\"argo\",\"uid\":\"c61e460c-47d3-46b2-aa16-b6c28c728567\",\"resourceVersion\":\"96738\",\"generation\":5,\"creationTimestamp\":\"2024-11-12T18:57:00Z\",\"labels\":{\"workflows.argoproj.io/completed\":\"true\",\"workflows.argoproj.io/cron-workflow\":\"test-cron-wfnull\",\"workflows.argoproj.io/phase\":\"Failed\",\"workflows.argoproj.io/workflow-archiving-status\":\"Persisted\"},\"annotations\":{\"workflows.argoproj.io/pod-name-format\":\"v2\",\"workflows.argoproj.io/scheduled-time\":\"2024-11-13T05:57:00+11:00\"},\"ownerReferences\":[{\"apiVersion\":\"argoproj.io/v1alpha1\",\"kind\":\"CronWorkflow\",\"name\":\"test-cron-wfnull\",\"uid\":\"3352f4c0-a2f9-4dae-8d1e-7ce2647c3cbb\",\"controller\":true,\"blockOwnerDeletion\":true}],\"managedFields\":[{\"manager\":\"workflow-controller\",\"operation\":\"Update\",\"apiVersion\":\"argoproj.io/v1alpha1\",\"time\":\"2024-11-12T18:57:14Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:workflows.argoproj.io/pod-name-format\":{},\"f:workflows.argoproj.io/scheduled-time\":{}},\"f:labels\":{\".\":{},\"f:workflows.argoproj.io/completed\":{},\"f:workflows.argoproj.io/cron-workflow\":{},\"f:workflows.argoproj.io/phase\":{},\"f:workflows.argoproj.io/workflow-archiving-status\":{}},\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"3352f4c0-a2f9-4dae-8d1e-7ce2647c3cbb\\\"}\":{}}},\"f:spec\":{},\"f:status\":{}}}]},\"spec\":{\"templates\":[{\"name\":\"whalesay\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"alpine:3.6\",\"command\":[\"sh\",\"-c\"],\"args\":[\"date; echo '\\u0000'\"],\"resources\":{}}}],\"entrypoint\":\"whalesay\",\"arguments\":{},\"activeDeadlineSeconds\":300,\"podSpecPatch\":\"terminationGracePeriodSeconds: 3\\n\"},\"status\":{\"phase\":\"Failed\",\"startedAt\":\"2024-11-12T18:57:00Z\",\"finishedAt\":\"2024-11-12T18:57:14Z\",\"estimatedDuration\":109,\"progress\":\"0/1\",\"message\":\"main: Error (exit code 1)\",\"nodes\":{\"test-cron-wfnull-1731437820\":{\"id\":\"test-cron-wfnull-1731437820\",\"name\":\"test-cron-wfnull-1731437820\",\"displayName\":\"test-cron-wfnull-1731437820\",\"type\":\"Pod\",\"templateName\":\"whalesay\",\"templateScope\":\"local/test-cron-wfnull-1731437820\",\"phase\":\"Failed\",\"message\":\"main: Error (exit code 1)\",\"startedAt\":\"2024-11-12T18:57:00Z\",\"finishedAt\":\"2024-11-12T18:57:10Z\",\"progress\":\"0/1\",\"resourcesDuration\":{\"cpu\":0,\"memory\":4},\"outputs\":{\"exitCode\":\"1\"},\"hostNodeName\":\"minikube\"}},\"conditions\":[{\"type\":\"PodRunning\",\"status\":\"False\"},{\"type\":\"Completed\",\"status\":\"True\"}],\"resourcesDuration\":{\"cpu\":0,\"memory\":4},\"artifactRepositoryRef\":{\"configMap\":\"artifact-repositories\",\"key\":\"default-v1\",\"namespace\":\"argo\",\"artifactRepository\":{\"archiveLogs\":true,\"s3\":{\"endpoint\":\"minio:9000\",\"bucket\":\"my-bucket\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"my-minio-cred\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"my-minio-cred\",\"key\":\"secretkey\"}}}},\"artifactGCStatus\":{\"notSpecified\":true},\"taskResultsCompletionStatus\":{\"test-cron-wfnull-1731437820\":true}}}"}
controller:     Stack:          
controller:         fmt.(*pp).handleMethods@/usr/local/go/src/fmt/print.go:673
controller:         fmt.(*pp).printArg@/usr/local/go/src/fmt/print.go:756
controller:         fmt.(*pp).doPrint@/usr/local/go/src/fmt/print.go:1208
controller: time="2024-11-13T05:57:14.660Z" level=debug msg="Patch pods 200"
controller:         fmt.Append@/usr/local/go/src/fmt/print.go:289
controller:         log.(*Logger).Print.func1@/usr/local/go/src/log/log.go:261
controller:         log.(*Logger).output@/usr/local/go/src/log/log.go:238
controller:         log.(*Logger).Print@/usr/local/go/src/log/log.go:260
controller:         github.com/argoproj/argo-workflows/v3/persist/sqldb.(*workflowArchive).ArchiveWorkflow.func1@/redact/argo-workflows-main/persist/sqldb/workflow_archive.go:116
controller:         github.com/argoproj/argo-workflows/v3/persist/sqldb.(*workflowArchive).ArchiveWorkflow@/redact/argo-workflows-main/persist/sqldb/workflow_archive.go:106
controller:         github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).archiveWorkflowAux@/redact/argo-workflows-main/workflow/controller/controller.go:1202
controller:         github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).archiveWorkflow@/redact/argo-workflows-main/workflow/controller/controller.go:1182
controller: time="2024-11-13T05:57:14.676Z" level=error msg="failed to archive workflow" error="failed to archive workflow: ERROR: unsupported Unicode escape sequence (SQLSTATE 22P05)" key=argo/test-cron-wfnull-1731437820
controller:         github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).processNextArchiveItem@/redact/argo-workflows-main/workflow/controller/controller.go:926
controller:         github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runArchiveWorker@/redact/argo-workflows-main/workflow/controller/controller.go:822
controller:         k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1@/redact/go/pkg/mod/k8s.io/apimachinery@v0.30.3/pkg/util/wait/backoff.go:226
controller:         k8s.io/apimachinery/pkg/util/wait.BackoffUntil@/redact/go/pkg/mod/k8s.io/apimachinery@v0.30.3/pkg/util/wait/backoff.go:227
controller:         k8s.io/apimachinery/pkg/util/wait.JitterUntil@/redact/go/pkg/mod/k8s.io/apimachinery@v0.30.3/pkg/util/wait/backoff.go:204
controller:     Error:          ERROR: unsupported Unicode escape sequence (SQLSTATE 22P05)
controller:     Time taken:     0.01704s
controller:     Context:        context.Background

as you can see the \x00 from the workflow is being converted to \u0000 in the json. probably by https://github.com/argoproj/argo-workflows/blob/217b598684c6d0cb9384e8c649f8e73659c5f9e5/persist/sqldb/workflow_archive.go#L102

so i think few things would need to happen:

  1. marshall method changed slightly/some sort of other replacement values done so \u0000 never is being inserted to the db
  2. one off update run on the db to clear out those values on existing rows
  3. when reading the value back from the db to present in the ui need to reconvert back to original \x00 character