argoproj / argo-events

Event-driven Automation Framework for Kubernetes
https://argoproj.github.io/argo-events/
Apache License 2.0

gateway validation failed #157

Closed: etheleon closed this issue 5 years ago

etheleon commented 5 years ago

Describe the bug: calendar-gateway validation fails.

To Reproduce: I followed the tutorials:

  1. create configmap
  2. create calendar gateway

When I ran $ kubectl describe gateways calendar-gateway, it reported that validation failed:

Name:         calendar-gateway
Namespace:    argo-events
Labels:       gateway-name=calendar-gateway
              gateways.argoproj.io/gateway-controller-instanceid=argo-events
              sensors.argoproj.io/phase=Error
Annotations:  gateways.argoproj.io/phase=Error
API Version:  argoproj.io/v1alpha1
Kind:         Gateway
Metadata:
  Creation Timestamp:  2019-01-23T03:37:31Z
  Generation:          1
  Resource Version:    2485150
  Self Link:           /apis/argoproj.io/v1alpha1/namespaces/argo-events/gateways/calendar-gateway
  UID:                 370b9790-1ec0-11e9-9d23-062114e1fc54
Spec:
  Config Map:  calendar-gateway-configmap
  Deploy Spec:
    Metadata:
      Creation Timestamp:  <nil>
    Spec:
      Containers:  <nil>
    Status:
  Dispatch Protocol:
  Event Version:
  Processor Port:
  Type:               calendar
  Watchers:
    Sensors:
      Name:  calendar-sensor
Status:
  Message:     validation failed
  Phase:       Error
  Started At:  2019-01-23T03:37:31Z
Events:        <none>

Even after creating the sensor, it still fails validation.

Expected behavior: it seems the gateway is not active:

Status:
  Message:     validation failed
  Phase:       Error
  Started At:  2019-01-23T03:37:31Z
Events:        <none>

Environment (please complete the following information):

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.5 LTS
Release:        16.04
Codename:       xenial

Additional context:

I tried running docker logs on the metalgearsolid/gateway-controller container, and here are the logs:

2019-01-23T09:24:37Z | info  | msg: operating on the gateway |  name:webhook-gateway namespace:argo-events phase:Error
2019-01-23T09:24:37Z | error | msg: gateway is in error state. please check escalated K8 event for the error |  name:webhook-gateway namespace:argo-events

VaibhavPage commented 5 years ago

Can you try running v0.7?

VaibhavPage commented 5 years ago

Closing issue for now.

etheleon commented 5 years ago

Hi @VaibhavPage, sorry, I was away.

Does v0.7 refer to the gateway version?

VaibhavPage commented 5 years ago

It refers to the controller, gateway and sensor versions. v0.7 is also the latest release.

kclaes commented 5 years ago

I have the exact same problem, using gateway-controller v0.7:

Status:
  Message:     validation failed
  Phase:       Error
  Started At:  2019-02-25T17:52:06Z
Events:        <none>

I have attached the YAML I'm trying to apply (calendar-gateway.yaml.txt). Also, we need resource requests and limits -- I thought I'd ask whether this is supported by the gateway and sensor controllers?

VaibhavPage commented 5 years ago

1) Can you post the logs of the gateway controller when the validation fails?
2) You can add requests and limits in the deploySpec. The deploySpec is a PodSpec.
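As a rough sketch of point 2, the Python helper below sets resource requests and limits on every container of a deploySpec represented as a plain dict, which you would then serialize into the gateway YAML. The helper name with_resources and the container name are hypothetical, and it assumes containers sit under spec.containers, mirroring the Deploy Spec / Spec / Containers layout in the describe output above.

```python
from __future__ import annotations

import copy


def with_resources(deploy_spec: dict, requests: dict[str, str], limits: dict[str, str]) -> dict:
    """Return a copy of deploy_spec with requests/limits set on every container.

    Hypothetical helper. Assumes containers live under deploy_spec["spec"]["containers"],
    mirroring the Deploy Spec -> Spec -> Containers layout in the describe output.
    """
    result = copy.deepcopy(deploy_spec)  # leave the caller's dict untouched
    for container in result.get("spec", {}).get("containers", []):
        resources = container.setdefault("resources", {})
        resources["requests"] = dict(requests)
        resources["limits"] = dict(limits)
    return result


if __name__ == "__main__":
    # Placeholder container; use the names and images from your own gateway YAML.
    spec = {"spec": {"containers": [{"name": "gateway-client"}]}}
    print(with_resources(spec, {"cpu": "100m", "memory": "64Mi"}, {"cpu": "500m", "memory": "256Mi"}))
```

The copy keeps the function free of side effects, so the same base spec can be reused for several gateways with different sizing.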

kclaes commented 5 years ago

These are all the errors shown by kubectl logs:

2019-02-25T12:09:17Z | info  | msg: starting gateway-controller |  controller-namespace:argo-events instance-id:argo-events version:v0.1.2+14b59f3.dirty
2019-02-25T12:09:17Z | info  | msg: watching gateway-controller config map updates |  controller-namespace:argo-events
...
ERROR: logging before flag.Parse: W0225 18:47:37.514879       1 reflector.go:341] github.com/argoproj/argo-events/controllers/gateway/config.go:62: watch of *v1.ConfigMap ended with: too old resource version: 47662759 (47662873)
2019-02-25T18:47:38Z | info  | msg: detected ConfigMap update. updating the gateway-controller config. |  controller-namespace:argo-events
2019-02-25T18:49:17Z | info  | msg: operating on the gateway |  name:calendar-gateway namespace:argo-events phase:Error
2019-02-25T18:49:17Z | error | msg: gateway is in error state. please check escalated K8 event for the error |  name:calendar-gateway namespace:argo-events

VaibhavPage commented 5 years ago

It looks like the gateway resource is already in an error state: the controller picked up an old gateway that was in an error state. You need to delete the old gateway resource first.

kclaes commented 5 years ago

Ok -- that got me a little further. It says: no associated watchers with gateway. But I'm kind of wondering how this is supposed to work -- a watcher would reference a sensor, right? But the sensor points to the gateway as well through its dependencies?

Is it correct that there is a circular dependency between the two and if so, what is the correct order to initialize them?

VaibhavPage commented 5 years ago

The watchers field is a list of sensors you want the gateway to dispatch events to.

Refer to this calendar gateway example: https://raw.githubusercontent.com/argoproj/argo-events/master/examples/gateways/calendar.yaml

A gateway listens to event sources (like GitHub, S3, Kafka streams, etc.) that emit events. The configuration for these event sources is stored in a configmap, called the gateway configmap. The gateway monitors the gateway configmap at runtime for event sources that are added or deleted. Once the gateway gets an event from an event source, it forwards that event to the sensor. The sensor's job is to resolve event dependencies.
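As a rough illustration of how a gateway could notice event sources being added or deleted at runtime, the Python sketch below diffs two snapshots of a configmap's data, treating each key as one event source (an assumption). The function name and the sample values are hypothetical; this is not how the gateway is actually implemented.

```python
from __future__ import annotations


def diff_event_sources(old: dict[str, str], new: dict[str, str]) -> tuple[set[str], set[str], set[str]]:
    """Compare two configmap data snapshots (hypothetical helper).

    Each key is treated as an event source name and each value as its config.
    Returns (added, removed, changed) event source names.
    """
    added = set(new) - set(old)
    removed = set(old) - set(new)
    changed = {name for name in set(old) & set(new) if old[name] != new[name]}
    return added, removed, changed


if __name__ == "__main__":
    before = {"interval": "interval: 10s"}
    after = {"interval": "interval: 20s", "nightly": "schedule: 0 0 * * *"}
    added, removed, changed = diff_event_sources(before, after)
    print(sorted(added), sorted(removed), sorted(changed))  # ['nightly'] [] ['interval']
```

A gateway following this pattern would start listeners for added sources, stop them for removed ones, and restart them for changed ones.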

VaibhavPage commented 5 years ago

So you'll need to create a configmap like https://github.com/argoproj/argo-events/blob/master/examples/gateways/calendar-gateway-configmap.yaml

The gateway requires this configmap to parse the event source configuration; otherwise, the gateway sits idle.

VaibhavPage commented 5 years ago

https://github.com/argoproj/argo-events/blob/master/docs/gateway-guide.md

VaibhavPage commented 5 years ago

The order of setup doesn't really matter, but you can follow this order:
1) Create the sensor.
2) Create the gateway configmap.
3) Create the gateway, making sure to reference the configmap above in the gateway spec.

kclaes commented 5 years ago

So what is the role of the dependencies field on the sensor? From the examples, it seems this has to refer to the name of the gateway AND the name of the event source from the configmap?

Unless I'm misunderstanding the effect of the dependencies field on the sensor?

kclaes commented 5 years ago

Maybe I should explain my use case. I have 5 different Argo workflows (all rather similar, except for a parameter passed in at the beginning), and I want to run each of them on a different schedule.

I thought I could make one gateway, one configmap containing 5 different keys with a schedule each, and then create a sensor that passes the correct parameter to the workflow, based on which event arrives. Would this work, or do I need 5 of each gateway/configmap/sensor instead?

VaibhavPage commented 5 years ago

The sensor waits to receive events from one or more gateways. Each gateway can run multiple event sources, so the dependencies in a sensor are a list of "gateway-name:event-source-name". Once the sensor receives events from gateways with the correct event source names, it marks the dependencies as resolved and triggers the workflow(s).

Think of a sensor as a barrier waiting to receive events from one or more gateways.

For example, consider two gateways, one for S3 and the other for calendar schedules. The S3 gateway listens to notifications from a bucket called foo. The calendar gateway runs a schedule with interval: 10 seconds. Now suppose a sensor waits for events from both gateways. The sensor will define its event dependencies as

s3-gateway:foo
calendar-gateway:interval

Once both gateways have sent their events to the sensor, the sensor marks the dependencies as resolved and triggers the workflow.
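The barrier behaviour described above can be sketched in a few lines of Python. SensorBarrier is a hypothetical toy model, not the argo-events sensor code: it keys dependencies as "gateway-name:event-source-name", ignores events it is not waiting for, and reports when every dependency is resolved. It does not model resetting the barrier after a trigger fires.

```python
from __future__ import annotations


class SensorBarrier:
    """Toy model of a sensor waiting on "gateway-name:event-source-name" dependencies."""

    def __init__(self, dependencies: list[str]) -> None:
        self.pending = set(dependencies)
        self.resolved: set[str] = set()

    def on_event(self, gateway: str, event_source: str) -> bool:
        """Record an event; return True once every dependency is resolved."""
        key = f"{gateway}:{event_source}"
        if key in self.pending:
            self.pending.discard(key)
            self.resolved.add(key)
        # Events from gateway/source pairs the sensor does not depend on are ignored.
        return not self.pending


if __name__ == "__main__":
    sensor = SensorBarrier(["s3-gateway:foo", "calendar-gateway:interval"])
    print(sensor.on_event("calendar-gateway", "interval"))  # False: still waiting on S3
    print(sensor.on_event("s3-gateway", "foo"))             # True: trigger the workflow
```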

VaibhavPage commented 5 years ago

In your use case, you will just need to create one gateway, one gateway configmap containing five schedule configurations, one sensor.

kclaes commented 5 years ago

Ok, great! I guess this is a good starting point to figure out how to make the sensor do different things depending on the name of the event? https://github.com/argoproj/argo-events/blob/master/examples/sensors/webhook-http-dependency-groups.yaml Or is there something else available?

VaibhavPage commented 5 years ago

You found the right example.

kclaes commented 5 years ago

There's a lot going on there. I think I got a grasp of what's expected, but alas, the trigger does not fire.

Everything gets accepted and appears to be running, but when the event for a workflow fires, I see this in the sensor logs:

2019-02-26T15:06:16Z | info  | msg: received an event from gateway |  sensor-name:calendar-sensor
2019-02-26T15:06:16Z | info  | msg: message successfully sent over internal queue |  event-source-name:calendar-gateway:workflow_1 sensor-name:calendar-sensor
2019-02-26T15:06:16Z | info  | msg: received event notification |  event-dependency-name:calendar-gateway:workflow_1 sensor-name:calendar-sensor
2019-02-26T15:06:16Z | info  | msg: completed |  node-name:calendar-gateway:workflow_1 sensor-name:calendar-sensor type:EventDependency
2019-02-26T15:06:16Z | info  | msg: triggers can't be executed because event dependencies are not complete |  sensor-name:calendar-sensor
2019-02-26T15:06:16Z | info  | msg: sensor state updated successfully |  phase:Active sensor-name:calendar-sensor
2019-02-26T15:06:16Z | info  | msg: successfully persisted sensor resource update and created K8s event |  sensor-name:calendar-sensor
2019-02-26T15:06:16Z | info  | msg: sensor resource update |  sensor-name:calendar-sensor
2019-02-26T15:06:16Z | info  | msg: sensor state updated successfully |  phase:Active sensor-name:calendar-sensor
2019-02-26T15:06:16Z | info  | msg: successfully persisted sensor resource update and created K8s event |  sensor-name:calendar-sensor

Of course, "triggers can't be executed because event dependencies are not complete" sticks out, but it doesn't tell me why the dependencies aren't complete. I only ever list a single one.

Any pointers as to where the flaw in my logic lies?

VaibhavPage commented 5 years ago

Your setup is correct. It's because of the v0.7 version: support for boolean operations and when conditions on workflow triggers was added in patch versions after v0.7. You can try v0.7.3 for all the images (including the sensor controller and sensor images), or wait for v0.8 to be released. Most probably v0.8 will be released today.

A simpler example: https://github.com/argoproj/argo-events/blob/fix-resource-gateway/examples/sensors/webhook-http-boolean.yaml
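To make the use case above concrete (five schedules, one sensor, one workflow per schedule), here is a hypothetical Python model of per-trigger when conditions. It assumes, as in the dependency-groups examples linked above, that the names under all/any refer to dependency groups and that each schedule has its own group (workflow_1 to workflow_5). It only decides which triggers would fire; it does not create workflows and is not the sensor's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Trigger:
    """Hypothetical model of a trigger with a `when` condition over group names."""
    name: str
    all_of: list[str] = field(default_factory=list)
    any_of: list[str] = field(default_factory=list)

    def should_fire(self, resolved_groups: set[str]) -> bool:
        # `all`: every listed group must be resolved.
        if self.all_of and not all(g in resolved_groups for g in self.all_of):
            return False
        # `any`: at least one listed group must be resolved.
        if self.any_of and not any(g in resolved_groups for g in self.any_of):
            return False
        # A trigger with no condition at all is not modeled here and never fires.
        return bool(self.all_of or self.any_of)


def triggers_to_fire(triggers: list[Trigger], resolved_groups: set[str]) -> list[str]:
    return [t.name for t in triggers if t.should_fire(resolved_groups)]


if __name__ == "__main__":
    # One trigger per schedule, each gated on its own group.
    triggers = [Trigger(f"workflow_{i}-trigger", all_of=[f"workflow_{i}"]) for i in range(1, 6)]
    print(triggers_to_fire(triggers, {"workflow_1"}))  # ['workflow_1-trigger']
```

Gating each trigger on its own group is what lets a single sensor start only the workflow whose schedule fired, instead of waiting for all five.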

kclaes commented 5 years ago

Yeah that example looks pretty much exactly like what I've done 😄 It was the images -- I started from the examples but the sensor and gateway definitions don't have tags -- meaning it took the latest from docker hub. But it seems like latest hasn't been updated in a while.

Now that I have 0.7.4 of everything, it triggers, but it seems the sensor has some problems now:

2019-02-26T15:48:11Z | error | msg: trigger failed to execute |  error="v1alpha1.Workflow.ObjectMeta: readObjectStart: expect { or n, but found \", error found in #10 byte of ...|etadata\":\"generateNa|..., bigger context ...|goproj.io/v1alpha1\",\"kind\":\"Workflow\",\"metadata\":\"generateName:workflow_1-\",\"spec\":{\"|..." sensor-name:calendar-sensor trigger-name:workflow_1-trigger

I haven't changed anything about the actual trigger definition, and it's defined inline:

triggers:
    - name: workflow_1-trigger
      when:
        all:
          - workflow_1
      resource:
        namespace: argo-events
        group: argoproj.io
        version: v1alpha1
        kind: Workflow
        source:
          inline: |
            apiVersion: argoproj.io/v1alpha1
            kind: Workflow
            metadata:
              generateName: workflow_1-
            spec:
              entrypoint: dag
            ...

Seems like it's getting some escaped version of the yaml and borks on that?

VaibhavPage commented 5 years ago

Yeah, it looks like a formatting issue in the inline workflow.
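In the error log above, metadata was serialized as the plain string "generateName:workflow_1-" rather than as a mapping, which is what YAML does when a key is not followed by a space (kclaes later confirms a missing space was the cause). The small Python check below flags such lines. find_missing_spaces is a hypothetical helper built on a simple regular expression, not a full YAML parser, so it can miss or misreport unusual cases (for example, a list item that is a bare URL).

```python
from __future__ import annotations

import re

# A bare key immediately followed by ':' and a non-space character, e.g.
# "generateName:workflow_1-". YAML reads such a line as one plain string,
# not as a key/value pair.
_MISSING_SPACE = re.compile(r"^\s*(?:-\s+)?([A-Za-z_][\w.-]*):(?=\S)")


def find_missing_spaces(yaml_text: str) -> list[tuple[int, str]]:
    """Return (line_number, key) for lines that look like 'key:value' without a space."""
    hits = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        match = _MISSING_SPACE.match(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    snippet = """\
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName:workflow_1-
spec:
  entrypoint: dag
"""
    print(find_missing_spaces(snippet))  # [(4, 'generateName')]
```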

VaibhavPage commented 5 years ago

If the formatting gives you headaches, you can reference trigger workflows in other ways, as described here: https://github.com/argoproj/argo-events/blob/master/docs/trigger-guide.md#how-to-define-a-trigger. Or you can copy and paste this working example and change the image tag and the event dependency names.

kclaes commented 5 years ago

Thanks, I'll see if I can figure out where it went wrong. I like the inline definition -- it keeps everything together and self-contained, without external dependencies.

kclaes commented 5 years ago

Ok, seems like this is something that's specific to v0.7.4 -- if I go back to the latest image (which is v0.7, I think?), I don't get that error and the workflow gets started (this is without the dependency groups obviously, only a single workflow per gateway/sensor).

VaibhavPage commented 5 years ago

You mean v0.7.3? v0.7.4 is specific to a bug fix in the resource gateway.

The example I shared with you does have v0.7.4 as image tag, but you should still use v0.7.3

kclaes commented 5 years ago

I just went to docker hub to find the latest versions of everything. I'll confirm with every version in between...

kclaes commented 5 years ago

Ok, v0.7, v0.7.1 and v0.7.2 of sensor happily accept and start the inline workflow. v0.7.3 and v0.7.4 do not.

VaibhavPage commented 5 years ago

This is weird, because I am using v0.7.4 as the sensor image and it triggers inline workflows just fine.

Can you wait for a couple of hours? I'll drop a tag for v0.8, which should resolve any confusion.

kclaes commented 5 years ago

Sure thing, thanks for the speedy follow-ups!

kclaes commented 5 years ago

Noticed v0.8 was out, so I had a go. I also figured out that the error I was getting with the inline workflow was due to a missing space between a key and a value 😭

Now, not all is well: I'm getting the following error over and over again, and each time the sensor creates a new workflow:

2019-02-26T17:48:18Z | info  | msg: sensor state updated successfully |  phase:Active sensor-name:calendar-sensor
2019-02-26T17:48:18Z | info  | msg: successfully persisted sensor resource update and created K8s event |  sensor-name:calendar-sensor
2019-02-26T17:48:18Z | info  | msg: sensor resource update |  sensor-name:calendar-sensor
2019-02-26T17:48:18Z | warn  | msg: error updating sensor |  error="Operation cannot be fulfilled on sensors.argoproj.io \"calendar-sensor\": the object has been modified; please apply your changes to the latest version and try again" sensor-name:calendar-sensor
2019-02-26T17:48:18Z | error | msg: failed to persist sensor update, escalating... |  error="Operation cannot be fulfilled on sensors.argoproj.io \"calendar-sensor\": the object has been modified; please apply your changes to the latest version and try again" sensor-name:calendar-sensor

Luckily, we have kubectl delete workflows --all! :)

Any idea?
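The error above comes from Kubernetes' optimistic concurrency check: every object carries a resourceVersion, and an update based on a stale version is rejected with a conflict, so the usual client response is to re-read the latest object and retry. The toy Python model below (ToyStore, update_with_retry and ConflictError are hypothetical names, not an argo-events or client-go API) shows only that mechanism. In this thread, the repeated workflows were ultimately traced to a calendar gateway bug that was fixed in v0.8.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


class ConflictError(Exception):
    """Raised when an update is based on a stale resourceVersion."""


@dataclass
class Stored:
    data: dict
    resource_version: int


class ToyStore:
    """In-memory stand-in for the API server's resourceVersion check (hypothetical)."""

    def __init__(self, data: dict) -> None:
        self._obj = Stored(dict(data), 1)

    def get(self) -> Stored:
        return Stored(dict(self._obj.data), self._obj.resource_version)

    def update(self, data: dict, resource_version: int) -> int:
        # Reject writes that were computed from an outdated copy of the object.
        if resource_version != self._obj.resource_version:
            raise ConflictError("the object has been modified; please apply your "
                                "changes to the latest version and try again")
        self._obj = Stored(dict(data), resource_version + 1)
        return self._obj.resource_version


def update_with_retry(store: ToyStore, mutate: Callable[[dict], dict], attempts: int = 5) -> int:
    """Re-read the latest object, apply `mutate`, and retry on conflict."""
    for _ in range(attempts):
        current = store.get()
        try:
            return store.update(mutate(current.data), current.resource_version)
        except ConflictError:
            continue  # another writer won; fetch the new version and try again
    raise ConflictError(f"gave up after {attempts} attempts")


if __name__ == "__main__":
    store = ToyStore({"phase": "Active"})
    stale = store.get()
    store.update({"phase": "Active", "note": "other writer"}, stale.resource_version)
    try:
        store.update({"phase": "Error"}, stale.resource_version)  # stale write is rejected
    except ConflictError as err:
        print("conflict:", err)
    print(update_with_retry(store, lambda d: {**d, "phase": "Error"}))  # 3
```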

VaibhavPage commented 5 years ago

Did you update the sensor controller and sensor images with v0.8?

Can you delete existing calendar gateway and try again?

VaibhavPage commented 5 years ago

Also, the latest tag still points to v0.7; it will be updated to v0.8 soon.

kclaes commented 5 years ago

I do this by default: I pack everything in a Helm chart and delete --purge my release each time. I also have the version as a value in the Helm chart, so I'm using v0.8 explicitly instead of latest.

I restarted from fresh, so I had no sensor-controller/gateway-controllers or sensor/gateway-client/calendar-gateway pods running anymore.

VaibhavPage commented 5 years ago

Let me try to reproduce it.

kclaes commented 5 years ago

Here's the complete log (attached: log.txt). Each time that error occurs, the event appears to be reprocessed and a new workflow is created...

Edit: I tried without the dependencyGroups feature, to get a working v0.7 use case and see whether it keeps working on v0.8.

Here's the logs when everything is at v0.7, which works: log_v0_7.txt

Here's the logs with everything at v0.8 (only difference is container tags) log_v0_8.txt

VaibhavPage commented 5 years ago

There was a bug in the calendar gateway. Can you test v0.8 again?

kclaes commented 5 years ago

Yes! It all works now! ❤️

VaibhavPage commented 5 years ago

Great :+1: . Can we close the issue?

kclaes commented 5 years ago

Yes, please. In the end, it had very little to do with 'gateway validation failed' anymore... I'll open a new issue if I happen to come across something.

Many thanks for your great help!