Closed etheleon closed 5 years ago
Can you try running v0.7?
Closing issue for now.
Hi @VaibhavPage sorry, was away.
Does v0.7 refer to the gateway version?
It refers to the controller, gateway and sensor versions. v0.7 is also the latest.
Have the exact same problem. Using gateway-controller v0.7:
Status:
Message: validation failed
Phase: Error
Started At: 2019-02-25T17:52:06Z
Events: <none>
I have attached the yaml I'm trying to apply. Also, we need resource-requests and limits -- thought I'd ask if this is supported by the gateway and sensor controllers? calendar-gateway.yaml.txt
1) Can you post the logs of the gateway controller when the validation fails?
2) You can add requests and limits in the deploySpec. The deploySpec is a PodSpec.
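Because deploySpec is a regular Kubernetes PodSpec, requests and limits would go on each container's resources field. A minimal sketch, assuming deploySpec sits directly under the gateway's spec; the container name, image and resource values are placeholders, not taken from the attached file:

```yaml
# Fragment of a Gateway resource; only the deploySpec part is shown.
spec:
  deploySpec:                 # a standard Kubernetes PodSpec, per the reply above
    containers:
      - name: calendar-events                    # hypothetical container name
        image: argoproj/calendar-gateway:v0.7    # hypothetical image and tag
        resources:
          requests:           # what the scheduler reserves for the container
            cpu: 100m
            memory: 64Mi
          limits:             # hard caps enforced at runtime
            cpu: 200m
            memory: 128Mi
```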
These are all the errors that are shown from kubectl log
2019-02-25T12:09:17Z | info | msg: starting gateway-controller | controller-namespace:argo-events instance-id:argo-events version:v0.1.2+14b59f3.dirty
2019-02-25T12:09:17Z | info | msg: watching gateway-controller config map updates | controller-namespace:argo-events
...
ERROR: logging before flag.Parse: W0225 18:47:37.514879 1 reflector.go:341] github.com/argoproj/argo-events/controllers/gateway/config.go:62: watch of *v1.ConfigMap ended with: too old resource version: 47662759 (47662873)
2019-02-25T18:47:38Z | info | msg: detected ConfigMap update. updating the gateway-controller config. | controller-namespace:argo-events
2019-02-25T18:49:17Z | info | msg: operating on the gateway | name:calendar-gateway namespace:argo-events phase:Error
2019-02-25T18:49:17Z | error | msg: gateway is in error state. please check escalated K8 event for the error | name:calendar-gateway namespace:argo-events
It looks like the gateway resource is already in an error state. The controller picked up an old gateway that was in an error state. You need to delete the old gateway resource first.
Ok -- that got me a little further. It says: no associated watchers with gateway.
But I'm kind of wondering how this is supposed to work -- a watcher would reference a sensor, right? But the sensor points to the gateway as well through its dependencies?
Is it correct that there is a circular dependency between the two and if so, what is the correct order to initialize them?
Watchers is the list of sensors you want the gateway to dispatch events to.
Refer to this calendar gateway example - https://raw.githubusercontent.com/argoproj/argo-events/master/examples/gateways/calendar.yaml
A gateway listens to event sources (like GitHub, S3, a Kafka stream, etc.) that emit events. The configuration for these event sources is stored in a configmap, called the gateway-configmap. The gateway monitors the gateway configmap at runtime for new event sources and for ones that are deleted. Once the gateway gets an event from an event source, it forwards that event to the sensor. The sensor does the job of resolving event dependencies.
So you'll need to create a configmap like https://github.com/argoproj/argo-events/blob/master/examples/gateways/calendar-gateway-configmap.yaml
The gateway requires this configmap to parse the event source configuration. Otherwise the gateway would sit idle.
The order of setup doesn't really matter, but you can follow this order: 1) Create the sensor 2) Create the gateway configmap 3) Create the gateway - make sure to reference the configmap above in the gateway spec.
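As a rough sketch of how the pieces reference each other: the gateway names its configmap and lists the sensor under its watchers. The field names configMap and watchers.sensors are assumed from the linked calendar example and may differ between versions; calendar-gateway-configmap and calendar-sensor are the names that appear in the linked file and in the logs in this thread.

```yaml
# Fragment of the Gateway resource (other required fields omitted).
spec:
  configMap: calendar-gateway-configmap   # assumed field name; holds the event source configurations
  watchers:                               # assumed layout: sensors the gateway dispatches events to
    sensors:
      - name: calendar-sensor
```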
So what is the role of the dependencies field on the sensor?
From the examples, it seems this has to refer to the name of the gateway AND the name of the event source from the configmap?
So the event source is calendar-event (with, let's say, an interval).
The gateway is gateway1 and mentions sensor sensor1 in the list of its watchers.
The sensor is sensor1 and has gateway1:calendar-event in its dependencies, creating a loop.
Unless I'm misunderstanding the effect of the dependencies field on the sensor?
Maybe I should explain my use case. I have 5 different argo workflows (all of which are rather similar, except for a parameter passed in the beginning), and I want to run each of them on a different schedule.
I thought I could make one gateway, one configmap containing 5 different keys with a schedule, and then create a sensor that would pass the correct parameter to the workflow, based on the fact that the events would be different. Would this work or do I need 5 of each gateway/configmap/sensor instead?
The sensor is waiting to receive events from one or more gateways. Each gateway can run multiple event sources. So the dependencies in a sensor are a list of "gateway-name:event-source-name" entries. Once the sensor receives events from the gateways with the correct event source names, it marks the dependencies as resolved and triggers the workflow(s).
Think of the sensor as a barrier waiting to receive events from one or more gateways.
E.g. consider you have two gateways, one for S3 and the other for calendar schedules. The S3 gateway listens to bucket notifications from a bucket called foo. The calendar gateway runs a schedule of interval: 10seconds. Let's say you now have a sensor waiting for events to happen at both gateways. So the sensor will define the eventDependencies as
s3-gateway:foo
calendar-gateway:interval
Once both gateways send their events to the sensor, the sensor will mark the dependencies as resolved and trigger the workflow.
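For the two-gateway example above, the sensor's dependency list might look like the following. This is only a sketch: it assumes the dependencies field with name entries, as in the sensor snippet posted later in this thread, and the gateway and event source names are the illustrative ones from the example.

```yaml
# Fragment of the Sensor spec: one entry per "gateway-name:event-source-name".
dependencies:
  - name: "s3-gateway:foo"
  - name: "calendar-gateway:interval"
```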
In your use case, you will just need to create one gateway, one gateway configmap containing five schedule configurations, one sensor.
Ok, great! I guess this is a good starting point to figure out how to make the sensor do different things depending on the name of the event? https://github.com/argoproj/argo-events/blob/master/examples/sensors/webhook-http-dependency-groups.yaml Or is there something else available?
You found the right example.
There's a lot going on there. I think I got a grasp of what's expected, but alas, the trigger does not fire.
I've added an entry for each workflow in the gateway config map:
data:
  workflow_1: |-
    interval: 5m
  workflow_2: |-
    schedule: 5 * * * *
  ...
I added a dependency for each workflow in the sensor:
dependencies:
  - name: "calendar-gateway:workflow_1"
  - name: "calendar-gateway:workflow_2"
  ...
I added a dependency group for each one, containing exactly one dependency:
dependencyGroups:
  - name: workflow_1
    dependencies:
      - "calendar-gateway:workflow_1"
  ...
I added a circuit to the workflow, using the ||-operator, as I interpret the circuit to be the condition for any trigger to be triggered in this sensor....
circuit: workflow_1 || workflow_2 || ....
I added a trigger for each workflow, containing a when, containing the dependencygroup matching its workflow:
triggers:
  - name: workflow_1-trigger
    when:
      all:
        - workflow_1
Everything gets accepted and appears to be running, but in the logs, when the event for a workflow triggers, I see this in the sensor logs:
2019-02-26T15:06:16Z | info | msg: received an event from gateway | sensor-name:calendar-sensor
2019-02-26T15:06:16Z | info | msg: message successfully sent over internal queue | event-source-name:calendar-gateway:workflow_1 sensor-name:calendar-sensor
2019-02-26T15:06:16Z | info | msg: received event notification | event-dependency-name:calendar-gateway:workflow_1 sensor-name:calendar-sensor
2019-02-26T15:06:16Z | info | msg: completed | node-name:calendar-gateway:workflow_1 sensor-name:calendar-sensor type:EventDependency
2019-02-26T15:06:16Z | info | msg: triggers can't be executed because event dependencies are not complete | sensor-name:calendar-sensor
2019-02-26T15:06:16Z | info | msg: sensor state updated successfully | phase:Active sensor-name:calendar-sensor
2019-02-26T15:06:16Z | info | msg: successfully persisted sensor resource update and created K8s event | sensor-name:calendar-sensor
2019-02-26T15:06:16Z | info | msg: sensor resource update | sensor-name:calendar-sensor
2019-02-26T15:06:16Z | info | msg: sensor state updated successfully | phase:Active sensor-name:calendar-sensor
2019-02-26T15:06:16Z | info | msg: successfully persisted sensor resource update and created K8s event | sensor-name:calendar-sensor
Of course, "triggers can't be executed because event dependencies are not complete" sticks out, but it doesn't show me why the dependencies aren't complete. I only ever list a single one.
Any pointers as to where the flaw in my logic lies?
Your setup is correct. It's because of the v0.7 version. Support for boolean operations and for when on workflows was added in patch versions after v0.7. You can try v0.7.3 for all the images (including the sensor controller and sensor images), or wait for v0.8 to be released. Most probably v0.8 will be released today.
A simpler example: https://github.com/argoproj/argo-events/blob/fix-resource-gateway/examples/sensors/webhook-http-boolean.yaml
Yeah that example looks pretty much exactly like what I've done 😄
It was the images -- I started from the examples but the sensor and gateway definitions don't have tags -- meaning it took the latest from docker hub.
But it seems like latest hasn't been updated in a while.
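An image reference without a tag resolves to latest, so pinning the tag on every controller, gateway and sensor image avoids this. A sketch, assuming a hypothetical container name and image repository; v0.7.3 is the version suggested above:

```yaml
containers:
  - name: sensor                    # hypothetical container name
    image: argoproj/sensor:v0.7.3   # hypothetical repository; the explicit tag is the point
```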
Now that I have 0.7.4 of everything, it triggers, but it seems the sensor has some problems now:
2019-02-26T15:48:11Z | error | msg: trigger failed to execute | error="v1alpha1.Workflow.ObjectMeta: readObjectStart: expect { or n, but found \", error found in #10 byte of ...|etadata\":\"generateNa|..., bigger context ...|goproj.io/v1alpha1\",\"kind\":\"Workflow\",\"metadata\":\"generateName:workflow_1-\",\"spec\":{\"|..." sensor-name:calendar-sensor trigger-name:workflow_1-trigger
I haven't changed anything about the actual trigger definition, and it's defined inline:
triggers:
  - name: workflow_1-trigger
    when:
      all:
        - workflow_1
    resource:
      namespace: argo-events
      group: argoproj.io
      version: v1alpha1
      kind: Workflow
      source:
        inline: |
          apiVersion: argoproj.io/v1alpha1
          kind: Workflow
          metadata:
            generateName: workflow_1-
          spec:
            entrypoint: dag
            ...
Seems like it's getting some escaped version of the yaml and borks on that?
Yeah, looks like a formatting issue in the inline workflow.
If the formatting gives you headaches, you can actually reference trigger workflows using different methods, as described here: https://github.com/argoproj/argo-events/blob/master/docs/trigger-guide.md#how-to-define-a-trigger. Or, you can copy and paste this working example if you want and change the image tag and the event dependency names.
Thanks, I'll see if I can figure out where it went wrong. I like the inline definition -- keeps everything together and self-contained without external dependencies.
Ok, seems like this is something that's specific to v0.7.4 -- if I go back to the latest image (which is v0.7, I think?), I don't get that error and the workflow gets started (this is without the dependency groups obviously, only a single workflow per gateway/sensor).
You mean v0.7.3? v0.7.4 is specific to a bug fix in the resource gateway.
The example I shared with you does have v0.7.4 as the image tag, but you should still use v0.7.3.
I just went to docker hub to find the latest versions of everything. I'll confirm with every version in between...
Ok, v0.7, v0.7.1 and v0.7.2 of sensor happily accept and start the inline workflow. v0.7.3 and v0.7.4 do not.
This is weird, cause I am using v0.7.4 as sensor image and it triggers inline workflows just fine.
Will you be able to wait for a couple of hours? I'll drop a tag for v0.8. That will resolve any confusion.
Sure thing, Thanks for the speedy followups!
Noticed v0.8 was out so had a go -- also figured out the error I was getting with the inline was due to a missing space between a key and a value 😭
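For reference, this is the kind of slip that produces the readObjectStart error shown earlier. YAML only treats a colon as a key/value separator when it is followed by whitespace, so without the space the whole line is read as a single string. The snippet is a reconstruction based on the error message, not the actual file:

```yaml
# Broken: no space after the colon, so metadata becomes the plain string
# "generateName:workflow_1-" instead of a mapping.
metadata:
  generateName:workflow_1-
---
# Fixed: with the space, metadata is a mapping with a generateName key.
metadata:
  generateName: workflow_1-
```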
Now --- not all is well, because I'm getting the following error and it's repeating over and over again, making the sensor create new workflows over and over again:
2019-02-26T17:48:18Z | info | msg: sensor state updated successfully | phase:Active sensor-name:calendar-sensor
2019-02-26T17:48:18Z | info | msg: successfully persisted sensor resource update and created K8s event | sensor-name:calendar-sensor
2019-02-26T17:48:18Z | info | msg: sensor resource update | sensor-name:calendar-sensor
2019-02-26T17:48:18Z | warn | msg: error updating sensor | error="Operation cannot be fulfilled on sensors.argoproj.io \"calendar-sensor\": the object has been modified; please apply your changes to the latest version and try again" sensor-name:calendar-sensor
2019-02-26T17:48:18Z | error | msg: failed to persist sensor update, escalating... | error="Operation cannot be fulfilled on sensors.argoproj.io \"calendar-sensor\": the object has been modified; please apply your changes to the latest version and try again" sensor-name:calendar-sensor
Luckily, we have kubectl delete workflow --all! :)
Any idea?
Did you update the sensor controller and sensor images with v0.8?
Can you delete existing calendar gateway and try again?
Also, the latest tag is still v0.7 and will be updated to v0.8 soon.
I do this by default -- I pack everything in a helm chart and delete --purge my release each time -- also have the version as a value in the helm chart so I'm using v0.8 explicitly instead of latest.
I restarted from fresh, so I had no sensor-controller/gateway-controllers or sensor/gateway-client/calendar-gateway pods running anymore.
let me try to reproduce it
log.txt ^ here's the complete log. Each time that error occurs, the event appears to be reprocessed and a new workflow is created....
edit: so I tried without the dependencyGroup feature, in order to be able to try a working v0.7 use case and to see if it keeps working on v0.8.
Here's the logs when everything is at v0.7, which works: log_v0_7.txt
Here's the logs with everything at v0.8 (only difference is container tags) log_v0_8.txt
There was a bug in the calendar gateway. Can you test v0.8 again?
Yes! It all works now! ❤️
Great :+1: . Can we close the issue?
Yes, please. In the end, it had very little to do with 'gateway validation failed' anymore... I'll open a new issue if I happen to come across something.
Many thanks for your great help!
Describe the bug: calendar-gateway validation fails
To Reproduce: Followed tutorials:
When I ran
$ kubectl describe gateways calendar-gateway
it says the validation failed.
Even after creating the sensor it still fails validation.
Expected behavior: Seems like that gateway is not active,
Additional context: I tried running docker logs on the container metalgearsolid/gateway-controller and here are the logs: