Closed sgnn7 closed 4 years ago
@sgnn7 does the investigation need prioritizing, or does publishing an update to GCP need prioritizing? Just want to be clear about the correct next steps.
@izgeri It's the time allocation that needs prioritizing for both investigation and publishing. This bug is not in the active pipeline as far as I know and needs to be prioritized vs OSS suite stories.
ok - when you get a chance, can you please add as much context as you can here for how to repro / investigate the root cause of the bug? I'll get it in the ready column of our board but it will likely be someone else who looks at it.
Repro steps:
1. Create a GKE cluster running version 1.15.7-gke.23 (higher versions may work too but it's untested).
2. Deploy the CyberArk Conjur Open Source application from GKE marketplace to that cluster.

I'll spend some time today trying to figure out what the cause is.
This is a much deeper problem than anticipated. The following is an error that happens when building the marketplace app:
Building app/verify...
.build/app/dev \
/scripts/verify \
--deployer='gcr.io/conjur-gke-dev/cyberark/deployer:srdjan' \
--parameters='{ "name": "conjur", "namespace": "default", "tester.image": "gcr.io/conjur-gke-dev/cyberark/tester:srdjan" }'
INFO Parameters: {
"name": "apptest-ul5zmjck",
"namespace": "apptest-ul5zmjck",
"tester.image": "gcr.io/conjur-gke-dev/cyberark/tester:srdjan"
}
INFO Creates namespace "apptest-ul5zmjck"
namespace "apptest-ul5zmjck" created
INFO Initializes the deployer container which will deploy all the application components
error: SchemaError(io.k8s.api.apps.v1beta2.DaemonSetSpec): invalid object doesn't have additional properties
ERROR Failed to start deployer
Some things (1, 2) hint at issues with the bundled kubectl, which can only go forward n+1 versions, so maybe we just need an update here. Either way, our code will start breaking on 1.16 anyway, given that the pinned Conjur OSS helm chart doesn't have the fixes for the v1beta1/* APIs.
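To make the n+1 point concrete, here is a minimal sketch of the check, assuming the usual Kubernetes skew policy that kubectl is supported within one minor version of the API server. The client versions in the loop are hypothetical, since the kubectl version bundled in the deployer isn't stated here; only the server version comes from the repro steps.

```python
import re
from typing import Tuple


def minor_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.13.2' or '1.15.7-gke.23'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def within_kubectl_skew(client: str, server: str) -> bool:
    """Return True if the client is within one minor version of the server."""
    c_major, c_minor = minor_version(client)
    s_major, s_minor = minor_version(server)
    return c_major == s_major and abs(c_minor - s_minor) <= 1


# Hypothetical client versions, checked against the cluster version from the repro steps.
for client in ("1.13.0", "1.14.0", "1.15.0"):
    print(client, within_kubectl_skew(client, "1.15.7-gke.23"))
```

If the deployer's kubectl turns out to be two or more minor versions behind the cluster, that would fit the schema validation error above, though it wouldn't prove it's the cause.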
Also checked - there seem to have been no API removals between k8s 1.13->1.14 and 1.14->1.15, which makes the deployer even more suspect.
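For the 1.16 concern, a rough way to spot affected resources is to scan a rendered manifest for the API group/versions that Kubernetes 1.16 stopped serving. This is only a sketch: it does a plain-text scan of top-level `apiVersion` and `kind` keys rather than real YAML parsing, the `find_removed_apis` helper is made up for illustration, and the sample manifest is hypothetical, not the actual output of the Conjur OSS chart.

```python
import re

# API group/versions that Kubernetes 1.16 stopped serving, per kind.
REMOVED_IN_1_16 = {
    "extensions/v1beta1": {
        "Deployment", "DaemonSet", "ReplicaSet", "NetworkPolicy", "PodSecurityPolicy",
    },
    "apps/v1beta1": {"Deployment", "StatefulSet"},
    "apps/v1beta2": {"Deployment", "StatefulSet", "DaemonSet", "ReplicaSet"},
}


def find_removed_apis(manifest_text):
    """Return (apiVersion, kind) pairs in the manifest that 1.16 no longer serves."""
    findings = []
    # Split the multi-document YAML stream on '---' separator lines.
    for doc in re.split(r"^---[ \t]*$", manifest_text, flags=re.MULTILINE):
        # Only unindented (top-level) keys are matched, so nested keys are ignored.
        api = re.search(r"^apiVersion:\s*(\S+)", doc, flags=re.MULTILINE)
        kind = re.search(r"^kind:\s*(\S+)", doc, flags=re.MULTILINE)
        if api and kind and kind.group(1) in REMOVED_IN_1_16.get(api.group(1), set()):
            findings.append((api.group(1), kind.group(1)))
    return findings


# Hypothetical manifest for illustration only.
sample = """\
apiVersion: v1
kind: Service
metadata:
  name: example
---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: example
"""
print(find_removed_apis(sample))
```

Running something like this over the output of `helm template` for the pinned chart would show which resources need an `apps/v1` migration before 1.16.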
I've reproduced a failure in deploying the CyberArk Conjur Open Source Marketplace app, though the symptoms look a little different than what @sgnn7 described. I see a failure in the Kubernetes job that does the deployment, where it's rendering a YAML manifest:
Creating the manifests for the kubernetes resources that build the application "conjur-open-source-1"
+ [[ '' = \t\e\s\t ]]
+ extract_manifest /data
+ data=/data
+ extracted=/data/extracted
+ data_chart=/data/chart
+ mkdir -p /data/extracted
+ [[ -d /data/chart ]]
++ find /data/chart -maxdepth 1 -type f -name '*.tar.gz'
+ for chart in $(find "$data_chart" -maxdepth 1 -type f -name "*.tar.gz")
++ basename /data/chart/conjur.tar.gz
++ sed 's/.tar.gz$//'
+ chart_manifest_file=conjur
+ mkdir /data/extracted/conjur
+ tar xfC /data/chart/conjur.tar.gz /data/extracted/conjur
+ [[ '' = \t\e\s\t ]]
+ for chart in "$data_dir/extracted"/*
++ basename /data/extracted/conjur
++ sed 's/.tar.gz$//'
+ chart_manifest_file=conjur.yaml
+ helm template /data/extracted/conjur/chart --name=conjur-open-source-1 --namespace=default --values=/dev/fd/63
++ /bin/print_config.py --output=yaml
+ [[ '' != \t\e\s\t ]]
+ process_helm_hooks.py --manifest /data/manifest-expanded/conjur.yaml
Reading /data/manifest-expanded/conjur.yaml
+ ensure_k8s_apps_labels.py --manifest /data/manifest-expanded/conjur.yaml --appname conjur-open-source-1
Reading /data/manifest-expanded/conjur.yaml
+ /bin/set_ownership.py --app_name conjur-open-source-1 --app_uid 962b23b3-da6a-4ee7-b478-9d21d80b77a8 --app_api_version app.k8s.io/v1beta1 --manifests /data/manifest-expanded --dest /data/resources.yaml
Reading /data/manifest-expanded/conjur.yaml
Application 'conjur-open-source-1' owns 'Secret/conjur-open-source-1-conjur-authenticators'
Application 'conjur-open-source-1' owns 'Secret/conjur-open-source-1-conjur-data-key'
Application 'conjur-open-source-1' owns 'Secret/conjur-open-source-1-conjur-database-url'
Application 'conjur-open-source-1' owns 'Secret/conjur-open-source-1-conjur-ssl-ca-cert'
Application 'conjur-open-source-1' owns 'Secret/conjur-open-source-1-conjur-ssl-cert'
Application 'conjur-open-source-1' owns 'Service/conjur-open-source-1-conjur-oss-ingress'
Application 'conjur-open-source-1' owns 'Service/conjur-open-source-1-postgres'
Application 'conjur-open-source-1' owns 'Deployment/conjur-open-source-1-postgres'
Application 'conjur-open-source-1' owns 'Service/conjur-open-source-1-conjur-oss'
Application 'conjur-open-source-1' owns 'Deployment/conjur-open-source-1-conjur-oss'
+ /bin/setassemblyphase.py --manifest /data/resources.yaml --status Pending
Reading /data/resources.yaml
+ kubectl apply --namespace=default --filename=/data/resources.yaml
Error from server (Forbidden): unknown
+ handle_failure
I'm not sure if there's an easy way to read the contents of the rendered /data/resources.yaml. Continuing to dig.
Re-open to link with the fix PR.
Seems to work fine on 1.13 so we'll need to investigate what the problem is.
Discourse conversation
CC: @izgeri to prioritize