aporeto-inc / trireme-lib

Simple, scalable and secure application segmentation
https://trireme.io
Apache License 2.0

trireme openshift origin v1.4.0 #150

Closed barnzdan closed 3 years ago

barnzdan commented 7 years ago

% oc version
oc v1.4.1+3f9807a
kubernetes v1.4.0+776c994
features: Basic-Auth

openshift v1.4.0-rc1+b4e0954
kubernetes v1.4.0+776c994

Trying to run Trireme on OpenShift using DaemonSets. It looks like the solution we are desperately in need of. Using the DaemonSet PSK, I've created the trireme secret, but I get the following error in my pods when they spin up:

% oc logs po/trireme-2dybv
I0203 14:28:32.609910 6 main.go:27] Config used: &{KubeEnv:true AuthType:PSK KubeNodeName:node01 NodeAnnotationKey: PKIDirectory: KubeConfigLocation: TriremePSK:XXXXXXXXXX== TriremeNets:[10.0.0.0/8] ExistingContainerSync:true}
W0203 14:28:32.616409 6 client_config.go:481] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0203 14:28:32.617117 6 main.go:48] Starting Trireme PSK
I0203 14:28:32.644971 6 iptables.go:608] Can't clear PREROUTING iptables command
I0203 14:28:32.647501 6 iptables.go:608] Can't clear PREROUTING iptables command
I0203 14:28:32.648981 6 iptables.go:608] Can't clear PREROUTING iptables command
I0203 14:28:32.651453 6 iptables.go:608] Can't clear POSTROUTING iptables command
E0203 14:28:32.651664 6 datapath.go:224] Error unbinding existing NFQ handler from AfInet protocol family: operation not permitted
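A side note on that last line: an NFQ "operation not permitted" error usually means the container does not have the privileges needed to bind netfilter queues. A minimal sketch of the relevant pod-spec settings, assuming standard Kubernetes fields (the exact layout of daemonSetPSK.yaml may differ):

```yaml
# Hypothetical excerpt of the Trireme DaemonSet pod spec; these are
# standard Kubernetes API fields, not necessarily the shipped YAML.
spec:
  hostNetwork: true        # Trireme operates in the node's network namespace
  containers:
  - name: trireme
    securityContext:
      privileged: true     # needed for iptables and NFQUEUE binding
```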

dstiliadis commented 7 years ago

What operating system are you using, and does it have "raw" tables enabled? The default configuration requires raw tables (iptables -t raw -nvL). There is a configuration option we can enable to avoid using raw tables. Please let us know.
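As a quick check, the output of iptables -t raw -nvL on a healthy node should list both the PREROUTING and OUTPUT chains. A small Python sketch of that check, run against sample output so it stays self-contained (on a real node you would capture the command's actual output, e.g. via subprocess):

```python
# Sketch: decide whether the "raw" table looks usable, given the text output
# of `iptables -t raw -nvL`. The sample below mimics a healthy RHEL 7 node.
sample = """\
Chain PREROUTING (policy ACCEPT 2881 packets, 1530K bytes)
 pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 4098 packets, 1215K bytes)
 pkts bytes target prot opt in out source destination
"""

def raw_table_ok(output: str) -> bool:
    """Return True when both expected raw-table chains are present."""
    chains = {line.split()[1] for line in output.splitlines()
              if line.startswith("Chain ")}
    return {"PREROUTING", "OUTPUT"} <= chains

print(raw_table_ok(sample))  # True on the healthy sample
```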

barnzdan commented 7 years ago

I'm using RHEL 7. It appears to support raw tables.

barnzdan commented 7 years ago

The Ansible playbooks for OpenShift Origin mask iptables and use firewalld. I've masked firewalld and enabled iptables. The DaemonSet keeps trying to deploy to the nodes over and over; the pods go into Error and then CrashLoopBackOff. I've cleared the existing iptables rule sets in an attempt to let the Trireme pods take over. Is there anything else I can do on the nodes to prepare them to run the Trireme pods?

Chain PREROUTING (policy ACCEPT 2881 packets, 1530K bytes)
 pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 4098 packets, 1215K bytes)
 pkts bytes target prot opt in out source destination

dstiliadis commented 7 years ago

Can you please attach the logs of the daemonsets? We will try to test with a local installation of OpenShift. Also, please send the steps that you are following.

barnzdan commented 7 years ago

Thanks for considering an OpenShift use case; I hope this helps.

http://pastebin.com/BqbPBStQ

We are using the README in the deployment for trireme-kubernetes: https://github.com/aporeto-inc/trireme-kubernetes/tree/master/deployment

Steps:

  1. ./createPSK.sh

% oc get secrets
NAME                       TYPE                                  DATA      AGE
builder-dockercfg-8501s    kubernetes.io/dockercfg               1         10m
builder-token-b45wr        kubernetes.io/service-account-token   4         10m
builder-token-pfqv0        kubernetes.io/service-account-token   4         10m
default-dockercfg-z6m7v    kubernetes.io/dockercfg               1         10m
default-token-k1fwp        kubernetes.io/service-account-token   4         10m
default-token-rfzhp        kubernetes.io/service-account-token   4         10m
deployer-dockercfg-8336s   kubernetes.io/dockercfg               1         10m
deployer-token-pk454       kubernetes.io/service-account-token   4         10m
deployer-token-scq2x       kubernetes.io/service-account-token   4         10m
trireme                    Opaque                                1         10m

  2. Had to change the fieldPath to fit the OpenShift Downward API in the DaemonSet YAML for PSK, otherwise it doesn't spin up the pods. Of note, metadata.name still references the POD name and not the KUBENODE, which I'm sure is still problematic: https://docs.openshift.org/latest/dev_guide/downward_api.html

fieldPath: metadata.name

  3. oc create -f daemonSetPSK.yaml

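For reference, the fieldPath change described above lives in the env section of the DaemonSet. A hedged sketch (the variable name here is hypothetical, and whether spec.nodeName is accepted depends on the Kubernetes/OpenShift version, which is why metadata.name was used even though it yields the pod name):

```yaml
# Illustrative Downward API stanza; the env var name is hypothetical.
env:
- name: TRIREME_NODE
  valueFrom:
    fieldRef:
      fieldPath: metadata.name   # ideally spec.nodeName, if the API server accepts it
```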
derezed88 commented 7 years ago

As of today, I can run Trireme and the demo policy example from the aforementioned trireme-kubernetes repo. Expect a properly formatted step-by-step document soon, but this should suffice for now.

In summary, the key changes I needed were OpenShift permissions (privileged container creation and the cluster-admin role) to get the daemonset running. Logging in as system:admin and granting "cluster-admin" may seem a bit heavy-handed, and there may be a smaller, more restrictive set of permissions that works, but what follows is what I used to get the setup running.

I'm also running the easier PSK method of node authentication. Before I figured out the permissions I also saw CrashLoops, but I never saw any iptables PRE/POSTROUTING messages in the pod logs.

My environment: RHEL 7.3, and OpenShift via oc cluster up

% docker --version
Docker version 1.12.6, build 96d83a5/1.12.6
% oc version
oc v1.4.0-alpha.1+f189ede
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://192.168.88.131:8443
openshift v1.4.0-alpha.1+f189ede
kubernetes v1.4.0+776c994

OpenShift preparatory commands

% oc login -u system:admin
% oc project default
% oc edit scc privileged

In the edit scc interface, add the default service account to the users list (- system:serviceaccount:default:default). My users section looks like this:

users:
- system:serviceaccount:openshift-infra:build-controller
- system:serviceaccount:default:router
- system:serviceaccount:default:default

% oc adm policy add-cluster-role-to-user cluster-admin system:serviceaccount:default:default

Trireme execution commands

In trireme-kubernetes/deployment/Trireme/KubeDaemonSet, edit daemonSetPSK.yaml to set the PSK and a proper TRIREME_NETS subnet value. Then create the trireme secret (scoped to the "demo" project).
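As an illustration, TRIREME_NETS is a plain env var in the DaemonSet spec, and its value must cover every subnet that pods and nodes can use (here the whole 10.0.0.0/8 range). A sketch, with the exact layout subject to the version of daemonSetPSK.yaml you have:

```yaml
# Illustrative excerpt from daemonSetPSK.yaml.
env:
- name: TRIREME_NETS
  value: "10.0.0.0/8"   # must contain all pod and node addresses
```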

% sh createPSK.sh
Attempting to generate PSK
secret "trireme" created
% oc create -f daemonSetPSK.yaml
daemonset "trireme" created
% oc get daemonset
NAME      DESIRED   CURRENT   NODE-SELECTOR   AGE
trireme   1         1         <none>          2m
% oc get pods
NAME                      READY     STATUS    RESTARTS   AGE
docker-registry-1-objnf   1/1       Running   231        59m
router-1-gbf5j            1/1       Running   227        59m
trireme-4ejgm             1/1       Running   5          3m

From this point you can continue with the PolicyExample from the repo.

Things to look out for:
- daemonSet YAML: proper TRIREME_NETS value
- Proper project/namespace scoping for secrets and for where you run pods

Please let us know if you've been able to make progress with Trireme and OpenShift.

barnzdan commented 7 years ago

Thanks for the extra effort and the writeup! We will give this a try.

barnzdan commented 7 years ago

We've made some progress with this now. Followed the steps here:

https://github.com/aporeto-inc/trireme-kubernetes/tree/master/deployment/OpenShift

The deployment was successful. The only issue I'm running into now is that Trireme seems to ignore the DefaultDeny rule on the namespace:

% oc get ns/demo -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    net.beta.kubernetes.io/network-policy: |
      { "ingress": { "isolation": "DefaultDeny" } }
...
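The annotation value is a small JSON document, so it is easy to verify programmatically that it really requests DefaultDeny isolation (for example after fetching the namespace with oc get ns/demo -o json). A minimal sketch:

```python
import json

# Annotation value as shown in the `oc get ns/demo -o yaml` output above.
annotation = '{ "ingress": { "isolation": "DefaultDeny" } }'

policy = json.loads(annotation)
is_default_deny = policy.get("ingress", {}).get("isolation") == "DefaultDeny"
print(is_default_deny)  # True: the annotation itself is well-formed
```

If this prints True, the annotation is well-formed, and the investigation moves to whether the Trireme pods are actually enforcing it.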

% oc describe pods --namespace=demo | grep '^Name:|IP'
Name: backend
IP: 10.129.2.4
Name: external
IP: 10.130.0.4
Name: frontend
IP: 10.131.0.2

% oc exec --namespace=demo -it frontend /bin/bash
I have no name!@frontend:/data$ wget http://10.129.2.4
converted 'http://10.129.2.4' (ANSI_X3.4-1968) -> 'http://10.129.2.4' (UTF-8)
--2017-03-31 17:18:56-- http://10.129.2.4/
Connecting to 10.129.2.4:80... connected.
HTTP request sent, awaiting response... 200 OK

% oc exec --namespace=demo -it external /bin/bash
I have no name!@external:/data$ wget http://10.129.2.4
converted 'http://10.129.2.4' (ANSI_X3.4-1968) -> 'http://10.129.2.4' (UTF-8)
--2017-03-31 17:19:21-- http://10.129.2.4/
Connecting to 10.129.2.4:80... connected.
HTTP request sent, awaiting response... 200 OK

bvandewalle commented 7 years ago

Hi,

A few things to check:
- Is Trireme successfully deployed on the nodes where those pods run?
- Is the Trireme network configured to 10.0.0.0/8? (Part of the YAML for the DaemonSet.)
- Can you make sure the namespace got the annotations? (kubectl get namespaces -o yaml)
- Can you copy/paste the Trireme logs from the sender/receiver nodes?

Thanks!

barnzdan commented 7 years ago

demo project pods:

% oc get pods -o wide
NAME       READY     STATUS    RESTARTS   AGE       IP           NODE
backend    1/1       Running   0          4m        10.131.0.3   node1.example.com
external   1/1       Running   0          4m        10.130.0.6   node2.example.com
frontend   1/1       Running   0          4m        10.129.2.6   node3.example.com

default project pods:

% oc get pods -o wide
trireme-10jf1   1/1       Running   0          9m        10.2.103.54   node3.example.com
trireme-51zvp   1/1       Running   0          9m        10.2.103.53   node1.example.com
trireme-j2gg4   1/1       Running   0          9m        10.2.103.55   node2.example.com

value: 10.0.0.0/8

% oc get hostsubnets
NAME                HOST                HOST IP       SUBNET
node1.example.com   node1.example.com   10.2.103.53   10.131.0.0/23
node3.example.com   node3.example.com   10.2.103.54   10.129.2.0/23
node2.example.com   node2.example.com   10.2.103.55   10.130.0.0/23
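One sanity check on the `oc get hostsubnets` output: every host subnet and node IP should fall inside the configured TRIREME_NETS range. A minimal sketch using the values above:

```python
import ipaddress

trireme_nets = ipaddress.ip_network("10.0.0.0/8")

# Host subnets and node IPs from the `oc get hostsubnets` output above.
host_subnets = ["10.131.0.0/23", "10.129.2.0/23", "10.130.0.0/23"]
node_ips = ["10.2.103.53", "10.2.103.54", "10.2.103.55"]

subnets_ok = all(ipaddress.ip_network(s).subnet_of(trireme_nets)
                 for s in host_subnets)
ips_ok = all(ipaddress.ip_address(ip) in trireme_nets for ip in node_ips)
print(subnets_ok and ips_ok)  # True: everything sits inside 10.0.0.0/8
```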

% oc get ns/demo -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    net.beta.kubernetes.io/network-policy: |
      { "ingress": { "isolation": "DefaultDeny" } }
    openshift.io/sa.scc.mcs: s0:c8,c7
    openshift.io/sa.scc.supplemental-groups: 1000070000/10000
    openshift.io/sa.scc.uid-range: 1000070000/10000
  creationTimestamp: 2017-03-31T17:47:54Z
  name: demo
  resourceVersion: "13649"
  selfLink: /api/v1/namespacesdemo
  uid: 2b1f5fec-163a-11e7-b669-005056bd1926
spec:
  finalizers:

Trireme Logs: https://pastebin.com/j9YLTczh

barnzdan commented 7 years ago

Which OVS option should I be using: ovs-subnet or ovs-multitenant? Or should I disable the OpenShift SDN altogether?

bvandewalle commented 7 years ago

The network backend should not matter at all. You can use any of those.

Can you try removing those test containers (frontend, backend and external), restarting them, and seeing if the policy applies? If it doesn't, tail the logs of the Trireme pod on each node that contains those containers.

Thanks, Bernard

barnzdan commented 7 years ago

I recreated the demo pods (frontend, backend and external) and had the same results. Please see the pastebin for the relevant logs from each Trireme pod on the nodes that contain the demo containers:

https://pastebin.com/3YTHAKRY

Hope this helps.

barnzdan commented 7 years ago

Any updates on this?