coreos / torus

Torus Distributed Storage
https://coreos.com/blog/torus-distributed-storage-by-coreos.html
Apache License 2.0

Can't install Torus FlexVolume plugin in CoreOS. #265

Closed gosharplite closed 8 years ago

gosharplite commented 8 years ago

In CoreOS, /usr is a read-only file system, so the commands below do not work.

mkdir -p /usr/libexec/kubernetes/kubelet-plugins/volume/exec/coreos.com~torus/
cp torusblk /usr/libexec/kubernetes/kubelet-plugins/volume/exec/coreos.com~torus/torus

Is there another way to install the FlexVolume plugin in CoreOS?

gosharplite commented 8 years ago

In your AWS script, the kubelet needs the parameter below.

--volume-plugin-dir=/etc/kubernetes/volumeplugins

I will give it a try.
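
For reference, a minimal sketch of what that install would look like, assuming each node's kubelet is started with --volume-plugin-dir=/etc/kubernetes/volumeplugins (the plugin layout mirrors the /usr/libexec commands above):

# Install the Torus FlexVolume driver under the writable plugin directory
# instead of the read-only /usr/libexec default.
sudo mkdir -p /etc/kubernetes/volumeplugins/coreos.com~torus/
sudo cp torusblk /etc/kubernetes/volumeplugins/coreos.com~torus/torus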

gosharplite commented 8 years ago

The version of my k8s cluster is shown below.

$ ./kubectl version
Client Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.0", GitCommit:"5cb86ee022267586db386f62781338b0483733b3", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.0", GitCommit:"5cb86ee022267586db386f62781338b0483733b3", GitTreeState:"clean"}

I can follow your example "Run Torus on Kubernetes" up to step 5, "Create a volume".

$ ./torusctl -C 10.128.112.22:32379 list-peers
+-------------------------+--------------------------------------+---------+------+--------+---------------+--------------+
|         ADDRESS         |                 UUID                 |  SIZE   | USED | MEMBER |    UPDATED    | REB/REP DATA |
+-------------------------+--------------------------------------+---------+------+--------+---------------+--------------+
| http://10.2.0.64:40000  | 0d6d13a6-3207-11e6-9050-4e73b393e21b | 2.0 GiB | 0 B  | OK     | 4 seconds ago | 0 B/sec      |
| http://10.2.0.6:40000   | fab56883-3206-11e6-8955-76fa7f449663 | 2.0 GiB | 0 B  | OK     | now           | 0 B/sec      |
| http://10.2.0.128:40000 | fb1ff354-3206-11e6-8e18-8e70be9624db | 2.0 GiB | 0 B  | OK     | 3 seconds ago | 0 B/sec      |
+-------------------------+--------------------------------------+---------+------+--------+---------------+--------------+
Balanced: true Usage:  0.00%

$ ./torusblk -C 10.128.112.22:32379 volume create pg1 2GiB
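
(For reference, one way to confirm the volume was created is torusctl's volume list subcommand, which appears later in this thread; a sketch against the same etcd endpoint:)

$ ./torusctl -C 10.128.112.22:32379 volume list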

However, postgres-oneshot.yaml does not work; no devices are available.

$ ./kubectl describe po postgres-torus-92650690-9cow4
Name:       postgres-torus-92650690-9cow4
Namespace:  tyd
Node:       10.128.112.24/10.128.112.24
Start Time: Tue, 14 Jun 2016 16:27:09 +0800
Labels:     app=postgres-torus,pod-template-hash=92650690
Status:     Pending
IP:     
Controllers:    ReplicaSet/postgres-torus-92650690
Containers:
  postgres:
    Container ID:   
    Image:      postgres
    Image ID:       
    Port:       5432/TCP
    QoS Tier:
      memory:       BestEffort
      cpu:      BestEffort
    State:      Waiting
      Reason:       ContainerCreating
    Ready:      False
    Restart Count:  0
    Environment Variables:
      POD_IP:            (v1:status.podIP)
      POSTGRES_PASSWORD:    testtorus
      PGDATA:           /var/lib/postgresql/data/pgdata
Conditions:
  Type      Status
  Ready     False 
Volumes:
  data:
  <unknown>
  default-token-iyxs6:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-iyxs6
Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----            -------------   --------    ------      -------
  27m       27m     1   {default-scheduler }            Normal      Scheduled   Successfully assigned postgres-torus-92650690-9cow4 to 10.128.112.24
  27m       5s      125 {kubelet 10.128.112.24}         Warning     FailedMount Unable to mount volumes for pod "postgres-torus-92650690-9cow4_tyd(c9192682-3209-11e6-a0b0-5254005f9388)": attach command failed, status: Failure, reason: no devices available
  27m       5s      125 {kubelet 10.128.112.24}         Warning     FailedSync  Error syncing pod, skipping: attach command failed, status: Failure, reason: no devices available

I wish I knew how to debug this situation further.

gosharplite commented 8 years ago

Here are the yaml files.

torus-k8s-oneshot.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    name: etcd-torus
  name: etcd-torus
spec:
  type: NodePort
  ports:
    - port: 2379
      name: etcd-client
      targetPort: etcd-client
      nodePort: 32379
  selector:
    name: etcd-torus
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: etcd-torus-internal
  name: etcd-torus-internal
spec:
  clusterIP: 10.1.0.100
  ports:
    - port: 2379
      name: etcd-client
      targetPort: etcd-client
  selector:
    name: etcd-torus
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    name: etcd-torus
  name: etcd-torus
spec:
  containers:
  - image: quay.io/coreos/etcd:v3.0.0-beta.0
    name: etcd-torus
    ports:
    - name: etcd-peers
      containerPort: 2380
    - name: etcd-client
      containerPort: 2379
    volumeMounts:
    - name: data
      mountPath: /var/lib/etcd
    env:
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    - name: ETCD_DATA_DIR
      value: /var/lib/etcd
    - name: ETCD_NAME
      value: etcd
    - name: ETCD_INITIAL_CLUSTER
      value: etcd=http://$(POD_IP):2380
    - name: ETCD_INITIAL_ADVERTISE_PEER_URLS
      value: http://$(POD_IP):2380
    - name: ETCD_ADVERTISE_CLIENT_URLS
      value: http://$(POD_IP):2379
    - name: ETCD_LISTEN_CLIENT_URLS
      value: http://0.0.0.0:2379
    - name: ETCD_LISTEN_PEER_URLS
      value: http://$(POD_IP):2380
  volumes:
    - name: data
      emptyDir: {}
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: torus
  labels:
    app: torus
spec:
  template:
    metadata:
      name: torus
      labels:
        daemon: torus
    spec:
      containers:
      - name: torus
        image: quay.io/coreos/torus:latest
        ports:
        - name: peer
          containerPort: 40000
        - name: http
          containerPort: 4321
        env:
        - name: ETCD_HOST
          value: $(ETCD_TORUS_SERVICE_HOST)
        - name: STORAGE_SIZE
          value: 2GiB
        - name: LISTEN_HOST
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: AUTO_JOIN
          value: "1"
        - name: DEBUG_INIT
          value: "1"
        - name: DROP_MOUNT_BIN
          value: "0"
        volumeMounts:
        - name: data
          mountPath: /data
          readOnly: false
      volumes:
        - name: data
          hostPath:
            path: /srv/torus
      imagePullSecrets:
        - name: quay-torus

postgres-oneshot.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    name: postgres-torus
  name: postgres-torus
spec:
  type: NodePort
  ports:
    - port: 5432
      targetPort: postgres-client
      nodePort: 30432
  selector:
    app: postgres-torus
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: postgres-torus
  labels:
    app: postgres-torus
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: postgres-torus
    spec:
      containers:
      - image: postgres
        name: postgres
        ports:
        - name: postgres-client
          containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: POSTGRES_PASSWORD
          value: testtorus
        - name: PGDATA
          value: "/var/lib/postgresql/data/pgdata"
      volumes:
        - name: data
          flexVolume:
            driver: "coreos.com/torus"
            fsType: "ext4"
            options:
              volume: "pg1"
              etcd: "10.1.0.100:2379"
gosharplite commented 8 years ago

There seems to be a secrets issue too: Unable to retrieve pull secret tyd/quay-torus for tyd/torus-k3f0h due to secrets "quay-torus" not found. The image pull may not succeed.

core@worker-3 ~ $ systemctl status -l kube-kubelet.service
● kube-kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/run/fleet/units/kube-kubelet.service; linked-runtime; vendor preset: disabled)
   Active: active (running) since Tue 2016-06-14 07:59:11 UTC; 1h 7min ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
  Process: 14023 ExecStartPre=/usr/bin/wget -N -P /etc/cni/net.d http://10.128.112.21:8087/calico/10-calico.conf (code=exited, status=0/SUCCESS)
  Process: 14019 ExecStartPre=/usr/bin/mkdir -p /etc/cni/net.d (code=exited, status=0/SUCCESS)
  Process: 14015 ExecStartPre=/usr/bin/chmod +x /opt/cni/bin/calico (code=exited, status=0/SUCCESS)
  Process: 14008 ExecStartPre=/usr/bin/wget -N -P /opt/cni/bin http://10.128.112.21:8087/calico/calico-cni/v1.1.0/calico (code=exited, status=0/SUCCESS)
  Process: 14003 ExecStartPre=/usr/bin/mkdir -p /opt/cni/bin (code=exited, status=0/SUCCESS)
  Process: 13993 ExecStartPre=/usr/bin/wget -N -P /srv/kubelet http://10.128.112.21:8087/tyd/alpha-dev-160520/srv/kubelet/kubeconfig (code=exited, status=0/SUCCESS)
  Process: 13990 ExecStartPre=/usr/bin/mkdir -p /srv/kubelet (code=exited, status=0/SUCCESS)
  Process: 13985 ExecStartPre=/usr/bin/chmod +x /opt/bin/kubelet (code=exited, status=0/SUCCESS)
  Process: 13977 ExecStartPre=/usr/bin/wget -N -P /opt/bin http://10.128.112.21:8087/kubernetes/v1.2.0/kubelet (code=exited, status=0/SUCCESS)
  Process: 13974 ExecStartPre=/usr/bin/mkdir -p /opt/bin (code=exited, status=0/SUCCESS)
 Main PID: 14026 (kubelet)
   Memory: 37.1M
      CPU: 1min 21.927s
   CGroup: /system.slice/kube-kubelet.service
           ├─14026 /opt/bin/kubelet --address=0.0.0.0 --allow_privileged=true --port=10250 --hostname_override=10.128.112.24 --api_servers=https://10.128.112.21:443 --kubeconfig=/srv/kubelet/kubeconfig --pod_infra_container_image=10.128.112.21:5000/kubernetes/pause --logtostderr=true --cluster_dns=10.1.0.10 --cluster_domain=cluster.local --max-pods=10000 --config=/etc/kubernetes/manifests/ --network-plugin-dir=/etc/cni/net.d --network-plugin=cni --volume-plugin-dir=/etc/kubernetes/volumeplugins
           └─14094 journalctl -k -f

Jun 14 09:06:37 worker-3 kubelet[14026]: E0614 09:06:37.919525   14026 flexvolume.go:285] Failed to attach volume: data
Jun 14 09:06:37 worker-3 kubelet[14026]: E0614 09:06:37.919570   14026 kubelet.go:1780] Unable to mount volumes for pod "postgres-torus-92650690-9cow4_tyd(c9192682-3209-11e6-a0b0-5254005f9388)": attach command failed, status: Failure, reason: no devices available; skipping pod
Jun 14 09:06:37 worker-3 kubelet[14026]: E0614 09:06:37.919579   14026 pod_workers.go:138] Error syncing pod c9192682-3209-11e6-a0b0-5254005f9388, skipping: attach command failed, status: Failure, reason: no devices available
Jun 14 09:06:52 worker-3 kubelet[14026]: W0614 09:06:52.894614   14026 kubelet.go:1829] Unable to retrieve pull secret tyd/quay-torus for tyd/torus-k3f0h due to secrets "quay-torus" not found.  The image pull may not succeed.
Jun 14 09:06:52 worker-3 kubelet[14026]: E0614 09:06:52.904119   14026 flexvolume_util.go:128] Failed to attach volume data, output: {"status":"Failure","message":"no devices available"}
Jun 14 09:06:52 worker-3 kubelet[14026]: , error: exit status 1
Jun 14 09:06:52 worker-3 kubelet[14026]: E0614 09:06:52.904189   14026 flexvolume_util.go:86] attach command failed, status: Failure, reason: no devices available
Jun 14 09:06:52 worker-3 kubelet[14026]: E0614 09:06:52.904196   14026 flexvolume.go:285] Failed to attach volume: data
Jun 14 09:06:52 worker-3 kubelet[14026]: E0614 09:06:52.904236   14026 kubelet.go:1780] Unable to mount volumes for pod "postgres-torus-92650690-9cow4_tyd(c9192682-3209-11e6-a0b0-5254005f9388)": attach command failed, status: Failure, reason: no devices available; skipping pod
Jun 14 09:06:52 worker-3 kubelet[14026]: E0614 09:06:52.904246   14026 pod_workers.go:138] Error syncing pod c9192682-3209-11e6-a0b0-5254005f9388, skipping: attach command failed, status: Failure, reason: no devices available
sgotti commented 8 years ago

attach command failed, status: Failure, reason: no devices available

It looks like you don't have the nbd kernel module loaded (modprobe nbd).

gosharplite commented 8 years ago

You are correct, thanks!

The command below was executed on each worker.

$ sudo modprobe nbd
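
A quick sanity check that the module actually loaded (standard Linux tooling, nothing Torus-specific):

$ lsmod | grep nbd
$ ls /dev/nbd*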

The pod is running and I can follow the example. :)

$ ./kubectl exec $TORUSPOD -- psql postgres -U postgres -c 'select * from films'
 code  |  title  | did | date_prod |  kind  |   len    
-------+---------+-----+-----------+--------+----------
 UA502 | Bananas | 105 |           | Comedy | 01:22:00
 T_601 | Yojimbo | 106 |           | Drama  | 
(2 rows)
sgotti commented 8 years ago

@gosharplite glad it worked out. If you want to autoload the nbd module, you can follow https://coreos.com/os/docs/latest/other-settings.html#loading-kernel-modules
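
For example, with the standard systemd modules-load.d mechanism that the linked doc describes, a one-line config file is enough:

# /etc/modules-load.d/nbd.conf -- read by systemd-modules-load at boot
nbd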

@barakmich Perhaps the doc/guides can be improved with these additional steps.

gosharplite commented 8 years ago

Thanks! I actually have a coreos/coreos-baremetal setup that uses Ignition for automation.
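
In an Ignition-based setup, that module autoload can be expressed as a storage file entry. This is a sketch against the Ignition 2.0 config format, mirroring the /etc/modules-load.d/nbd.conf approach above (mode 420 is decimal for 0644):

{
  "ignition": { "version": "2.0.0" },
  "storage": {
    "files": [{
      "filesystem": "root",
      "path": "/etc/modules-load.d/nbd.conf",
      "mode": 420,
      "contents": { "source": "data:,nbd" }
    }]
  }
}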

barakmich commented 8 years ago

Good idea. I'll file the issue.

githubwithme commented 7 years ago

@gosharplite I am also learning to use Torus. I want to install Torus on the Kubernetes cluster I have already set up, but I ran into a problem at step 4 of https://github.com/coreos/torus/tree/master/contrib/kubernetes. The error is as follows:

[root@****~]# torusctl -C 10.99.237.89:32379 list-peers
2017/01/18 18:10:24 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 10.99.237.89:32379: i/o timeout"; Reconnecting to "10.99.237.89:32379"
2017/01/18 18:10:44 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 10.99.237.89:32379: i/o timeout"; Reconnecting to "10.99.237.89:32379"

Here are the yaml files: torus-k8s-oneshot.yaml and postgres-oneshot.yaml, the same manifests as posted earlier in this thread (the pasted copies were truncated).

What is causing this? Is my Kubernetes configuration wrong, or are there particular points to watch out for when installing Torus?

githubwithme commented 7 years ago

Running the following command on the master node also fails:

[root@** ~]# torusctl volume list
couldn't connect to etcd: torus: no global metadata available at mds

gosharplite commented 7 years ago

@githubwithme I haven't followed Torus for almost half a year now, so I can't help you look into this problem. Lately I've been busy upgrading to k8s 1.5.2, since StatefulSets look like the future savior for running databases on k8s. The Operator concept that CoreOS proposed with ThirdPartyResource also looks like a key ingredient for deploying databases on k8s. I'm waiting for @barakmich to throw us a life preserver too!

githubwithme commented 7 years ago

@gosharplite How does Torus perform? Are StatefulSets similar in function to Torus?

cenxinxing commented 7 years ago

@githubwithme Try adding http://:

torusctl -C http://10.99.237.89:32379 list-peers