Closed: gosharplite closed this issue 8 years ago
In your AWS script, kubelet needs the parameter below.
--volume-plugin-dir=/etc/kubernetes/volumeplugins
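For example, appended to the kubelet invocation (a sketch only; the other flags here are illustrative, based on the unit shown later in this thread, and will differ per setup):

```shell
# Sketch: kubelet started with a writable FlexVolume plugin directory.
/opt/bin/kubelet \
  --kubeconfig=/srv/kubelet/kubeconfig \
  --allow_privileged=true \
  --network-plugin=cni \
  --volume-plugin-dir=/etc/kubernetes/volumeplugins
```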
I will give it a try.
Version of my k8s cluster is shown below.
$ ./kubectl version
Client Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.0", GitCommit:"5cb86ee022267586db386f62781338b0483733b3", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.0", GitCommit:"5cb86ee022267586db386f62781338b0483733b3", GitTreeState:"clean"}
I can follow your example Run Torus on Kubernetes up to step 5, Create a volume.
$ ./torusctl -C 10.128.112.22:32379 list-peers
+-------------------------+--------------------------------------+---------+------+--------+---------------+--------------+
| ADDRESS | UUID | SIZE | USED | MEMBER | UPDATED | REB/REP DATA |
+-------------------------+--------------------------------------+---------+------+--------+---------------+--------------+
| http://10.2.0.64:40000 | 0d6d13a6-3207-11e6-9050-4e73b393e21b | 2.0 GiB | 0 B | OK | 4 seconds ago | 0 B/sec |
| http://10.2.0.6:40000 | fab56883-3206-11e6-8955-76fa7f449663 | 2.0 GiB | 0 B | OK | now | 0 B/sec |
| http://10.2.0.128:40000 | fb1ff354-3206-11e6-8e18-8e70be9624db | 2.0 GiB | 0 B | OK | 3 seconds ago | 0 B/sec |
+-------------------------+--------------------------------------+---------+------+--------+---------------+--------------+
Balanced: true Usage: 0.00%
$ ./torusblk -C 10.128.112.22:32379 volume create pg1 2GiB
However, postgres-oneshot.yaml does not work: no devices are available.
$ ./kubectl describe po postgres-torus-92650690-9cow4
Name: postgres-torus-92650690-9cow4
Namespace: tyd
Node: 10.128.112.24/10.128.112.24
Start Time: Tue, 14 Jun 2016 16:27:09 +0800
Labels: app=postgres-torus,pod-template-hash=92650690
Status: Pending
IP:
Controllers: ReplicaSet/postgres-torus-92650690
Containers:
postgres:
Container ID:
Image: postgres
Image ID:
Port: 5432/TCP
QoS Tier:
memory: BestEffort
cpu: BestEffort
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment Variables:
POD_IP: (v1:status.podIP)
POSTGRES_PASSWORD: testtorus
PGDATA: /var/lib/postgresql/data/pgdata
Conditions:
Type Status
Ready False
Volumes:
data:
<unknown>
default-token-iyxs6:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-iyxs6
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
27m 27m 1 {default-scheduler } Normal Scheduled Successfully assigned postgres-torus-92650690-9cow4 to 10.128.112.24
27m 5s 125 {kubelet 10.128.112.24} Warning FailedMount Unable to mount volumes for pod "postgres-torus-92650690-9cow4_tyd(c9192682-3209-11e6-a0b0-5254005f9388)": attach command failed, status: Failure, reason: no devices available
27m 5s 125 {kubelet 10.128.112.24} Warning FailedSync Error syncing pod, skipping: attach command failed, status: Failure, reason: no devices available
I wish I knew how to debug this situation further.
Here are the yaml files.
torus-k8s-oneshot.yaml
apiVersion: v1
kind: Service
metadata:
labels:
name: etcd-torus
name: etcd-torus
spec:
type: NodePort
ports:
- port: 2379
name: etcd-client
targetPort: etcd-client
nodePort: 32379
selector:
name: etcd-torus
---
apiVersion: v1
kind: Service
metadata:
labels:
name: etcd-torus-internal
name: etcd-torus-internal
spec:
clusterIP: 10.1.0.100
ports:
- port: 2379
name: etcd-client
targetPort: etcd-client
selector:
name: etcd-torus
---
apiVersion: v1
kind: Pod
metadata:
labels:
name: etcd-torus
name: etcd-torus
spec:
containers:
- image: quay.io/coreos/etcd:v3.0.0-beta.0
name: etcd-torus
ports:
- name: etcd-peers
containerPort: 2380
- name: etcd-client
containerPort: 2379
volumeMounts:
- name: data
mountPath: /var/lib/etcd
env:
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: ETCD_DATA_DIR
value: /var/lib/etcd
- name: ETCD_NAME
value: etcd
- name: ETCD_INITIAL_CLUSTER
value: etcd=http://$(POD_IP):2380
- name: ETCD_INITIAL_ADVERTISE_PEER_URLS
value: http://$(POD_IP):2380
- name: ETCD_ADVERTISE_CLIENT_URLS
value: http://$(POD_IP):2379
- name: ETCD_LISTEN_CLIENT_URLS
value: http://0.0.0.0:2379
- name: ETCD_LISTEN_PEER_URLS
value: http://$(POD_IP):2380
volumes:
- name: data
emptyDir: {}
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: torus
labels:
app: torus
spec:
template:
metadata:
name: torus
labels:
daemon: torus
spec:
containers:
- name: torus
image: quay.io/coreos/torus:latest
ports:
- name: peer
containerPort: 40000
- name: http
containerPort: 4321
env:
- name: ETCD_HOST
value: $(ETCD_TORUS_SERVICE_HOST)
- name: STORAGE_SIZE
value: 2GiB
- name: LISTEN_HOST
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: AUTO_JOIN
value: "1"
- name: DEBUG_INIT
value: "1"
- name: DROP_MOUNT_BIN
value: "0"
volumeMounts:
- name: data
mountPath: /data
readOnly: false
volumes:
- name: data
hostPath:
path: /srv/torus
imagePullSecrets:
- name: quay-torus
postgres-oneshot.yaml
apiVersion: v1
kind: Service
metadata:
labels:
name: postgres-torus
name: postgres-torus
spec:
type: NodePort
ports:
- port: 5432
targetPort: postgres-client
nodePort: 30432
selector:
app: postgres-torus
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: postgres-torus
labels:
app: postgres-torus
spec:
replicas: 1
template:
metadata:
labels:
app: postgres-torus
spec:
containers:
- image: postgres
name: postgres
ports:
- name: postgres-client
containerPort: 5432
volumeMounts:
- name: data
mountPath: /var/lib/postgresql/data
env:
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: POSTGRES_PASSWORD
value: testtorus
- name: PGDATA
value: "/var/lib/postgresql/data/pgdata"
volumes:
- name: data
flexVolume:
driver: "coreos.com/torus"
fsType: "ext4"
options:
volume: "pg1"
etcd: "10.1.0.100:2379"
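The flexVolume stanza above names the driver coreos.com/torus; kubelet resolves that name to a binary under its --volume-plugin-dir using FlexVolume's vendor~driver directory convention. A small sketch of the expected path (plugin dir taken from the kubelet flag in this thread):

```shell
# FlexVolume maps driver "vendor/name" to <plugin-dir>/vendor~name/name.
PLUGIN_DIR=/etc/kubernetes/volumeplugins   # from --volume-plugin-dir in this thread
DRIVER="coreos.com/torus"                  # from the flexVolume stanza above
BIN="$PLUGIN_DIR/$(echo "$DRIVER" | tr '/' '~')/${DRIVER##*/}"
echo "$BIN"   # /etc/kubernetes/volumeplugins/coreos.com~torus/torus
```

If that binary is missing or not executable on a node, attach/mount calls for the volume will fail on that node.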
There seems to be a secrets issue too: Unable to retrieve pull secret tyd/quay-torus for tyd/torus-k3f0h due to secrets "quay-torus" not found. The image pull may not succeed.
core@worker-3 ~ $ systemctl status -l kube-kubelet.service
● kube-kubelet.service - Kubernetes Kubelet
Loaded: loaded (/run/fleet/units/kube-kubelet.service; linked-runtime; vendor preset: disabled)
Active: active (running) since Tue 2016-06-14 07:59:11 UTC; 1h 7min ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Process: 14023 ExecStartPre=/usr/bin/wget -N -P /etc/cni/net.d http://10.128.112.21:8087/calico/10-calico.conf (code=exited, status=0/SUCCESS)
Process: 14019 ExecStartPre=/usr/bin/mkdir -p /etc/cni/net.d (code=exited, status=0/SUCCESS)
Process: 14015 ExecStartPre=/usr/bin/chmod +x /opt/cni/bin/calico (code=exited, status=0/SUCCESS)
Process: 14008 ExecStartPre=/usr/bin/wget -N -P /opt/cni/bin http://10.128.112.21:8087/calico/calico-cni/v1.1.0/calico (code=exited, status=0/SUCCESS)
Process: 14003 ExecStartPre=/usr/bin/mkdir -p /opt/cni/bin (code=exited, status=0/SUCCESS)
Process: 13993 ExecStartPre=/usr/bin/wget -N -P /srv/kubelet http://10.128.112.21:8087/tyd/alpha-dev-160520/srv/kubelet/kubeconfig (code=exited, status=0/SUCCESS)
Process: 13990 ExecStartPre=/usr/bin/mkdir -p /srv/kubelet (code=exited, status=0/SUCCESS)
Process: 13985 ExecStartPre=/usr/bin/chmod +x /opt/bin/kubelet (code=exited, status=0/SUCCESS)
Process: 13977 ExecStartPre=/usr/bin/wget -N -P /opt/bin http://10.128.112.21:8087/kubernetes/v1.2.0/kubelet (code=exited, status=0/SUCCESS)
Process: 13974 ExecStartPre=/usr/bin/mkdir -p /opt/bin (code=exited, status=0/SUCCESS)
Main PID: 14026 (kubelet)
Memory: 37.1M
CPU: 1min 21.927s
CGroup: /system.slice/kube-kubelet.service
├─14026 /opt/bin/kubelet --address=0.0.0.0 --allow_privileged=true --port=10250 --hostname_override=10.128.112.24 --api_servers=https://10.128.112.21:443 --kubeconfig=/srv/kubelet/kubeconfig --pod_infra_container_image=10.128.112.21:5000/kubernetes/pause --logtostderr=true --cluster_dns=10.1.0.10 --cluster_domain=cluster.local --max-pods=10000 --config=/etc/kubernetes/manifests/ --network-plugin-dir=/etc/cni/net.d --network-plugin=cni --volume-plugin-dir=/etc/kubernetes/volumeplugins
└─14094 journalctl -k -f
Jun 14 09:06:37 worker-3 kubelet[14026]: E0614 09:06:37.919525 14026 flexvolume.go:285] Failed to attach volume: data
Jun 14 09:06:37 worker-3 kubelet[14026]: E0614 09:06:37.919570 14026 kubelet.go:1780] Unable to mount volumes for pod "postgres-torus-92650690-9cow4_tyd(c9192682-3209-11e6-a0b0-5254005f9388)": attach command failed, status: Failure, reason: no devices available; skipping pod
Jun 14 09:06:37 worker-3 kubelet[14026]: E0614 09:06:37.919579 14026 pod_workers.go:138] Error syncing pod c9192682-3209-11e6-a0b0-5254005f9388, skipping: attach command failed, status: Failure, reason: no devices available
Jun 14 09:06:52 worker-3 kubelet[14026]: W0614 09:06:52.894614 14026 kubelet.go:1829] Unable to retrieve pull secret tyd/quay-torus for tyd/torus-k3f0h due to secrets "quay-torus" not found. The image pull may not succeed.
Jun 14 09:06:52 worker-3 kubelet[14026]: E0614 09:06:52.904119 14026 flexvolume_util.go:128] Failed to attach volume data, output: {"status":"Failure","message":"no devices available"}
Jun 14 09:06:52 worker-3 kubelet[14026]: , error: exit status 1
Jun 14 09:06:52 worker-3 kubelet[14026]: E0614 09:06:52.904189 14026 flexvolume_util.go:86] attach command failed, status: Failure, reason: no devices available
Jun 14 09:06:52 worker-3 kubelet[14026]: E0614 09:06:52.904196 14026 flexvolume.go:285] Failed to attach volume: data
Jun 14 09:06:52 worker-3 kubelet[14026]: E0614 09:06:52.904236 14026 kubelet.go:1780] Unable to mount volumes for pod "postgres-torus-92650690-9cow4_tyd(c9192682-3209-11e6-a0b0-5254005f9388)": attach command failed, status: Failure, reason: no devices available; skipping pod
Jun 14 09:06:52 worker-3 kubelet[14026]: E0614 09:06:52.904246 14026 pod_workers.go:138] Error syncing pod c9192682-3209-11e6-a0b0-5254005f9388, skipping: attach command failed, status: Failure, reason: no devices available
attach command failed, status: Failure, reason: no devices available
Looks like you don't have the nbd kernel module loaded (modprobe nbd).
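A quick way to check on a worker (assuming a standard Linux host):

```shell
# Check whether the nbd module is loaded and whether /dev/nbd* devices exist.
if grep -q '^nbd ' /proc/modules 2>/dev/null; then
  echo "nbd loaded"
else
  echo "nbd not loaded"   # fix: sudo modprobe nbd
fi
ls /dev/nbd* 2>/dev/null || echo "no nbd devices"
```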
You are correct, thanks!
On each worker, the command below is executed.
$ sudo modprobe nbd
The pod is running and I can follow the example:)
$ ./kubectl exec $TORUSPOD -- psql postgres -U postgres -c 'select * from films'
code | title | did | date_prod | kind | len
-------+---------+-----+-----------+--------+----------
UA502 | Bananas | 105 | | Comedy | 01:22:00
T_601 | Yojimbo | 106 | | Drama |
(2 rows)
@gosharplite glad it worked out. If you want to autoload the nbd module you can follow https://coreos.com/os/docs/latest/other-settings.html#loading-kernel-modules
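For reference, the autoload approach in that doc boils down to a modules-load.d entry; a minimal sketch (the file name is an example):

```shell
# Load nbd at boot via systemd-modules-load (CoreOS, or any systemd distro).
echo nbd | sudo tee /etc/modules-load.d/nbd.conf
# Takes effect on the next boot; load it immediately with:
sudo modprobe nbd
```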
@barakmich Perhaps the doc/guides can be improved with these additional steps.
Thanks! I actually have a coreos/coreos-baremetal setup using Ignition for automation.
Good idea. I'll file the issue.
@gosharplite I'm also learning to use Torus. I want to install Torus on the Kubernetes cluster I have set up, but I ran into a problem at step 4 of https://github.com/coreos/torus/tree/master/contrib/kubernetes. The error is as follows:
[root@****~]# torusctl -C 10.99.237.89:32379 list-peers
2017/01/18 18:10:24 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 10.99.237.89:32379: i/o timeout"; Reconnecting to "10.99.237.89:32379"
2017/01/18 18:10:44 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 10.99.237.89:32379: i/o timeout"; Reconnecting to "10.99.237.89:32379"
Here are the yaml files: torus-k8s-oneshot.yaml and postgres-oneshot.yaml. They are the same as the ones posted earlier in this thread, except that the etcd-torus-internal Service uses clusterIP: 10.99.237.89.
What is causing this? Is my Kubernetes configuration wrong, or are there particular things to watch out for when installing Torus?
Running the command below on the master node also fails:
[root@** ~]# torusctl volume list
couldn't connect to etcd: torus: no global metadata available at mds
@githubwithme It has been almost half a year since I last followed Torus, so I can't look into this problem for you. I've been busy upgrading to k8s 1.5.2, because StatefulSets look like the savior for running databases on k8s. The Operator concept that CoreOS proposed using ThirdPartyResource also looks like a key building block for deploying databases on k8s. I'm waiting for @barakmich to throw us a lifeline too!
@gosharplite Torus性能如何?StatefulSets 功能和torus类似吗?
@githubwithme Try adding http://: torusctl -C http://10.99.237.89:32379 list-peers
In CoreOS, /usr is a read-only file system, so the commands below do not work.
Is there another way to install the FlexVolume plugin on CoreOS?