Open nitinpatil1992 opened 2 years ago
Can you paste the files in /etc/containerd/certs.d?
This directory contains the image registry mirror configuration.
Example: https://d7y.io/docs/setup/runtime/containerd/mirror#option-2-multiple-registries
For docker.io, /etc/containerd/certs.d/docker.io/hosts.toml:
server = "https://index.docker.io"
[host."http://127.0.0.1:65001"]
capabilities = ["pull"]
[host."http://127.0.0.1:65001".header]
X-Dragonfly-Registry = ["https://index.docker.io"]
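For additional registries, the same hosts.toml layout can be generated rather than written by hand. The following is a minimal shell sketch, assuming the dfdaemon proxy listens on http://127.0.0.1:65001 as in the example above; render_hosts_toml is a hypothetical helper name, not part of Dragonfly or containerd.

```shell
# render_hosts_toml SERVER [PROXY]
# Prints a hosts.toml that sends pulls for SERVER through the Dragonfly proxy.
# PROXY defaults to the address used in this thread (an assumption for other setups).
render_hosts_toml() {
  server=$1
  proxy=${2:-http://127.0.0.1:65001}
  cat <<EOF
server = "$server"

[host."$proxy"]
  capabilities = ["pull"]
  [host."$proxy".header]
    X-Dragonfly-Registry = ["$server"]
EOF
}
```

For example, `render_hosts_toml https://quay.io` prints content suitable for /etc/containerd/certs.d/quay.io/hosts.toml.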
How about the single-registry option > Version 2 config without config_path? Is it supported?
In my case there is nothing under /etc/containerd/ other than config.toml (config-kops.yaml).
Yes, follow this https://d7y.io/docs/setup/runtime/containerd/mirror/#option-1-single-registry
I did that. The effects are similar to what @nitinpatil1992 wrote. I also deployed with helm. Here is my config.toml:
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "k8s.gcr.io/pause:3.6"
[plugins."io.containerd.grpc.v1.cri".containerd]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry-1.docker.io"]
endpoint = ["http://127.0.0.1:65001","https://registry-1.docker.io"]
Dragonfly version: v2.0.2/v2.0.3
OS: Ubuntu 20.04.3 LTS
Kernel (e.g. uname -a): 5.11.0-1021-aws #22~20.04.2-Ubuntu SMP Wed Oct 27 21:27:13 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Other: containerd://1.4.12
My Helm values:
containerRuntime:
  containerd:
    enable: true
    configFileName: "config-kops.toml"
manager:
  ingress:
    enable: true
    className: private
    hosts:
      - "dragonfly.example.com"
    tls:
      - secretName: secure-tls
        hosts:
          - "dragonfly.example.com"
cdn:
  enable: true
Is there anything I can provide to point us to the correct path?
Did you restart the containerd daemon?
In https://github.com/containerd/containerd/blob/main/docs/cri/registry.md, the mirror config key is:
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
...
not registry-1.docker.io
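The mirror table key has to be the registry host as it appears in image references (docker.io), while registry-1.docker.io belongs only in the endpoint list. A quick way to spot this in an existing config is sketched below. list_mirror_keys and check_mirror_keys are hypothetical helper names, and the check only covers the Version 2 CRI mirror tables shown in this thread.

```shell
# list_mirror_keys FILE
# Prints every registry host used as a key in
# [plugins."io.containerd.grpc.v1.cri".registry.mirrors."<host>"] tables.
list_mirror_keys() {
  sed -n 's/^[[:space:]]*\[plugins\."io\.containerd\.grpc\.v1\.cri"\.registry\.mirrors\."\([^"]*\)"\][[:space:]]*$/\1/p' "$1"
}

# check_mirror_keys FILE
# Warns when registry-1.docker.io is used as a mirror key; returns 1 in that case.
check_mirror_keys() {
  _rc=0
  for key in $(list_mirror_keys "$1"); do
    case "$key" in
      registry-1.docker.io)
        echo "WARN: mirror key \"$key\" does not match docker.io images; use \"docker.io\""
        _rc=1
        ;;
      *)
        echo "OK: $key"
        ;;
    esac
  done
  return "$_rc"
}
```

For example, `check_mirror_keys /etc/containerd/config.toml` would flag the config posted earlier in this thread.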
Did you restart the containerd daemon?
Yeah, it is done by the helm chart itself:
if [[ "$need_restart" -gt 0 ]]; then
nsenter -t 1 -m systemctl -- restart containerd.service
fi
not registry-1.docker.io
Bingo! It seems that the nodes have started exchanging blobs. :man_facepalming: Now I need to set up auth by providing Docker credentials.
Regarding the missing peer task done: how about documenting it better in the official tutorial for the helm chart? I can add a note here. Wdyt @jim3ma?
It seems that containerd did not restart. You can check the logs of the update-containerd container in dfdaemon.
I hit a similar issue and switched to the Containerd > Version 2 config with config_path instructions to set up the registry; after that it works well.
This is what my /etc/containerd/certs.d/docker.io/hosts.toml looks like:
server = "https://registry-1.docker.io"
[host."http://localhost:65001"]
capabilities = ["pull"]
skip_verify = true
Then I pull the image using ctr, specifying hosts-dir:
ctr images pull --hosts-dir "/etc/containerd/certs.d" docker.io/library/alpine:latest
When the pull finished, I could find the related logs in dfdaemon.
This issue comment may be helpful: https://github.com/containerd/containerd/issues/5407#issuecomment-825322092
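To confirm that a pull actually went through dfdaemon, the daemon logs can be saved to a file and searched for the peer task done message mentioned in this issue. A minimal sketch, assuming the logs were already saved to a file; count_peer_task_done is a hypothetical helper name.

```shell
# count_peer_task_done FILE
# Prints the number of lines in FILE that contain "peer task done".
# grep exits with 1 when nothing matches; that case is treated as success
# so a count of 0 is still reported cleanly. Other grep errors are passed on.
count_peer_task_done() {
  grep -cF -- 'peer task done' "$1" || [ $? -eq 1 ]
}
```

For example, `count_peer_task_done dfdaemon.log` prints the number of matching lines, or 0 when there are none.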
@jim3ma here is what our certs.d looks like:
ls -al /etc/containerd/certs.d
total 0
drwxr-xr-x 5 root root 62 Jun 9 09:52 .
drwxr--r-- 3 root root 40 Jun 9 09:52 ..
drwxr-xr-x 2 root root 24 Jun 9 09:52 ghcr.io
drwxr-xr-x 2 root root 24 Jun 9 09:52 harbor.example.com
drwxr-xr-x 2 root root 24 Jun 9 09:52 quay.io
$ cat /etc/containerd/certs.d/quay.io/hosts.toml
server = "https://quay.io"
[host."http://127.0.0.1:65001"]
capabilities = ["pull", "resolve"]
[host."http://127.0.0.1:65001".header]
X-Dragonfly-Registry = ["https://quay.io"]
@czomo can you please share your full containerd config? Also, did you just use the localhost endpoint to pull image or actual docker host name?
I also noticed that in the dfget config under daemon, the download settings have port 65000, but I couldn't find out where this is being exposed/used.
download:
  calculateDigest: true
  downloadGRPC:
    security:
      insecure: true
    unixListen:
      socket: /tmp/dfdamon.sock
  peerGRPC:
    security:
      insecure: true
    tcpListen:
      listen: 0.0.0.0
      port: 65000
  perPeerRateLimit: 100Mi
  totalRateLimit: 200Mi
@czomo can you please share your full containerd config? Also, did you just use the localhost endpoint to pull image or actual docker host name?
I am using containerd 1.4.12 (1.5+ has a slightly different structure), hence there is no hosts.toml/certs.d and I am restricted to mirroring only one registry. This is what my final, full config looks like. It works, however I am hitting the pull limit (~35 nodes, 5k pods). I will be working on adding auth to it in the following week.
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "k8s.gcr.io/pause:3.6"
[plugins."io.containerd.grpc.v1.cri".containerd]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
endpoint = ["http://127.0.0.1:65001","https://docker.io"]
Also, did you just use the localhost endpoint to pull image or actual docker host name?
Not sure about this one, but I believe it is the localhost endpoint 127.0.0.1:65001, as above.
dragonfly version: 2.0.7 helm chart: 0.8.7
I don't know if I hit the same issue. I was able to make image pulls work for a private registry, but unfortunately the tasks are not distributed across dfdaemon agents. Peer tasks only occur in the dfdaemon agent where I trigger the pull via crictl, and I'm stuck on this.
My config for containerd:
[plugins."io.containerd.grpc.v1.cri".registry.configs."127.0.0.1:65001".auth]
auth = "********"
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."my-private-registry.example.com"]
endpoint = ["http://127.0.0.1:65001","https://my-private-registry.example.com"]
dfdaemon conf:
aliveTime: 0s
gcInterval: 1m0s
keepStorage: false
workHome:
logDir:
cacheDir:
pluginDir:
dataDir: /var/lib/dragonfly
console: true
health:
path: /server/ping
tcpListen:
port: 40901
verbose: false
jaeger: http://dragonfly-jaeger-collector.dragonfly-system.svc.cluster.local:14268/api/traces
scheduler:
manager:
enable: true
netAddrs:
- addr: dragonfly-manager.dragonfly-system.svc.cluster.local:65003
type: tcp
refreshInterval: 5m
scheduleTimeout: 30s
disableAutoBackSource: false
seedPeer:
clusterID: 1
enable: false
type: super
host:
idc: ""
location: ""
netTopology: ""
securityDomain: ""
download:
calculateDigest: true
concurrent:
goroutineCount: 10
initBackoff: 0.5
maxAttempts: 3
maxBackoff: 3
thresholdSize: 100M
thresholdSpeed: 200M
downloadGRPC:
security:
insecure: true
tlsVerify: false
unixListen:
socket: /run/dragonfly/dfdaemon.sock
peerGRPC:
security:
insecure: true
tcpListen:
port: 65000
perPeerRateLimit: 512Mi
prefetch: false
totalRateLimit: 1024Mi
upload:
rateLimit: 1024Mi
security:
insecure: true
tlsVerify: false
tcpListen:
port: 65002
objectStorage:
enable: false
filter: Expires&Signature&ns
maxReplicas: 3
security:
insecure: true
tlsVerify: true
tcpListen:
port: 65004
storage:
diskGCThreshold: 50Gi
multiplex: true
strategy: io.d7y.storage.v2.simple
taskExpireTime: 6h
proxy:
defaultFilter: Expires&Signature&ns
defaultTag:
tcpListen:
port: 65001
security:
insecure: true
tlsVerify: false
registryMirror:
dynamic: false
insecure: false
url: https://my-private-registry.example.com
proxies:
- regx: blobs/sha256.*
security:
autoIssueCert: false
caCert: ""
certSpec:
validityPeriod: 4320h
tlsPolicy: prefer
tlsVerify: false
network:
enableIPv6: false
announcer:
schedulerInterval: 30s
scheduler conf:
server:
port: 8002
workHome:
logDir:
cacheDir:
pluginDir:
dataDir:
scheduler:
algorithm: default
backSourceCount: 3
gc:
hostGCInterval: 1h
peerGCInterval: 10s
peerTTL: 24h
taskGCInterval: 30m
retryBackSourceLimit: 5
retryInterval: 50ms
retryLimit: 10
dynconfig:
refreshInterval: 10s
type: manager
host:
idc: ""
location: ""
netTopology: ""
manager:
addr: dragonfly-manager.dragonfly-system.svc.cluster.local:65003
schedulerClusterID: 1
keepAlive:
interval: 5s
seedPeer:
enable: true
job:
redis:
addrs:
- dragonfly-redis-master.dragonfly-system.svc.cluster.local:6379
host: dragonfly-redis-master.dragonfly-system.svc.cluster.local
port: 6379
password: dragonfly
storage:
bufferSize: 100
maxBackups: 10
maxSize: 100
security:
autoIssueCert: false
caCert: ""
certSpec:
validityPeriod: 4320h
tlsPolicy: prefer
tlsVerify: false
network:
enableIPv6: false
metrics:
enable: false
addr: ":8000"
enablePeerHost: false
console: true
verbose: false
jaeger: http://dragonfly-jaeger-collector.dragonfly-system.svc.cluster.local:14268/api/traces
manager conf:
server:
rest:
addr: :8080
grpc:
port:
start: 65003
end: 65003
workHome:
logDir:
cacheDir:
pluginDir:
database:
mysql:
user: dragonfly
password: dragonfly
host: dragonfly-mysql.dragonfly-system.svc.cluster.local
port: 3306
dbname: manager
migrate: true
redis:
addrs:
- dragonfly-redis-master.dragonfly-system.svc.cluster.local:6379
host: dragonfly-redis-master.dragonfly-system.svc.cluster.local
port: 6379
password: dragonfly
cache:
local:
size: 10000
ttl: 10s
redis:
ttl: 30s
objectStorage:
accessKey: ""
enable: false
endpoint: ""
name: s3
region: ""
secretKey: ""
security:
autoIssueCert: false
caCert: ""
caKey: ""
certSpec:
dnsNames:
- dragonfly-manager
- dragonfly-manager.dragonfly-system.svc
- dragonfly-manager.dragonfly-system.svc.cluster.local
ipAddresses: null
validityPeriod: 87600h
tlsPolicy: prefer
network:
enableIPv6: false
metrics:
enable: false
addr: ":8000"
console: true
verbose: false
jaeger: http://dragonfly-jaeger-collector.dragonfly-system.svc.cluster.local:14268/api/traces
Bug report:
We have deployed Dragonfly on containerd-based host machines using helm.
But when we pull an image on one of the boxes, the sibling box doesn't appear to have pulled the image.
Here is the config for docker daemon
No logs for peers pulling the images in the daemonsets
Expected behavior:
Logs should be present when grepping for peer task done
How to reproduce it:
Environment:
Kernel (e.g. uname -a): Linux 5.10.112-108.499.amzn2.x86_64 #1 SMP Wed Apr 27 23:39:40 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux