dapr / cli

Command-line tools for Dapr.
Apache License 2.0

Give scheduler a default volume, making it resilient to restarts #1423

Closed JoshVanL closed 4 months ago

JoshVanL commented 4 months ago

default

yaron2 commented 4 months ago

Will merge after tests pass

JoshVanL commented 4 months ago

@artursouza @yaron2

Please don't merge.

That's not quite right: the --force flag ignores the case where the volume does not exist. An error here indicates that an actual problem occurred while talking to the container runtime.

artursouza commented 4 months ago

@yaron2 should be good now.

artursouza commented 4 months ago

Interestingly enough, the changes have fixed startup under Podman, but with Docker (and Docker Desktop) the scheduler still complains about operations on the data dir.

time="2024-07-17T16:57:49.536458383Z" level=info msg="Starting Dapr Scheduler Service -- version 1.14.0-rc.2 -- commit 3d30a6f3ae2191125ed9d3dcf0761077772e5de2" instance=6a5fdf505b55 scope=dapr.scheduler type=log ver=1.14.0-rc.2
time="2024-07-17T16:57:49.536503675Z" level=info msg="Log level set to: info" instance=6a5fdf505b55 scope=dapr.scheduler type=log ver=1.14.0-rc.2
time="2024-07-17T16:57:49.536643341Z" level=warning msg="mTLS is disabled. Skipping certificate request and tls validation" instance=6a5fdf505b55 scope=dapr.runtime.security type=log ver=1.14.0-rc.2
time="2024-07-17T16:57:49.536865591Z" level=info msg="Healthz server is listening on [::]:8080" instance=6a5fdf505b55 scope=dapr.scheduler type=log ver=1.14.0-rc.2
time="2024-07-17T16:57:49.536898091Z" level=warning msg="etcd client http ports not set. This is not recommended for production." instance=6a5fdf505b55 scope=dapr.scheduler.server type=log ver=1.14.0-rc.2
time="2024-07-17T16:57:49.536926883Z" level=info msg="Dapr Scheduler is starting..." instance=6a5fdf505b55 scope=dapr.scheduler.server type=log ver=1.14.0-rc.2
time="2024-07-17T16:57:49.5369168Z" level=info msg="metrics server started on :9090/" instance=6a5fdf505b55 scope=dapr.scheduler type=log ver=1.14.0-rc.2
time="2024-07-17T16:57:49.536979341Z" level=info msg="Dapr Scheduler listening on: :50006" instance=6a5fdf505b55 scope=dapr.scheduler.server type=log ver=1.14.0-rc.2
time="2024-07-17T16:57:49.537011216Z" level=info msg="Starting etcd" instance=6a5fdf505b55 scope=dapr.scheduler.server type=log ver=1.14.0-rc.2
{"level":"warn","ts":"2024-07-17T16:57:49.537048Z","caller":"embed/config.go:679","msg":"Running http and grpc server on single port. This is not recommended for production."}
{"level":"info","ts":"2024-07-17T16:57:49.537073Z","caller":"embed/etcd.go:127","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:2380"]}
{"level":"info","ts":"2024-07-17T16:57:49.53722Z","caller":"embed/etcd.go:135","msg":"configuring client listeners","listen-client-urls":["http://localhost:2379"]}
time="2024-07-17T16:57:49.537322091Z" level=info msg="Running gRPC server on port 50006" instance=6a5fdf505b55 scope=dapr.scheduler.server type=log ver=1.14.0-rc.2
{"level":"info","ts":"2024-07-17T16:57:49.537343Z","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.5.14","git-sha":"Not provided (use ./build instead of go build)","go-version":"go1.22.4","go-os":"linux","go-arch":"arm64","max-cpu-set":10,"max-cpu-available":10,"member-initialized":false,"name":"dapr-scheduler-server-0","data-dir":"./data-default-dapr-scheduler-server-0","wal-dir":"","wal-dir-dedicated":"","member-dir":"data-default-dapr-scheduler-server-0/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":100000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://localhost:2380"],"listen-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://localhost:2379"],"listen-client-urls":["http://localhost:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"dapr-scheduler-server-0=http://localhost:2380","initial-cluster-state":"new","initial-cluster-token":"etcd-cluster","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"24h0m0s","auto-compaction-interval":"24h0m0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
{"level":"warn","ts":"2024-07-17T16:57:49.537376Z","caller":"fileutil/fileutil.go:53","msg":"check file permission","error":"directory \"./data-default-dapr-scheduler-server-0\" exist, but the permission is \"drwxr-xr-x\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
{"level":"info","ts":"2024-07-17T16:57:49.537402Z","caller":"embed/etcd.go:375","msg":"closing etcd server","name":"dapr-scheduler-server-0","data-dir":"./data-default-dapr-scheduler-server-0","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://localhost:2379"]}
{"level":"info","ts":"2024-07-17T16:57:49.537423Z","caller":"embed/etcd.go:377","msg":"closed etcd server","name":"dapr-scheduler-server-0","data-dir":"./data-default-dapr-scheduler-server-0","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://localhost:2379"]}
time="2024-07-17T16:57:49.537524966Z" level=info msg="Scheduler GRPC server stopped" instance=6a5fdf505b55 scope=dapr.scheduler.server type=log ver=1.14.0-rc.2
time="2024-07-17T16:57:49.537547383Z" level=info msg="Healthz server is shutting down" instance=6a5fdf505b55 scope=dapr.scheduler type=log ver=1.14.0-rc.2
time="2024-07-17T16:57:49.537604383Z" level=fatal msg="error running scheduler: cannot access data directory: open /data-default-dapr-scheduler-server-0/.touch: permission denied" instance=6a5fdf505b55 scope=dapr.scheduler type=log ver=1.14.0-rc.2

Actions run: https://github.com/dapr/rust-sdk/actions/runs/9977498812/job/27572563670#step:16:22

Did you delete the previous volume? dapr uninstall with an old version does not delete the volume.

yaron2 commented 4 months ago

Can merge once the E2E KinD test passes.

codecov[bot] commented 4 months ago

Codecov Report

Attention: Patch coverage is 0% with 15 lines in your changes missing coverage. Please review.

Project coverage is 21.87%. Comparing base (ddf43a5) to head (0fc44b4).

| Files | Patch % | Lines |
|---|---|---|
| pkg/standalone/uninstall.go | 0.00% | 15 Missing :warning: |

Additional details and impacted files

```diff
@@               Coverage Diff                @@
##           release-1.14    #1423      +/-   ##
================================================
- Coverage         21.94%   21.87%   -0.07%
================================================
  Files                40       40
  Lines              4913     4928      +15
================================================
  Hits               1078     1078
- Misses             3754     3769      +15
  Partials             81       81
```


mikeee commented 4 months ago

Did you delete the previous volume? dapr uninstall with an old version does not delete the volume.

I did a full uninstall of dapr, including ~/.dapr/, and reset Docker, and I'm still seeing this issue on the latest commit, 0fc44b44.

The issue is also appearing on actions runs.

The only thing that has worked is the first commit from this PR, https://github.com/dapr/cli/pull/1422, in which a default folder was created with standard permissions (0755) and then changed to 0777.