etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0
47.83k stars, 9.77k forks

Cannot run etcd as a Windows service #10433

Closed jasper-d closed 4 years ago

jasper-d commented 5 years ago

Repro:

  1. Extract etcd binaries to C:\etcd
  2. mkdir C:\etcd\data
  3. Grant "Full Access" (rwx) to "NT AUTHORITY\Local Service" on C:\etcd
  4. Start an elevated command prompt
  5. Install the service: sc create etcd binpath= "C:\etcd\etcd.exe --data-dir C:\etcd\data" obj= "NT AUTHORITY\Local Service"
  6. Start the service: net start etcd

Expected:

etcd service starts

Actual:

Workaround:

Additional information:

Running etcd.exe from the command prompt works fine. However, the etcd service won't run even as "LocalSystem" (that's the "do whatever you want" built-in account). I was able to reproduce the issue on multiple Win10 machines. I assume it has something to do with the working directory (in my experience that is the most likely cause when an application can be started from cmd.exe but not as a service). The default working directory for a Windows service is C:\Windows\system32 (which is locked down for good reasons).
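One way to test the working-directory hypothesis would be to have the process switch to the directory of its own binary at startup. The following is a minimal Go sketch of that idea; the helper name chdirToExecutableDir is hypothetical and nothing here is taken from the etcd code base.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// chdirToExecutableDir (hypothetical helper) switches the process working
// directory to the folder containing the running binary and returns it.
func chdirToExecutableDir() (string, error) {
	exe, err := os.Executable()
	if err != nil {
		return "", err
	}
	dir := filepath.Dir(exe)
	if err := os.Chdir(dir); err != nil {
		return "", err
	}
	return dir, nil
}

func main() {
	before, _ := os.Getwd()
	dir, err := chdirToExecutableDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not change working directory:", err)
		os.Exit(1)
	}
	fmt.Println("working directory before:", before)
	fmt.Println("working directory now:", dir)
}
```

If the service still fails to start with a change like this in place, the working directory is probably not the cause.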

Environment:

hexfusion commented 5 years ago

Hi @jasper-d, this looks like a known issue. Maybe you can help us fix it? I do not have access to Windows machines or the expertise to test on them, so your help would be greatly appreciated.

ref: https://github.com/etcd-io/etcd/issues/3351 https://github.com/etcd-io/etcd/pull/3410

jasper-d commented 5 years ago

@hexfusion I'll look into it, but it may take a few days because I have little to no experience with Go.

hexfusion commented 5 years ago

@jasper-d we can assist with the Go if you can assist with Windows testing. Take a look at the old PR above and see if it gives you any hints. Basically, can you review the existing research and see what the proper method is for managing a Windows service with Go? Maybe it is still the same, in which case we can reuse that PR as a starting point.

1.) Review existing PRs and issues.
2.) Research current best practices for Windows services in Go.

That gives us a good place to start and moves this forward without writing code yet. Thanks!

jasper-d commented 5 years ago

@hexfusion I don't mind trying out some things and learning some Go in the process. I got the PR working with some minor changes and will take a look at some Go services that run on Windows (e.g. gnatsd, Elastic Filebeat) to see how they do it.

hexfusion commented 5 years ago

Hi @jasper-d, just checking in. Do you have any questions?

jasper-d commented 5 years ago

@hexfusion Not yet, I was occupied with some more pressing issues. I probably won't have time to look into it before next weekend.

tskarman commented 5 years ago

Just wanted to let you know that this is not a general issue. I am running etcd and the etcd grpc proxy as a Windows service across a wide variety of Windows machines (Windows Server 2012 R2, Windows Server 2016, Windows Server 2019, Windows 10) and have been for >6 months and across various etcd versions.

Of note:

I am using NSSM, specifically the pre-release version 2.2.4-101 linked on this page: https://nssm.cc/download. Not sure whether the normal version would work.

With nssm I am specifying the etcd directory as the startup directory. Since you mentioned working directories, that might make a difference.

I am not doing anything too special parameter-wise. I am specifying various bindings explicitly, though. And also I am not binding anything to localhost, 127.0.0.1 or 0.0.0.0. Not that that should make a difference, though. The service usually fails to start very promptly if a port/binding is in use.

Example:

etcd --name etcd3 --client-cert-auth=true --listen-client-urls https://1.2.3.4:2379 --advertise-client-urls https://etcd3.example.com:2379 --listen-peer-urls https://1.2.3.4:2380 --initial-advertise-peer-urls https://etcd3.example.com:2380 --initial-cluster-token etcd-cluster-1 --discovery-srv example.com --initial-cluster-state existing --peer-cert-file C:\somepath\member3.pem --peer-key-file C:\somepath\member3-key.pem --peer-trusted-ca-file C:\somepath\ca.pem --cert-file C:\somepath\member3.pem --key-file C:\somepath\member3-key.pem --trusted-ca-file C:\somepath\ca.pem

haroldHT commented 5 years ago

@jasper-d if you want etcd to work on Windows, you must make it a Windows service.

#3410 achieves this on Windows, but it does not work on Linux.

I would like to improve it.

haroldHT commented 5 years ago

@tskarman I can reproduce jasper-d's problem when I use sc on Windows 10.

@hexfusion Can I create a new PR? It would work on both Windows and Linux.

jasper-d commented 5 years ago

@haroldHT #3410 does not gracefully stop etcd and has some other flaws. The reason etcd does not work as a service is that it doesn't communicate with the Service Control Manager (SCM). #3410 adds some basic support for it (using Go's svc package, which essentially all Windows services written in Go use). Properly handling stop/shutdown as well as redirecting stdout/stderr (i.e. enabling log output) requires some more work. You're welcome to contribute of course. 🙂
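The shape of such an SCM conversation can be sketched in portable Go. This is only an illustration under stated assumptions: the types Cmd and State and the functions runService and simulate are hypothetical local stand-ins so the sketch builds on any OS; a real implementation would use golang.org/x/sys/windows/svc, which similarly delivers control requests and accepts status updates over channels.

```go
package main

import "fmt"

// Cmd and State are hypothetical stand-ins for the control commands and
// service states a real Windows service exchanges with the SCM.
type Cmd int

const (
	Interrogate Cmd = iota
	Stop
	Shutdown
)

type State string

const (
	StartPending State = "StartPending"
	Running      State = "Running"
	StopPending  State = "StopPending"
	Stopped      State = "Stopped"
)

// runService reports StartPending, starts the server, reports Running, then
// answers control requests until Stop/Shutdown arrives or the channel closes.
func runService(requests <-chan Cmd, status chan<- State, start func() (stop func())) {
	status <- StartPending
	stop := start()
	current := Running
	status <- current
loop:
	for c := range requests {
		switch c {
		case Interrogate:
			status <- current // echo the current state back
		case Stop, Shutdown:
			break loop
		}
	}
	status <- StopPending
	stop() // graceful shutdown of the wrapped server goes here
	status <- Stopped
}

// simulate feeds a fixed list of commands and collects the reported states.
func simulate(cmds ...Cmd) []State {
	requests := make(chan Cmd, len(cmds))
	for _, c := range cmds {
		requests <- c
	}
	close(requests)
	status := make(chan State, 4+len(cmds))
	runService(requests, status, func() func() { return func() {} })
	close(status)
	var seen []State
	for s := range status {
		seen = append(seen, s)
	}
	return seen
}

func main() {
	fmt.Println(simulate(Interrogate, Stop))
}
```

The point of the sketch is the ordering: a process that never reports Running is considered failed by the SCM, and a process that ignores Stop cannot shut down gracefully.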

@tskarman NSSM does a lot of stuff (e.g. stdout/stderr redirection). I reckon it communicates with the SCM as well, which would explain why you can start etcd as a service when using it. However, relying on hackish third-party tools is a workaround, not a solution, from my point of view.

tskarman commented 5 years ago

@jasper-d yes, I completely understand and am now interested in a solution as well. Let me know if I can help you. My Go is rusty and not a priority for me right now, but I could help with testing across the aforementioned operating systems.

That being said, I run etcd like this in production and have not encountered any reliability, responsiveness or service signalling issues. So I would recommend this as a workaround for the time being.

haroldHT commented 5 years ago

@jasper-d #10460

But I am confused about the log output. For the location of the log files (i.e. etcd_err.log, etcd_out.log) I can use cfg.ec.Dir, but for writing the logs, does etcd have some utilities I can use?

I also do not know how to connect etcd's logs to the service. Thanks.

hexfusion commented 5 years ago

> @hexfusion Can I create a new PR ? Both work in windows and linux.

@haroldHT thanks for showing interest in resolving this. Please work with @jasper-d and @tskarman on a solution then let me know if you have any questions.

haroldHT commented 5 years ago

@hexfusion Sorry, I keep @-mentioning the wrong people. etcd has so many kinds of logs that it confuses me; I need to spend a lot of time to understand them.

jingyih commented 5 years ago

cc @wenjiaswe

wenjiaswe commented 5 years ago

Thank you all for helping out! @haroldHT also contacted me offline and showed interest in contributing on this as his first etcd contribution. I will assign @haroldHT for now. @jasper-d and @tskarman, any help is welcome!

wenjiaswe commented 5 years ago

/assign @haroldHT

wenjiaswe commented 5 years ago

Well, it seems I cannot assign you right now, @haroldHT. This is a good place to start your contribution. Thanks!

jasper-d commented 5 years ago

@haroldHT I managed to botch up the code base enough to make etcd run as a Windows service. It properly interacts with the Windows Service Control Manager through x/sys/windows/svc. I haven't fully tested it yet, but as far as I can tell it works for ordinary cluster members, layer 4 gateways and gRPC proxies. Log output is redirected to the Windows Event Log or a file (logs are confusing indeed, I ended up redirecting every log that didn't hide well enough 🙃). I need to clean up some stuff before I can push it to a public repo, but I will do so tomorrow so you can take a look.
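For the file case, redirecting a logger can be sketched as below. This is a minimal illustration, not the code from the commit: redirectLog and logRoundTrip are hypothetical helpers, the file name etcd_out.log is borrowed from the earlier comment, and the sketch only captures Go's standard library logger, not any other logger etcd may use.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// redirectLog (hypothetical helper) points the standard library logger at a
// file under dir and returns the file so the caller can close it on shutdown.
func redirectLog(dir, name string) (*os.File, error) {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return nil, err
	}
	f, err := os.OpenFile(filepath.Join(dir, name), os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
	if err != nil {
		return nil, err
	}
	log.SetOutput(f)
	return f, nil
}

// logRoundTrip writes msg through the redirected logger into a temporary
// directory and returns what ended up in the file.
func logRoundTrip(msg string) (string, error) {
	dir, err := os.MkdirTemp("", "etcd-log-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	f, err := redirectLog(dir, "etcd_out.log")
	if err != nil {
		return "", err
	}
	log.Println(msg)
	log.SetOutput(os.Stderr) // restore the default before closing the file
	if err := f.Close(); err != nil {
		return "", err
	}
	data, err := os.ReadFile(filepath.Join(dir, "etcd_out.log"))
	return string(data), err
}

func main() {
	out, err := logRoundTrip("service started")
	if err != nil {
		fmt.Fprintln(os.Stderr, "log redirection failed:", err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

A service has no console attached, so anything still written to stdout or stderr after such a redirect is lost unless it is captured separately.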

cc: @hexfusion @tskarman

hexfusion commented 5 years ago

+1

jasper-d commented 5 years ago

Changes (with some comments) are here: https://github.com/jasper-d/etcd/commit/9d9235226cc0d4c0f8de72d5d6e99d76ea30062c

The good:

The bad:

So, there is still a lot of work to do. Before continuing I would need to add at least some tests and set up a proper testing environment. The main problems remain the wealth of logs (I would certainly need some advice here) and the different ways in which etcd is started. Ideally that should be unified, but that would probably be quite a refactoring (i.e. it should be done by someone with a better understanding of the code base and Go).

@haroldHT How is it going for you? @hexfusion I wouldn't mind investing some more time, but you may wanna take a look at it first to determine if it's worth the effort. I also cannot make any definitive commitments to a timeline because it's essentially a pet project for learning some Go in my spare time.

haroldHT commented 5 years ago

@wenjiaswe sorry, I did not reply in time. I will continue following your suggestion.

haroldHT commented 5 years ago

@jasper-d At first I wanted to use kardianos/service to turn etcd into a service. It seems like a good solution if we can manage the number of logs. Your solution also helped me a lot, thanks.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.