Closed jasper-d closed 4 years ago
Hi @jasper-d, it looks like this is a known issue. Maybe you can help us fix it? I do not have access to Windows machines, or the expertise to test on them, so your help would be greatly appreciated.
ref: https://github.com/etcd-io/etcd/issues/3351 https://github.com/etcd-io/etcd/pull/3410
@hexfusion I'll look into it, but it may take a few days because I have little to no experience with Go.
@jasper-d we can assist with the Go if you can assist with Windows testing. Take a look at the existing PR above and see if it gives you any hints. Basically, can you review the research that was already done and see what the proper method is for managing a Windows service with Go? Maybe it is still the same, in which case we can reuse that PR as a starting point.
1.) review existing PRs and issues. 2.) research current best practices for Windows service and golang
From there we have a good place to start; this will move it forward without code. Thanks!
@hexfusion I don't mind trying out some things and learning some Go in the process. I got the PR working with some minor changes and will take a look at some Go services that run on Windows (e.g. gnatsd, Elastic Filebeat) to see how they do it.
Hi @jasper-d, just checking in. Do you have any questions?
@hexfusion Not yet, I was occupied with some more pressing issues. I probably won't have time to look into it before next weekend.
Just wanted to let you know that this is not a general issue. I am running etcd and the etcd grpc proxy as a Windows service across a wide variety of Windows machines (Windows Server 2012 R2, Windows Server 2016, Windows Server 2019, Windows 10) and have been for >6 months and across various etcd versions.
Of note:
I am using nssm and not sc. I never tried installing them with sc, so I am not sure whether that makes a difference.
I am using the pre-release version 2.2.4-101 linked on this page: https://nssm.cc/download Not sure whether the normal version would work.
With nssm I am specifying the etcd directory as the startup directory. Since you mentioned working directories, that might make a difference.
I am not doing anything too special parameter-wise. I am specifying various bindings explicitly, though. And also I am not binding anything to localhost, 127.0.0.1 or 0.0.0.0. Not that that should make a difference, though. The service usually fails to start very promptly if a port/binding is in use.
Example:
etcd --name etcd3 --client-cert-auth=true --listen-client-urls https://1.2.3.4:2379 --advertise-client-urls https://etcd3.example.com:2379 --listen-peer-urls https://1.2.3.4:2380 --initial-advertise-peer-urls https://etcd3.example.com:2380 --initial-cluster-token etcd-cluster-1 --discovery-srv example.com --initial-cluster-state existing --peer-cert-file C:\somepath\member3.pem --peer-key-file C:\somepath\member3-key.pem --peer-trusted-ca-file C:\somepath\ca.pem --cert-file C:\somepath\member3.pem --key-file C:\somepath\member3-key.pem --trusted-ca-file C:\somepath\ca.pem
@jasper-d if you want etcd to work on Windows, you must make it a Windows service.
I want to make it better.
@tskarman I can reproduce jasper-d's issue when I use sc on Windows 10.
@hexfusion Can I create a new PR? It would work on both Windows and Linux.
@haroldHT #3410 does not gracefully stop etcd and has some other flaws. The reason that etcd does not work as a service is that it doesn't communicate with the SCM. #3410 adds some basic support for it (using Go's svc package, which essentially all Windows services written in Go use). Properly handling stop/shutdown as well as redirecting stdout/stderr (i.e., enabling log output) requires some more work. You're welcome to contribute, of course.
@tskarman NSSM does a lot of stuff (e.g. stdout/stderr redirection). I reckon it does communicate with the SCM as well, which would explain why you can start etcd as a service when using it. However, relying on hackish 3rd party tools is a workaround, not a solution, from my point of view.
@jasper-d yes, I completely understand and am now interested in a solution as well. Let me know when I can help you. My Go is rusty and not a priority for me right now, but I could help with testing across the aforementioned operating systems.
That being said, I run etcd like this in production and have not encountered any reliability, responsiveness or service signalling issues. So I would recommend this as a workaround for the time being.
@jasper-d See #10460.
But I am confused about the log output.
For the location of the log files (i.e. etcd_err.log, etcd_out.log) I can use cfg.ec.Dir, but for writing the log output, does etcd have some utilities that I can use?
I also do not know how to connect etcd's logs to the service. Thanks.
@hexfusion Can I create a new PR? It would work on both Windows and Linux.
@haroldHT thanks for showing interest in resolving this. Please work with @jasper-d and @tskarman on a solution then let me know if you have any questions.
@hexfusion Sorry, I always @ the wrong people. etcd has so many kinds of logs that it confuses me; I need to spend a lot of time to understand them.
cc @wenjiaswe
Thank you all for helping out! @haroldHT also contacted me offline and showed interest in contributing on this as his first etcd contribution. I will assign @haroldHT for now, @jasper-d and @tskarman any help is welcome!
/assign @haroldHT
Well, it seems like I cannot assign you now, @haroldHT. This is a good place to start your contribution. Thanks!
@haroldHT I managed to botch up the code base enough to make etcd run as a Windows service. It properly interacts with the Windows Service Control Manager through x/sys/windows/svc. I haven't fully tested it yet, but as far as I can tell it works for ordinary cluster members, layer 4 gateways and gRPC proxies. Log output is redirected to the Windows Event Log or a file (logs are confusing indeed, I ended up redirecting every log that didn't hide well enough).
I need to clean up some stuff before I can push it to a public repo, but I will do so tomorrow so you can take a look.
cc: @hexfusion @tskarman
+1
Changes (with some comments) are here: https://github.com/jasper-d/etcd/commit/9d9235226cc0d4c0f8de72d5d6e99d76ea30062c
The good:
The bad:
A system error has occurred. System error 1067 has occurred. The process terminated unexpectedly.
). I need to investigate what's happening there.
So, there is still a lot of work to do. Before continuing I would need to add at least some tests and set up a proper testing environment. The main problems remain the wealth of logs (I would certainly need some advice here) and the different ways in which etcd is started. I think that should be unified ideally, but that would probably be quite a refactoring (i.e. should be done by someone with a better understanding of the code base and Go).
@haroldHT How is it going for you? @hexfusion I wouldn't mind investing some more time but you may wanna take a look at it first to determine if it's worth the effort. I also cannot make any definitive commitments to a timeline because it's essentially a pet project for learning some go in my spare time.
@wenjiaswe sorry, I did not reply in time. I will continue to follow your suggestion.
@jasper-d At the beginning I wanted to use kardianos/service to make etcd a service.
It seems like a good solution if we can manage the number of logs.
Your solution also helped me a lot, thanks.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
Repro:
sc create etcd binpath= "C:\etcd\etcd.exe --data-dir C:\etcd\data" obj= "NT AUTHORITY\Local Service"
net start etcd
Expected:
etcd service starts
Actual:
C:\etcd\data contains the following files:
FullName, Length
C:\etcd\data\member, 1
C:\etcd\data\member\snap, 1
C:\etcd\data\member\wal, 1
C:\etcd\data\member\snap\db, 32768
C:\etcd\data\member\snap\db.lock, 0
C:\etcd\data\member\wal\0.tmp, 64000000
C:\etcd\data\member\wal\0000000000000000-0000000000000000.wal, 64000000
Workaround:
Additional information:
Running etcd.exe from the command prompt works fine. However, etcd service won't even run as "LocalSystem" (that's the "Do whatever you want" built-in account). I was able to reproduce the issue on multiple Win10 machines. I assume that it has something to do with the working directory (that's at least the most likely cause from my experience if an application can be started from cmd.exe but not as a service). The default working directory for a Windows service is C:\Windows\system32 (which is locked down for good reasons).
Environment: