Closed belforte closed 2 years ago
After discussing with @mapellidario (thanks a lot!), we found out that the WMCore team already uses nscd to cache domain names (https://github.com/dmwm/WMCore/issues/9435), and every puppet-managed VM (TaskWorker/Schedd) has nscd set up and running.
The problem is that on the TaskWorker VM we run our app inside a docker container and do not bind mount the nscd socket into the container, so glibc inside the container cannot query the nscd that runs outside it. This is why we get the alert from the TaskWorker machine only.
We only need to change the ./runContainer.sh scripts: append the -v /var/run/nscd/socket:/var/run/nscd/socket option to the docker run command.
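A minimal sketch, in Python, of how one could check from inside the container that the bind mount took effect. The function name nscd_socket_visible is made up for illustration; the socket path is the one from the -v option above. It only tells whether a Unix socket exists at that path, not whether nscd actually answers on it.

```python
import os
import stat

# Path taken from the -v option above; adjust if the mount target differs.
NSCD_SOCKET = "/var/run/nscd/socket"


def nscd_socket_visible(path=NSCD_SOCKET):
    """Return True if `path` exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Path is missing or not accessible, e.g. the bind mount is absent.
        return False
    return stat.S_ISSOCK(mode)


if __name__ == "__main__":
    print("nscd socket visible:", nscd_socket_visible())
```

Run inside the container, this should print True once the socket is bind mounted, and False in a container started without the -v option.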
Done. The runContainer.sh scripts are deployed by puppet.
We received another email about crab-prod-tw01 not caching DNS queries for the cmsweb.cern.ch domain. We should have a look. Should we re-open this issue?
yes
What I got from investigating this morning:
I ran systemctl status nscd to check whether the daemon has crashed or restarted, but the output reports that it has been running for 7 months.
[tseethon@crab-prod-tw01 ~]$ sudo systemctl status nscd -l
● nscd.service - Name Service Cache Daemon
Loaded: loaded (/usr/lib/systemd/system/nscd.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2021-10-21 08:49:22 CEST; 7 months 9 days ago
Main PID: 8804 (nscd)
Tasks: 9
Memory: 2.0M
CGroup: /system.slice/nscd.service
└─8804 /usr/sbin/nscd
I ran journalctl -eu nscd (as root) to see the full log. There is no restart handled by systemd, but it looks like nscd restarts itself every hour (cross-check the PID shown by ps against the PIDs in the log lines).
[tseethon@crab-prod-tw01 ~]$ journalctl -eu nscd
May 30 21:10:34 crab-prod-tw01.cern.ch nscd[9640]: 9640 Access Vector Cache (AVC) started
May 30 22:09:49 crab-prod-tw01.cern.ch nscd[9640]: 9640 monitored file `/etc/resolv.conf` was
May 30 22:10:34 crab-prod-tw01.cern.ch nscd[8402]: 8402 monitoring file `/etc/hosts` (1)
May 30 22:10:34 crab-prod-tw01.cern.ch nscd[8402]: 8402 monitoring directory `/etc` (2)
May 30 22:10:34 crab-prod-tw01.cern.ch nscd[8402]: 8402 monitoring file `/etc/resolv.conf` (3)
May 30 22:10:34 crab-prod-tw01.cern.ch nscd[8402]: 8402 monitoring directory `/etc` (2)
May 30 22:10:34 crab-prod-tw01.cern.ch nscd[8402]: 8402 monitoring file `/etc/services` (4)
May 30 22:10:34 crab-prod-tw01.cern.ch nscd[8402]: 8402 monitoring directory `/etc` (2)
May 30 22:10:34 crab-prod-tw01.cern.ch nscd[8402]: 8402 Access Vector Cache (AVC) started
May 30 23:10:40 crab-prod-tw01.cern.ch nscd[14238]: 14238 monitoring file `/etc/hosts` (1)
May 30 23:10:40 crab-prod-tw01.cern.ch nscd[14238]: 14238 monitoring directory `/etc` (2)
May 30 23:10:40 crab-prod-tw01.cern.ch nscd[14238]: 14238 monitoring file `/etc/resolv.conf` (
May 30 23:10:40 crab-prod-tw01.cern.ch nscd[14238]: 14238 monitoring directory `/etc` (2)
May 30 23:10:40 crab-prod-tw01.cern.ch nscd[14238]: 14238 monitoring file `/etc/services` (4)
May 30 23:10:40 crab-prod-tw01.cern.ch nscd[14238]: 14238 monitoring directory `/etc` (2)
May 30 23:10:40 crab-prod-tw01.cern.ch nscd[14238]: 14238 Access Vector Cache (AVC) started
May 30 23:36:34 crab-prod-tw01.cern.ch nscd[14238]: 14238 monitored file `/etc/resolv.conf` wa
May 31 00:11:06 crab-prod-tw01.cern.ch nscd[9590]: 9590 monitoring file `/etc/hosts` (1)
May 31 00:11:06 crab-prod-tw01.cern.ch nscd[9590]: 9590 monitoring directory `/etc` (2)
May 31 00:11:06 crab-prod-tw01.cern.ch nscd[9590]: 9590 monitoring file `/etc/resolv.conf` (3)
...
...
...
May 31 09:29:41 crab-prod-tw01.cern.ch nscd[22997]: 22997 monitored file `/etc/resolv.conf` wa
May 31 10:12:20 crab-prod-tw01.cern.ch nscd[8804]: 8804 monitoring file `/etc/hosts` (1)
May 31 10:12:20 crab-prod-tw01.cern.ch nscd[8804]: 8804 monitoring directory `/etc` (2)
May 31 10:12:20 crab-prod-tw01.cern.ch nscd[8804]: 8804 monitoring file `/etc/resolv.conf` (3)
May 31 10:12:20 crab-prod-tw01.cern.ch nscd[8804]: 8804 monitoring directory `/etc` (2)
May 31 10:12:20 crab-prod-tw01.cern.ch nscd[8804]: 8804 monitoring file `/etc/services` (4)
May 31 10:12:20 crab-prod-tw01.cern.ch nscd[8804]: 8804 monitoring directory `/etc` (2)
May 31 10:12:20 crab-prod-tw01.cern.ch nscd[8804]: 8804 Access Vector Cache (AVC) started
[tseethon@crab-prod-tw01 ~]$ ps uax | grep nscd
nscd 8804 0.0 0.0 714084 2068 ? Ssl 10:12 0:00 /usr/sbin/nscd
tseethon 14058 0.0 0.0 112812 980 pts/0 S+ 10:28 0:00 grep --color=auto nscd
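The PID cross-check above can be automated. Here is a minimal Python sketch that scans journalctl lines and reports every point where the nscd PID in the log tag changes. The function name find_pid_changes is made up for illustration, and the sample lines are copied from the excerpt above. A PID change in the journal suggests a restart but does not prove one.

```python
import re

# Matches "<Mon> <day> <HH:MM:SS> <host> nscd[<PID>]:" at the start of a line.
PID_TAG = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ nscd\[(\d+)\]:")


def find_pid_changes(lines):
    """Return (timestamp, old_pid, new_pid) for each PID change in the log."""
    changes = []
    last_pid = None
    for line in lines:
        match = PID_TAG.match(line)
        if not match:
            continue  # skip lines that are not nscd journal entries
        timestamp, pid = match.group(1), int(match.group(2))
        if last_pid is not None and pid != last_pid:
            changes.append((timestamp, last_pid, pid))
        last_pid = pid
    return changes


# Lines copied from the journalctl excerpt above.
sample = [
    "May 30 21:10:34 crab-prod-tw01.cern.ch nscd[9640]: 9640 Access Vector Cache (AVC) started",
    "May 30 22:10:34 crab-prod-tw01.cern.ch nscd[8402]: 8402 monitoring file `/etc/hosts` (1)",
    "May 30 23:10:40 crab-prod-tw01.cern.ch nscd[14238]: 14238 monitoring file `/etc/hosts` (1)",
]

for change in find_pid_changes(sample):
    print(change)
```

On the three sample lines this reports two changes, at 22:10:34 and 23:10:40, which matches the roughly hourly pattern seen in the log.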
I monitored the rate of DNS queries to the DNS server using tcpdump -n -i eth0 'port 53' and watched how fast the tcpdump output scrolled. I do not have an exact number such as queries/s, but right after I restarted nscd the output scrolled much faster than before the restart. About 5 minutes after the restart, the rate was back to normal. So I assume that caching was working as usual before I restarted it.
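To get an actual queries/s number instead of eyeballing the scroll speed, something like the following Python sketch could be used on saved tcpdump -n output. It assumes the usual tcpdump text format where each packet line starts with HH:MM:SS.ffffff and shows "src.port > dst.port:". The function names are made up, and the sample lines are invented for illustration using documentation addresses (192.0.2.x); they are not captured traffic.

```python
import re
from collections import Counter

# An outgoing query has destination port 53, i.e. "> <addr>.53: " in the line.
QUERY = re.compile(r"^(\d{2}:\d{2}:\d{2})\.\d+ IP6? \S+ > \S+\.53: ")


def queries_per_second(lines):
    """Count packets sent to port 53, bucketed by HH:MM:SS."""
    counts = Counter()
    for line in lines:
        match = QUERY.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def average_rate(counts):
    """Average over the seconds that saw at least one query."""
    return sum(counts.values()) / len(counts) if counts else 0.0


# Invented sample lines in tcpdump -n format (not real traffic).
sample = [
    "10:28:01.000100 IP 192.0.2.10.40001 > 192.0.2.53.53: 4660+ A? cmsweb.cern.ch. (32)",
    "10:28:01.000900 IP 192.0.2.53.53 > 192.0.2.10.40001: 4660 1/0/0 A 192.0.2.80 (48)",
    "10:28:01.500000 IP 192.0.2.10.40002 > 192.0.2.53.53: 4661+ AAAA? cmsweb.cern.ch. (32)",
    "10:28:02.100000 IP 192.0.2.10.40003 > 192.0.2.53.53: 4662+ A? cmsweb.cern.ch. (32)",
]

counts = queries_per_second(sample)
print(dict(counts), average_rate(counts))
```

Responses (source port 53) are deliberately not counted, so the number reflects queries leaving the VM. Seconds with no traffic at all are not in the average, so this overestimates the rate on a quiet link.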
My conclusion:
Note from meeting minutes:
CRAB is working fine, but we seem to have lost DNS caching on crab-prod-tw01 (TW container). Wa thinks this can be explained as a temporary issue with nscd, which at times crashes and restarts by itself. No action needed on our side. Use “systemctl status nscd” to check whether the daemon has crashed or restarted, and “journalctl -feu nscd” (run as root) to see the full log. (wa: I don’t know how to check whether nscd is really working, or how to query nscd directly without querying the DNS server)
Bottom line: do not worry unless messages from CERN firewall monitoring keep coming!
Thanks @novicecpp, I believe that we can close (again). If the problem comes back, we will find this issue and use it as a reference. I leave the final decision on closing to you and Dario.
Closing
We got this [1] from automatic services in IT. We think that the biggest problem for us is the number of HTTP queries, which mostly come from the schedds (#7002), but it would still be good to have DNS caching in our VMs. The possibilities are the ones indicated by CERN below, at http://service-dns.web.cern.ch/service-dns/faq.asp, but we can also look at https://coredns.io/plugins/cache/
@novicecpp please look at the various possibilities, determine which one is the simplest and safest that meets our needs (especially in terms of long-term support and maintenance), and make a proposal for deployment which we can review.
[1]