7yl4r closed this issue 5 years ago
I think USF-IMARS/server-status#104 is the final holdback on this.
NFS shares from thing2 are mounting as nobody:nobody on imars-airflow21, but cozumel NFS shares work and the configuration is identical. NFS shares from thing2 work elsewhere, though (tested on my local machine and other airflow nodes).
# === airflow-21 mount trouble:
# no worky:
thing2.marine.usf.edu:/west_fl_pen /srv/imars-objects/west_fl_pen nfs4 ro,nosuid,nodev,noexec,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.4,local_lock=none,addr=131.247.136.202 0 0
thing2.marine.usf.edu:/west_fl_pen /srv/imars-objects/west_fl_pen nfs4 ro,nosuid,nodev,noexec,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.4,local_lock=none,addr=131.247.136.202 0 0
thing2.marine.usf.edu:/A01 /srv/imars-objects/A01 nfs4 ro,nosuid,nodev,noexec,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.4,local_lock=none,addr=131.247.136.202 0 0
# yes worky:
## airflow21
cozumel.marine.usf.edu:/monroe /srv/imars-objects/monroe nfs4 ro,nosuid,nodev,noexec,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.4,local_lock=none,addr=131.247.136.162 0 0
## tylardesk
thing2master:/west_fl_pen /srv/imars-objects/west_fl_pen nfs4 rw,nosuid,nodev,noexec,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.249,local_lock=none,addr=192.168.1.202 0 0
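To make the comparison above easier to scan, here is a minimal sketch that pulls the device, mount point, and negotiated NFS version out of /proc/mounts lines. The function name `nfs_summary` is made up for illustration. In the entries above, the thing2 mounts on airflow-21 use vers=4.0 while the cozumel mount uses vers=4.1, but tylardesk also uses vers=4.0 against thing2 and works, so the version alone may not explain anything.

```shell
# nfs_summary: read /proc/mounts-style lines on stdin and print
# "<device> <mountpoint> <vers>" for every nfs/nfs4 entry.
nfs_summary() {
    awk '$3 ~ /^nfs4?$/ {
        vers = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^vers=/) vers = substr(opts[i], 6)
        print $1, $2, vers
    }'
}

# Usage on a live host:
#   nfs_summary < /proc/mounts
```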
This is getting ridiculous. I just somehow broke ssh logins again.
I am taking a more aggressive approach. Let's copy authconfig from cozumel.
[root@cozumel ~]# authconfig --savebackup=2019-03-13
[root@cozumel ~]# rsync -hazv /var/lib/authconfig/backup-2019-03-13 root@thing2master:/var/lib/authconfig/cozumel-2019-03-13
[root@thing2 ~]# authconfig --savebackup=2019-03-13
[root@thing2 ~]# cd /var/lib/authconfig/
[root@thing2 authconfig]# mkdir manual-2019-03-13
[root@thing2 authconfig]# cp backup-2019-03-13/group manual-2019-03-13/. && cp backup-2019-03-13/passwd manual-2019-03-13/passwd && cp backup-2019-03-13/shadow manual-2019-03-13/. && cp backup-2019-03-13/gshadow manual-2019-03-13/.
[root@thing2 authconfig]# authconfig --restorebackup=manual-2019-03-13 --update
... no change. ssh still broken and still getting nobody:nobody from af-21. :hankey: :angry: :hankey:
[root@thing2 authconfig]# getent passwd tylar
tylar:*:4747:504:Tylar Murray:/home1/tylar:/bin/bash
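Since the same lookup can be run on thing2 and on the airflow node, comparing the numeric IDs side by side is a cheap sanity check. A small sketch; `uid_gid` is a hypothetical helper name, not something from this setup:

```shell
# uid_gid: read passwd(5)-format lines on stdin, print "name uid gid".
uid_gid() {
    awk -F: '{ print $1, $3, $4 }'
}

# Usage (run on each host and compare the output):
#   getent passwd tylar | uid_gid
```

If the name is missing on one host, or the numbers differ between hosts, that would be worth ruling out before digging further into the mount options.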
The worker is now working. I gave up on LDAP and just set up the user manually. Tasks running on imars-airflow-21 are not showing up in the airflow metadata database (result_backend?), but I will follow up on that elsewhere.
I resized (& restarted) the VM in Azure. When it came back up, files from thing2 were showing as nobody:nobody again.
The Domain in /etc/idmapd.conf had been changed. :thinking:
That's a good sign I shouldn't be doing this, but I changed it back:
#Domain=iaordjrjcdketajxumkj5z1keg.bx.internal.cloudapp.net
Domain=marine.usf.edu
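Because the Domain line was rewritten when the VM came back up, it may be worth checking after every reboot. A minimal sketch that reads the active (uncommented) Domain value; `idmap_domain` is a hypothetical helper, and the default path is the one mentioned above:

```shell
# idmap_domain: print the value of the uncommented Domain= line in an
# idmapd.conf-style file (defaults to /etc/idmapd.conf).
idmap_domain() {
    conf=${1:-/etc/idmapd.conf}
    # Commented lines (#Domain=...) do not match; tolerate spaces
    # around "=" and keep the last match if there are several.
    sed -n 's/^[[:space:]]*Domain[[:space:]]*=[[:space:]]*//p' "$conf" | tail -n 1
}

# Usage:
#   [ "$(idmap_domain)" = "marine.usf.edu" ] || echo "idmapd Domain has changed"
```

After editing the Domain, the client's id-mapping cache may also need clearing (for example with `nfsidmap -c`) and the share remounting before listings change; whether that applies here is untested.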
Now the airflow group works but the airflow user and imars-common are showing as nobody?
-bash-4.2$ ls -lah /srv/imars-objects/big_bend
total 140K
drwxrwxr-x. 9 nobody nobody 4.0K Mar 26 01:25 .
drwxr-xr-x. 3 root root 0 Mar 27 21:49 ..
drwxr-xr-x. 2 nobody airflow 48K Mar 27 21:35 ntf_wv2_m1bs
drwxr-xr-x. 2 nobody nobody 20K Sep 4 2018 tif_r_rs_wv2
drwxrwxr-x. 2 nobody nobody 4.0K Jul 30 2018 tif_r_rs_wv2_backup_2018-08-03
drwxrwxr-x. 4 nobody nobody 28 Jun 28 2018 wv2
drwxr-xr-x. 2 nobody airflow 36K Mar 27 21:36 xml_wv2_m1bs
drwxr-xr-x. 2 nobody airflow 4.0K Mar 5 21:26 zip_wv2_ftp_ingest
drwxr-xr-x. 2 nobody airflow 4.0K Mar 5 22:13 zip_wv3_ftp_ingest
cozumel shares still work fine:
-bash-4.2$ ls -lah /srv/imars-objects/gom
total 580K
drwxrwxr-x. 7 airflow imars-common 4.0K Jan 9 22:22 .
drwxr-xr-x. 4 root root 0 Mar 27 21:56 ..
drwxrwxr-x. 2 airflow imars-common 196K Jan 10 05:53 chlor_a_l3_pass
drwxr-xr-x. 2 airflow airflow 16K Jan 10 13:29 myd01
drwxr-xr-x. 2 airflow airflow 216K Jan 10 15:09 myd0_otis_l2
drwxr-xr-x. 2 airflow airflow 4.0K Jan 10 15:16 s3a_ol_1_efr
drwxr-xr-x. 293 airflow imars-common 8.0K Jan 7 19:46 sst_avhrr_1km
What does that mean?
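To see exactly which names come through on each share, a sketch like the following tallies the owner:group pairs in `ls -l` output. `owner_group_tally` is a made-up name for illustration. In the thing2 listing above, four entries keep the airflow group while every owner except the parent directory shows as nobody; the cozumel listing has no nobody entries at all.

```shell
# owner_group_tally: read `ls -l` output on stdin and count each
# "owner:group" pair, skipping the "total" line.
owner_group_tally() {
    awk '$1 != "total" && NF >= 9 { count[$3 ":" $4]++ }
         END { for (k in count) print count[k], k }' | LC_ALL=C sort -k2
}

# Usage:
#   ls -la /srv/imars-objects/big_bend | owner_group_tally
#   ls -la /srv/imars-objects/gom | owner_group_tally
```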
Rebooted and... /etc/imapd.conf is gone?
[imars-admin@imars-airflow-21 etc]$ sudo ls -lh /etc/imapd.conf
ls: cannot access /etc/imapd.conf: No such file or directory
Wat?
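Note that the path checked above is /etc/imapd.conf, while the file edited earlier in this thread was /etc/idmapd.conf, so the "missing" file may just be a misspelled name. A small sketch to check both spellings; `conf_status` is a hypothetical helper:

```shell
# conf_status: report whether each given path exists as a regular file.
conf_status() {
    for path in "$@"; do
        if [ -f "$path" ]; then
            echo "present $path"
        else
            echo "missing $path"
        fi
    done
}

# Usage:
#   conf_status /etc/idmapd.conf /etc/imapd.conf
```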
An Airflow worker running outside the LAN is likely to run into a few roadblocks. Here are some likely issues:
- celery connectivity? (seems ok)
- graphite allowed IPs for metric ingests (* already allowed)