Closed: kushaldas closed this issue 3 years ago.
Discussed at standup today. No one else has seen this since the initial report. It could have been resolved by the `securedrop-config` changes in https://github.com/freedomofpress/securedrop/pull/5684, or perhaps by https://github.com/freedomofpress/securedrop/pull/5712/commits/004dc3c684194006448485583a9b38641ef750a9. Leaving open for now; if no one sees it again by feature freeze, we should be good to close.
I was able to repro this error today. It didn't happen for my focal vagrant box, but it did happen for my xenial box. I am using different boxes than @kushaldas; see:
> vagrant box list
bento/ubuntu-16.04 (libvirt, 202102.02.0)
bento/ubuntu-16.04 (virtualbox, 202102.02.0)
bento/ubuntu-20.04 (libvirt, 202012.23.0)
bento/ubuntu-20.04 (virtualbox, 202012.23.0)
I also want to point out that I am on the latest develop
branch as of https://github.com/freedomofpress/securedrop/commit/240a9a9d933cf1c59f02e64543f24abc11f14ef5, and my virtual environment was created today.
And the error I saw was:
failed: [mon-staging] (item=[1, 'securedrop-config-0.1.4+1.8.0~rc1+xenial-amd64.deb']) => {"ansible_loop_var": "item", "changed": false, "item": [1, "securedrop-config-0.1.4+1.8.0~rc1+xenial-amd64.deb"], "msg": "dpkg --force-confdef --force-confold -i /root/securedrop-config-0.1.4+1.8.0~rc1+xenial-amd64.deb failed", "stderr": "+ manage_tor_repo_config\n+ rm -f /etc/apt/sources.list.d/deb_torproject_org_torproject_org.list\n+ rm -f /etc/apt/sources.list.d/tor_apt_freedom_press.list\n+ apt_security_list=/etc/apt/security.list\n+ [ -f /etc/apt/security.list ]\n+ sed -i /deb\\.torproject\\.org\\/torproject\\.org/d /etc/apt/security.list\n+ sed -i /tor-apt\\.freedom\\.press/d /etc/apt/security.list\n+ remove_2fa_tty_req\n+ auth_file=/etc/pam.d/common-auth\n+ sed -i /^auth\\ required\\ pam_google.*/d /etc/pam.d/common-auth\n+ grep -qF PasswordAuthentication no /etc/ssh/sshd_config\n+ echo PasswordAuthentication no\n+ sed -i /^UsePAM\\ /s/\\ .*/\\ no/ /etc/ssh/sshd_config\n+ sed -i /^ChallengeResponseAuthentication\\ /s/\\ .*/\\ no/ /etc/ssh/sshd_config\n+ service ssh restart\n+ update_release_prompt\n+ set -e\n+ upgrade_config=/etc/update-manager/release-upgrades\n+ sed -i s/Prompt=.*/Prompt=never/ /etc/update-manager/release-upgrades\nsed: can't read /etc/update-manager/release-upgrades: No such file or directory\ndpkg: error processing package securedrop-config (--install):\n subprocess installed post-installation script returned error exit status 2\nErrors were encountered while processing:\n securedrop-config\n", "stderr_lines": ["+ manage_tor_repo_config", "+ rm -f /etc/apt/sources.list.d/deb_torproject_org_torproject_org.list", "+ rm -f /etc/apt/sources.list.d/tor_apt_freedom_press.list", "+ apt_security_list=/etc/apt/security.list", "+ [ -f /etc/apt/security.list ]", "+ sed -i /deb\\.torproject\\.org\\/torproject\\.org/d /etc/apt/security.list", "+ sed -i /tor-apt\\.freedom\\.press/d /etc/apt/security.list", "+ remove_2fa_tty_req", "+ 
auth_file=/etc/pam.d/common-auth", "+ sed -i /^auth\\ required\\ pam_google.*/d /etc/pam.d/common-auth", "+ grep -qF PasswordAuthentication no /etc/ssh/sshd_config", "+ echo PasswordAuthentication no", "+ sed -i /^UsePAM\\ /s/\\ .*/\\ no/ /etc/ssh/sshd_config", "+ sed -i /^ChallengeResponseAuthentication\\ /s/\\ .*/\\ no/ /etc/ssh/sshd_config", "+ service ssh restart", "+ update_release_prompt", "+ set -e", "+ upgrade_config=/etc/update-manager/release-upgrades", "+ sed -i s/Prompt=.*/Prompt=never/ /etc/update-manager/release-upgrades", "sed: can't read /etc/update-manager/release-upgrades: No such file or directory", "dpkg: error processing package securedrop-config (--install):", " subprocess installed post-installation script returned error exit status 2", "Errors were encountered while processing:", " securedrop-config"], "stdout": "Selecting previously unselected package securedrop-config.\n(Reading database ... 42733 files and directories currently installed.)\nPreparing to unpack .../securedrop-config-0.1.4+1.8.0~rc1+xenial-amd64.deb ...\nUnpacking securedrop-config (0.1.4+1.8.0~rc1+xenial) ...\nSetting up securedrop-config (0.1.4+1.8.0~rc1+xenial) ...\n", "stdout_lines": ["Selecting previously unselected package securedrop-config.", "(Reading database ... 42733 files and directories currently installed.)", "Preparing to unpack .../securedrop-config-0.1.4+1.8.0~rc1+xenial-amd64.deb ...", "Unpacking securedrop-config (0.1.4+1.8.0~rc1+xenial) ...", "Setting up securedrop-config (0.1.4+1.8.0~rc1+xenial) ..."]}
failed: [app-staging] (item=[1, 'securedrop-config-0.1.4+1.8.0~rc1+xenial-amd64.deb']) => {"ansible_loop_var": "item", "changed": false, "item": [1, "securedrop-config-0.1.4+1.8.0~rc1+xenial-amd64.deb"], "msg": "dpkg --force-confdef --force-confold -i /root/securedrop-config-0.1.4+1.8.0~rc1+xenial-amd64.deb failed", "stderr": "+ manage_tor_repo_config\n+ rm -f /etc/apt/sources.list.d/deb_torproject_org_torproject_org.list\n+ rm -f /etc/apt/sources.list.d/tor_apt_freedom_press.list\n+ apt_security_list=/etc/apt/security.list\n+ [ -f /etc/apt/security.list ]\n+ sed -i /deb\\.torproject\\.org\\/torproject\\.org/d /etc/apt/security.list\n+ sed -i /tor-apt\\.freedom\\.press/d /etc/apt/security.list\n+ remove_2fa_tty_req\n+ auth_file=/etc/pam.d/common-auth\n+ sed -i /^auth\\ required\\ pam_google.*/d /etc/pam.d/common-auth\n+ grep -qF PasswordAuthentication no /etc/ssh/sshd_config\n+ echo PasswordAuthentication no\n+ sed -i /^UsePAM\\ /s/\\ .*/\\ no/ /etc/ssh/sshd_config\n+ sed -i /^ChallengeResponseAuthentication\\ /s/\\ .*/\\ no/ /etc/ssh/sshd_config\n+ service ssh restart\n+ update_release_prompt\n+ set -e\n+ upgrade_config=/etc/update-manager/release-upgrades\n+ sed -i s/Prompt=.*/Prompt=never/ /etc/update-manager/release-upgrades\nsed: can't read /etc/update-manager/release-upgrades: No such file or directory\ndpkg: error processing package securedrop-config (--install):\n subprocess installed post-installation script returned error exit status 2\nErrors were encountered while processing:\n securedrop-config\n", "stderr_lines": ["+ manage_tor_repo_config", "+ rm -f /etc/apt/sources.list.d/deb_torproject_org_torproject_org.list", "+ rm -f /etc/apt/sources.list.d/tor_apt_freedom_press.list", "+ apt_security_list=/etc/apt/security.list", "+ [ -f /etc/apt/security.list ]", "+ sed -i /deb\\.torproject\\.org\\/torproject\\.org/d /etc/apt/security.list", "+ sed -i /tor-apt\\.freedom\\.press/d /etc/apt/security.list", "+ remove_2fa_tty_req", "+ 
auth_file=/etc/pam.d/common-auth", "+ sed -i /^auth\\ required\\ pam_google.*/d /etc/pam.d/common-auth", "+ grep -qF PasswordAuthentication no /etc/ssh/sshd_config", "+ echo PasswordAuthentication no", "+ sed -i /^UsePAM\\ /s/\\ .*/\\ no/ /etc/ssh/sshd_config", "+ sed -i /^ChallengeResponseAuthentication\\ /s/\\ .*/\\ no/ /etc/ssh/sshd_config", "+ service ssh restart", "+ update_release_prompt", "+ set -e", "+ upgrade_config=/etc/update-manager/release-upgrades", "+ sed -i s/Prompt=.*/Prompt=never/ /etc/update-manager/release-upgrades", "sed: can't read /etc/update-manager/release-upgrades: No such file or directory", "dpkg: error processing package securedrop-config (--install):", " subprocess installed post-installation script returned error exit status 2", "Errors were encountered while processing:", " securedrop-config"], "stdout": "Selecting previously unselected package securedrop-config.\n(Reading database ... 42733 files and directories currently installed.)\nPreparing to unpack .../securedrop-config-0.1.4+1.8.0~rc1+xenial-amd64.deb ...\nUnpacking securedrop-config (0.1.4+1.8.0~rc1+xenial) ...\nSetting up securedrop-config (0.1.4+1.8.0~rc1+xenial) ...\n", "stdout_lines": ["Selecting previously unselected package securedrop-config.", "(Reading database ... 42733 files and directories currently installed.)", "Preparing to unpack .../securedrop-config-0.1.4+1.8.0~rc1+xenial-amd64.deb ...", "Unpacking securedrop-config (0.1.4+1.8.0~rc1+xenial) ...", "Setting up securedrop-config (0.1.4+1.8.0~rc1+xenial) ..."]}
Specifically, what stands out is `sed: can't read /etc/update-manager/release-upgrades: No such file or directory`, which @emkll pointed out means that the `ubuntu-release-upgrader-core` package is missing (you can confirm by running `apt-file search /etc/update-manager/release-upgrades`).
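A quick local check mirroring the failing postinst step could look like this sketch (the `check_release_prompt_file` helper is illustrative, not part of the package):

```shell
#!/bin/bash
# Illustrative check: does the file the postinst's sed tries to edit exist?
# If it prints "missing", ubuntu-release-upgrader-core is likely not installed.
check_release_prompt_file() {
  local f=${1:-/etc/update-manager/release-upgrades}
  if [ -f "$f" ]; then echo "present"; else echo "missing"; fi
}

check_release_prompt_file
```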
To add the package via ansible, you can apply this diff (provided by @emkll):
diff --git a/install_files/ansible-base/roles/common/vars/Ubuntu_xenial.yml b/install_files/ansible-base/roles/common/vars/Ubuntu_xenial.yml
index 55d9453be..5778424e9 100644
--- a/install_files/ansible-base/roles/common/vars/Ubuntu_xenial.yml
+++ b/install_files/ansible-base/roles/common/vars/Ubuntu_xenial.yml
@@ -18,3 +18,4 @@ securedrop_common_packages:
   - ntpdate
   - resolvconf
   - tmux
+  - ubuntu-release-upgrader-core
Then run `molecule destroy -s libvirt-staging-xenial` and rerun `make staging`. I assume this package wasn't needed for my build of the focal staging servers because it was already included with the `202012.23.0` vagrant box for focal.
After fixing the first error above, I saw a new error from running `make staging`:
TASK [ossec : Register OSSEC agent.] *******************************************
fatal: [app-staging]: FAILED! => {"changed": true, "cmd": ["/var/ossec/bin/agent-auth", "-m", "10.0.1.3", "-p", "1515", "-A", "app-staging", "-P", "/var/ossec/etc/authd.pass"], "delta": "0:02:09.451390", "end": "2021-02-11 21:13:24.196073", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2021-02-11 21:11:14.744683", "stderr": "2021/02/11 21:11:14 ossec-authd: INFO: Started (pid: 26761).\n2021/02/11 21:13:24 ossec-authd: Unable to connect to 10.0.1.3:1515", "stderr_lines": ["2021/02/11 21:11:14 ossec-authd: INFO: Started (pid: 26761).", "2021/02/11 21:13:24 ossec-authd: Unable to connect to 10.0.1.3:1515"], "stdout": "INFO: Using specified password.", "stdout_lines": ["INFO: Using specified password."]}
Specifically, what stands out is: `ossec-authd: Unable to connect to 10.0.1.3:1515`.
I started to wonder if there could be an issue with sharing the same local directory between xenial and focal builds. I believe in CI we use separate directories when running `make staging` and `make staging-focal` because we have separate build jobs. When I searched locally for my `*.aths` files in `install_files/ansible-base`, they didn't exist, and the `tor_v3_keys.json` contained information, but I wasn't sure if it was for xenial or focal. I think running `make staging` deleted my `*.aths` files for focal (I'm assuming they were there to begin with). The `/var/lib/tor/services` directory on my focal staging VM was also missing, so the onion URLs were unknown.
I decided to clone another version of securedrop and separate my staging builds for focal and xenial. I still saw the `ossec-authd: Unable to connect to 10.0.1.3:1515` error during `make staging` for xenial, but, the good news is, the `*.aths` files and `/var/lib/tor/services` were no longer missing.
At this point, in order to continue development and make progress towards SecureDrop 1.8.0, it might make the most sense to start using the same versions of vagrant boxes used in CI, since those seem to be working. I'll be able to confirm later whether or not this works locally for me. It is surprising that @kushaldas saw the first error on the same focal box that we use in CI. You can see which versions of vagrant boxes CI should be using here (https://github.com/freedomofpress/infrastructure/blob/b3dbae358a0e9071044edc1a5c5ab3d2bff8ebde/playbooks/sd-ci-gce-nested-virt-box.yml#L30-L34):
sd_vagrant_boxes:
- name: bento/ubuntu-16.04
version: 202008.16.0
- name: bento/ubuntu-20.04
version: 202008.16.0
So I'll be able to confirm this error as well if there's time today (I do have a working focal build now and am hesitant to destroy it, since I still have yet to work on https://github.com/freedomofpress/securedrop/issues/5688 today).
Root cause appears to be https://github.com/chef/bento/commit/6921eb35736a52508a37c15fe9c62ff1944b775d , which ostensibly affects all recent Ubuntu Vagrant boxes from Bento. Haven't checked precisely when that commit was released to the prod boxes, but clearly that's what's going on.
The proposal of installing the package makes sense to me as a quick fix. It appears to be a Vagrant-only variation, but it doesn't hurt to be explicit in the config. In the spirit of #2743, updating the `securedrop-config` dependencies as suggested in the OP will minimize surprises going forward. N.B. I've been using ISO-based Qubes VMs, which lack the Bento box customizations, and this problem hasn't occurred, which means the behavior for production installs remains predictable.
@conorsch curious what your thoughts are about the second error around ossec, as well as about using separate securedrop directories for building xenial and focal staging servers, since the second build will overwrite the auth files from the first build. We could also try manually renaming the `*.aths` and `tor_v3_keys.json` files (maybe append `-focal` or `-xenial`) until we create a fix.
Update: It looks like the `tor_v3_keys.json` file will be the same between focal and xenial, but the following files differ between builds and are overwritten when sharing the same directory: `app-journalist-aths`, `app-journalist.auth_private`, `app-source-ths`, `app-sourcev3-ths`.
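A minimal sketch of that renaming idea (the `suffix_auth_files` helper and its arguments are hypothetical, not part of the repo):

```shell
#!/bin/bash
# Hypothetical helper: append a per-distro suffix to the Tor auth files
# listed above, so a second `make staging` run can't clobber them.
suffix_auth_files() {
  local dir=$1 suffix=$2 f
  for f in app-journalist-aths app-journalist.auth_private \
           app-source-ths app-sourcev3-ths; do
    if [ -f "$dir/$f" ]; then
      mv "$dir/$f" "$dir/$f-$suffix"
    fi
  done
}

# e.g. after a focal build:
# suffix_auth_files install_files/ansible-base focal
```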
Not sure what's causing the OSSEC failure you mention. You can't have both the Xenial & the Focal environments configured simultaneously, since they use the same internal IPs:
$ grep -oPI '10\.[\d\.]+' -r molecule/*staging*
molecule/libvirt-staging-focal/molecule.yml:10.0.1.2
molecule/libvirt-staging-focal/molecule.yml:10.0.1.2
molecule/libvirt-staging-focal/molecule.yml:10.0.1.3
molecule/libvirt-staging-focal/molecule.yml:10.0.1.3
molecule/libvirt-staging-xenial/molecule.yml:10.0.1.2
molecule/libvirt-staging-xenial/molecule.yml:10.0.1.2
molecule/libvirt-staging-xenial/molecule.yml:10.0.1.3
molecule/libvirt-staging-xenial/molecule.yml:10.0.1.3
molecule/virtualbox-staging-xenial/molecule.yml:10.0.1.2
molecule/virtualbox-staging-xenial/molecule.yml:10.0.1.2
molecule/virtualbox-staging-xenial/molecule.yml:10.0.1.3
molecule/virtualbox-staging-xenial/molecule.yml:10.0.1.3
There's no reason to suspect tor configuration, since the failing task is an inter-VM OSSEC communication. So I'd recommend running those commands manually, and scanning the ports between the two VMs, to determine whether that service is actually running and whether it's reachable from the other VM.
I see, we don't support having both a focal staging server and a xenial staging server on the same machine. We could support this in the future, but for now I will just run focal staging here, since it's working for me. And then, to avoid the xenial staging ossec error that has come up repeatedly throughout the day, I will test changes on my xenial home server. It just means that I will have to build the `securedrop-app-code` package and shuttle it over to my home server to install it each time I want to test a code change that cannot be tested in a dev environment (docker), e.g. v2 warnings do not show up in a dev environment. I could also run different staging servers on separate VMs in Qubes, or maybe use different IPs for focal staging builds. There is no shortage of ways for me to be clever about this.
> we don't support having both a focal staging server and xenial staging server on the same machine
Oops, I was unclear: you can certainly have both Xenial & Focal environments on the same host machine, just not at the exact same time. If you tried to bring up both environments at the exact same time, I would expect an error, but it may be possible that the networking settings would simply get confused. Certainly there's a networking problem of some kind behind the registration error, since that's an inter-VM connection, but it may not be that one specifically.
It's also possible that you never had the two environments running at the same time. Maybe it's worthwhile to ground a bit in steps to reproduce. Try this:
1. `molecule destroy -s libvirt-staging-xenial`
2. `molecule destroy -s libvirt-staging-focal`
3. Run `virt-manager` and ensure that no SD staging VMs are visible. If any are, delete them from the `virt-manager` interface.
4. With the diff to `install_files/ansible-base/roles/common/vars/Ubuntu_xenial.yml` you shared above applied, run `make build-debs`.
5. `make staging`
Does the OSSEC registration failure occur again? If so, sounds like a networking problem. Debug by logging into the VMs and running the commands manually, see if you can get a connection. Try opening up the firewall rules and see if that helps.
If the OSSEC registration failure does not occur again, great; then it may have been a problem with reusing the IPs between both environments. We could proactively update the environments to use different internal IPs, but let's not do that until we're sure that's causing a problem for you!
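Checking the connection manually could look like this minimal sketch (the IP and port come from the error above; `check_port` is a hypothetical helper using bash's built-in `/dev/tcp` redirection, so no extra tools are needed inside the VM):

```shell
#!/bin/bash
# Hypothetical helper: report whether a TCP port is reachable.
check_port() {
  local host=$1 port=$2
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# From inside app-staging, check the OSSEC authd port on mon-staging:
check_port 10.0.1.3 1515
```

If this prints "closed" while `ossec-authd` is running on the monitor server, a firewall rule between the VMs is the likely culprit.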
> Oops, I was unclear: you can certainly have both Xenial & Focal environments on the same host machine, just not at the exact same time. If you tried to bring up both environments at the exact same time
Sorry, that makes more sense. I've just been heads-down on this for so long I've forgotten how to communicate. My thinking is that the focal servers need to be shut down before running `make staging` for the first time to build the xenial servers, and maybe that will fix the ossec networking issue. But there also seems to be an unacknowledged issue of the second run of `make staging` overwriting the aths files, meaning the `app-journalist-aths`, `app-journalist.auth_private`, `app-source-ths`, and `app-sourcev3-ths` files. This means it'll be more difficult to run playbooks against the staging server with no aths files, right?
I'll try running through your instructions shortly. I have a test running right now that should be done in ~30 mins.
Yes, any time `make staging` is run, it will clobber the records of the onion urls, as you pointed out in:
> It looks like the tor_v3_keys.json file will be the same between focal and xenial, but the following files differ between builds and are overwritten when sharing the same directory: app-journalist-aths app-journalist.auth_private app-source-ths app-sourcev3-ths
That's a valid observation, but unrelated (as far as I can tell) to both the `ubuntu-release-upgrader-core` problem and the OSSEC agent registration failure. Even though the onion url files are clobbered, that doesn't affect the onion services inside the machines; they'll still be the same as ever. So you can back up the files locally, or you can log into the `app-staging` VM (e.g. `molecule login -s libvirt-staging-focal -h app-staging`) and retrieve the onion urls from `/var/lib/tor/`.
It seems that what I experienced were 3 different issues:
1. `ubuntu-release-upgrader-core` not being installed - easy fix as documented in my original comment
2. the `install_files/ansible-base/app*` files being overwritten when sharing the same directory between builds
3. the ossec error - just confirmed that shutting down any running vagrant boxes fixes this issue (thanks for the feedback above to help me realize that was the issue)

Another thing we might want to recommend to developers is to use the same vagrant box versions as we use in CI. Something to discuss more later.
I could not reproduce it, and then understood that you tried to run two staging instances at the same time. There will be a clash because of the IP address values. I think that is why you got the error.
Description
The `securedrop-config` package should depend on the `ubuntu-release-upgrader-core` package.

Steps to Reproduce
While trying to install SecureDrop on Focal on a prod VM based on the `bento/ubuntu-20.04` (libvirt, 202008.16.0) image, the `securedrop-config` package post-installation failed, as the `ubuntu-release-upgrader-core` package is missing.

Expected Behavior
The installation should finish normally.

Actual Behavior

Comments
Suggestions to fix, any other relevant information.