Secure-Compliance-Solutions-LLC / GVM-Docker

Greenbone Vulnerability Management Docker Image with OpenVAS
https://securecompliance.gitbook.io/projects/
MIT License

Remote scanner tasks fail #151

Closed · tigattack closed this 2 years ago

tigattack commented 3 years ago

Describe the bug
Tasks using remote scanners (set up as per the Gitbook docs) fail within a minute or two of starting.
Connectivity to the remote scanners has been verified from Configuration -> Scanners, but after the task fails, verifying connectivity to the relevant scanner returns "service unavailable".
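
For reference, the same connectivity check can also be run from the CLI inside the GVM container (assuming gvmd is on PATH and run as the appropriate user; the UUID placeholder below is illustrative):

# List configured scanners to find the UUID of the remote one.
gvmd --get-scanners
# Verify connectivity to that scanner (the same check the UI performs).
gvmd --verify-scanner=<scanner-uuid>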

To Reproduce
Steps to reproduce the behavior:

  1. Start task using remote scanner
  2. Wait a minute or two
  3. See stopped job in error state.

Expected behavior
Expected the task to continue running correctly.

Screenshots
Pre-run, post-run, and report error details after the failed run (images not reproduced here).

Additional context GVM logs:

gvm    | event task:MESSAGE:2021-04-04 00h03.10 UTC:1087: Status of task ScanName (0eaeca71-8a6c-49be-aaf2-6ab1e90f0bf4) has changed to Requested
gvm    | event task:MESSAGE:2021-04-04 00h03.10 UTC:1087: Task ScanName (0eaeca71-8a6c-49be-aaf2-6ab1e90f0bf4) has been requested to start by admin
gvm    | event task:MESSAGE:2021-04-04 00h03.29 UTC:1090: Status of task ScanName (0eaeca71-8a6c-49be-aaf2-6ab1e90f0bf4) has changed to Queued
gvm    | event task:MESSAGE:2021-04-04 00h03.40 UTC:1090: Status of task ScanName (0eaeca71-8a6c-49be-aaf2-6ab1e90f0bf4) has changed to Running
gvm    | md manage:WARNING:2021-04-04 00h04.59 UTC:1090: OSP get_scan db6699bf-abb0-4ca3-bf79-261715d3e01c: Couldn't send get_scans command to scanner
gvm    | md manage:WARNING:2021-04-04 00h04.59 UTC:1090: Could not connect to Scanner at /sockets/d4ohe0nan8.sock
gvm    | event task:MESSAGE:2021-04-04 00h04.59 UTC:1090: Status of task ScanName (0eaeca71-8a6c-49be-aaf2-6ab1e90f0bf4) has changed to Stopped

OpenVAS logs:

scanner    | OSPD[41] 2021-04-04 00:03:29,650: INFO: (ospd.command.command) Scan db6699bf-abb0-4ca3-bf79-261715d3e01c added to the queue in position 1.
scanner    | OSPD[41] 2021-04-04 00:03:39,346: INFO: (ospd.ospd) Currently 1 queued scans.
scanner    | OSPD[41] 2021-04-04 00:03:39,447: INFO: (ospd.ospd) Starting scan db6699bf-abb0-4ca3-bf79-261715d3e01c.
scanner    |
scanner    | Oops, ksba_cert_get_image failed: imagelen=181  hdr=4 len=1238 off=0
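
For context on the socket paths in these logs: in the Gitbook remote-scanner setup, the scanner container dials the master over SSH and reverse-forwards its local ospd socket to a randomly named socket under /sockets/ on the master, which gvmd then connects to. A rough sketch of the equivalent tunnel (flags and paths are illustrative, not necessarily the image's exact command):

# On the scanner: expose the local ospd-openvas socket on the master.
# autossh respawns ssh if the tunnel dies; -N means no remote command.
autossh -M 0 -N \
  -R /sockets/d4ohe0nan8.sock:/data/ospd.sock \
  gvm@<master-host>

So "Could not connect to Scanner at /sockets/d4ohe0nan8.sock" on the master means the forwarded socket (and hence the SSH tunnel) went away mid-scan.
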
tigattack commented 3 years ago

I'm still seeing the same issue on the GVM side (i.e. connection lost, task status is changed to "stopped"), but the scan continues to run on the remote OpenVAS scanner.

thecomet28 commented 3 years ago

I added TCPKeepAlive no to the /etc/ssh/ssh_config file. For us, that fixed the issue. Whether it stops seems to depend mainly on the quality of the network connection.
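
Concretely, a minimal sketch of that change (standard OpenSSH client config; Host * applies it to all outgoing connections, including the autossh tunnel):

# /etc/ssh/ssh_config (inside the scanner container)
Host *
    # Disable raw TCP keepalives so brief network drops don't tear down
    # the session; liveness detection is left to autossh instead.
    TCPKeepAlive no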

tigattack commented 3 years ago

Unfortunately this seems to have had no effect for me, but thank you anyway.

The connection between GVM and the remote scanner is not local, but it's generally very stable, and certainly not so frequently unstable as to cause every scan to fail within a matter of minutes.

thecomet28 commented 3 years ago

@tigattack how large of a network are you scanning? I might know the issue but I need confirmation on the size.

tigattack commented 3 years ago

@thecomet28 around 35 devices.

austinsonger commented 3 years ago

We use this Docker container with clients that have hundreds of devices.

@tigattack Have you tried building since 21.4.0-v2 was released? We've also done some work on the new pre-release, 21.4.0-v3.

tigattack commented 3 years ago

The output was slightly different this time, and I did get a small number of results, but in the end the scan still failed.
Interestingly, the task on the remote scanner itself didn't actually fail or stop this time. As seen in the logs below, GVM marked the scan as stopped after failing to communicate, but the remote scanner kept running the scan until I stopped it manually.

I'm now using the 21.4.0-v3 image.

GVM:

gvm    | event task:MESSAGE:2021-05-23 01h43.16 UTC:14018: Status of task Test Scan (9eb10691-0749-4731-909e-321bdae8d201) has changed to Requested
gvm    | event task:MESSAGE:2021-05-23 01h43.16 UTC:14018: Task Test Scan (9eb10691-0749-4731-909e-321bdae8d201) has been requested to start by admin
gvm    | event task:MESSAGE:2021-05-23 01h43.19 UTC:14021: Status of task Test Scan (9eb10691-0749-4731-909e-321bdae8d201) has changed to Queued
gvm    | event task:MESSAGE:2021-05-23 01h43.24 UTC:14021: Status of task Test Scan (9eb10691-0749-4731-909e-321bdae8d201) has changed to Running
gvm    | md manage:WARNING:2021-05-23 01h45.25 UTC:14021: OSP get_scan ca421e71-3e07-42fa-be5e-830db7a7932c: Couldn't send get_scans command to scanner
gvm    | md manage:WARNING:2021-05-23 01h45.25 UTC:14021: Connection lost with the scanner at /sockets/idkz8hhdmh.sock. Trying again in 1 second.
gvm    | md manage:WARNING:2021-05-23 01h45.26 UTC:14021: Could not connect to Scanner at /sockets/idkz8hhdmh.sock
gvm    | md manage:WARNING:2021-05-23 01h45.26 UTC:14021: Connection lost with the scanner at /sockets/idkz8hhdmh.sock. Trying again in 1 second.
gvm    | md manage:WARNING:2021-05-23 01h45.27 UTC:14021: Could not connect to Scanner at /sockets/idkz8hhdmh.sock
gvm    | md manage:WARNING:2021-05-23 01h45.27 UTC:14021: Connection lost with the scanner at /sockets/idkz8hhdmh.sock. Trying again in 1 second.
gvm    | md manage:WARNING:2021-05-23 01h45.28 UTC:14021: Could not connect to Scanner at /sockets/idkz8hhdmh.sock
gvm    | md manage:WARNING:2021-05-23 01h45.28 UTC:14021: Could not connect to Scanner at /sockets/idkz8hhdmh.sock
gvm    | event task:MESSAGE:2021-05-23 01h45.28 UTC:14021: Status of task Test Scan (9eb10691-0749-4731-909e-321bdae8d201) has changed to Stopped
gvm    |
gvm    | ==> /usr/local/var/log/gvm/sshd.log <==
gvm    | Accepted publickey for gvm from 10.50.0.51 port 34302 ssh2: ED25519 SHA256:DZ3uFK8iP/hCcEpLwOuNaekMLJn8rcGfKTQ2ZUOG3p0

Remote scanner:

==> /usr/local/var/log/gvm/ospd-openvas.log <==
OSPD[43] 2021-05-23 01:43:19,403: INFO: (ospd.command.command) Scan ca421e71-3e07-42fa-be5e-830db7a7932c added to the queue in position 1.
OSPD[43] 2021-05-23 01:43:21,264: INFO: (ospd.ospd) Currently 1 queued scans.
OSPD[43] 2021-05-23 01:43:21,302: INFO: (ospd.ospd) Starting scan ca421e71-3e07-42fa-be5e-830db7a7932c.

==> /usr/local/var/log/gvm/openvas.log <==
sd   main:MESSAGE:2021-05-23 01h44.24 utc:194: openvas 21.4.0 started
<100s of lines of scan logs>
austinsonger commented 3 years ago

@pixelsquared

thecomet28 commented 3 years ago

Any update on this? We're experiencing this with some clients, but not all, on the latest version.

thecomet28 commented 3 years ago

Looking at the logs from the remote Docker container, it looks like autossh is having an issue and then times out after a while. Note: this scan had been running for approx. 36 hours before crashing, and it's only affecting 2-3 of my remote scanners while the other 8 work perfectly fine. (See the keepalive notes after the logs below.)

==> /usr/local/var/log/gvm/ssh-connection.log <==
2021/06/17 09:15:48 autossh[45]: received SIGALRM (end-of-life 0)
2021/06/17 09:15:48 autossh[45]: check on child 1783565
2021/06/17 09:15:48 autossh[45]: clear alarm timer (0 secs left)
2021/06/17 09:15:48 autossh[45]: set alarm for 600 secs

==> /usr/local/var/log/gvm/ssh-connection.log <==
2021/06/17 09:25:48 autossh[45]: received SIGALRM (end-of-life 0)
2021/06/17 09:25:48 autossh[45]: check on child 1783565
2021/06/17 09:25:48 autossh[45]: clear alarm timer (0 secs left)
2021/06/17 09:25:48 autossh[45]: set alarm for 600 secs

==> /usr/local/var/log/gvm/ssh-connection.log <==
2021/06/17 09:35:48 autossh[45]: received SIGALRM (end-of-life 0)
2021/06/17 09:35:48 autossh[45]: check on child 1783565
2021/06/17 09:35:48 autossh[45]: clear alarm timer (0 secs left)
2021/06/17 09:35:48 autossh[45]: set alarm for 600 secs

==> /usr/local/var/log/gvm/ssh-connection.log <==
2021/06/17 09:45:48 autossh[45]: received SIGALRM (end-of-life 0)
2021/06/17 09:45:48 autossh[45]: check on child 1783565
2021/06/17 09:45:48 autossh[45]: clear alarm timer (0 secs left)
2021/06/17 09:45:48 autossh[45]: set alarm for 600 secs

Timeout, server _[insert Master IP address here]_ not responding.

==> /usr/local/var/log/gvm/ssh-connection.log <==
2021/06/17 09:50:07 autossh[45]: check on child 1783565
2021/06/17 09:50:07 autossh[45]: ssh exited with error status 255; restarting ssh
2021/06/17 09:50:07 autossh[45]: expired child, returning 1
2021/06/17 09:50:07 autossh[45]: clear alarm timer (340 secs left)
2021/06/17 09:50:07 autossh[45]: checking for grace period, tries = 0
2021/06/17 09:50:07 autossh[45]: starting ssh (count 3)
2021/06/17 09:50:07 autossh[45]: ssh child pid is 4104322
2021/06/17 09:50:07 autossh[45]: check on child 4104322
2021/06/17 09:50:07 autossh[45]: clear alarm timer (0 secs left)
2021/06/17 09:50:07 autossh[4104322]: child of 45 execing /usr/bin/ssh
2021/06/17 09:50:07 autossh[45]: set alarm for 600 secs

Not sure if anyone else is getting the same issue in their logs.
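
On the keepalive angle: the "set alarm for 600 secs" lines match autossh's default poll interval of 600 seconds, so a dead tunnel can sit unnoticed for up to 10 minutes before ssh exits with status 255 and autossh restarts it. A hedged sketch of tightening that detection (AUTOSSH_POLL and the ServerAlive* options are standard autossh/OpenSSH settings; the tunnel command itself is illustrative, since the image controls the real one):

# Poll the tunnel every 60 s instead of the default 600 s.
export AUTOSSH_POLL=60
# Have ssh probe the server over the encrypted channel and exit
# (status 255) after 3 missed probes, so autossh respawns it quickly.
autossh -M 0 -N \
  -o "ServerAliveInterval 15" \
  -o "ServerAliveCountMax 3" \
  -R /sockets/<name>.sock:/data/ospd.sock \
  gvm@<master-host>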

thecomet28 commented 3 years ago

Some more information that may help: we're still looking for a solution ourselves, but every effort so far has been unsuccessful. (Some diagnostic ideas follow the excerpt below.)

connect to /data/ospd.sock port -2 failed: Connection refused
connect to /data/ospd.sock port -2 failed: Connection refused
connect to /data/ospd.sock port -2 failed: Connection refused
connect to /data/ospd.sock port -2 failed: Connection refused
connect to /data/ospd.sock port -2 failed: Connection refused
connect to /data/ospd.sock port -2 failed: Connection refused
client_loop: send disconnect: Broken pipe
connect to /data/ospd.sock port -2 failed: Connection refused
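
"Connection refused" on /data/ospd.sock suggests nothing is listening on the socket on the scanner side, i.e. ospd-openvas itself may have died rather than the tunnel. A few quick checks inside the scanner container (paths assume this image's defaults):

# Does the socket still exist?
ls -l /data/ospd.sock
# Is anything bound to it? (if ss is available in the container)
ss -x | grep ospd
# Did ospd-openvas log a crash or restart?
tail -n 50 /usr/local/var/log/gvm/ospd-openvas.log
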
Dexus commented 3 years ago

This problem should, in theory, be fixed in the new v21.4.0-v6, but it comes with other changes that require some adjustments to your configuration. At the least, the new version should fix problems like ospd dying without notice, and similar issues with other services.

So I'd suggest using the latest version and keeping an eye on the Gitbook docs. Let me know how it works for you after a few days.

tigattack commented 3 years ago

Thanks, I'll give this a go as soon as an image is available. I attempted to build it myself but ran into a couple of issues which, unfortunately, I don't currently have time to troubleshoot.

Dexus commented 2 years ago

You can now use the https://hub.docker.com/r/deineagenturug/gvm images, which use the latest gvmd and component releases.

They use PostgreSQL 13.

What the tags mean:

docker pull deineagenturug/gvm:latest               # uninitialized base image; initializes itself on first start (short: no DB, no PDF report support)
docker pull deineagenturug/gvm:latest-full          # uninitialized base image; initializes itself on first start (short: no DB, PDF reports supported)
docker pull deineagenturug/gvm:latest-data          # initialized image; runs directly without extra VOLUMEs (short: DB included, no PDF report support)
docker pull deineagenturug/gvm:latest-data-full     # initialized image; runs directly without extra VOLUMEs (short: DB included, PDF reports supported)

They can be used as a direct replacement for the image provided here, with the same docker run arguments.
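
For example, a hypothetical drop-in run mirroring this repo's documented arguments (web UI on container port 9392, data under /data, admin password via the PASSWORD variable; adjust to whatever command you already use):

docker run --detach \
  --publish 8080:9392 \
  --env PASSWORD="Your admin password" \
  --volume gvm-data:/data \
  --name gvm \
  deineagenturug/gvm:latest-data-full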


If you'd like to run your first scan directly after deploying the image, use the deineagenturug/gvm:latest-data or deineagenturug/gvm:latest-data-full image.