future-architect / vuls

Agent-less vulnerability scanner for Linux, FreeBSD, Container, WordPress, Programming language libraries, Network devices
https://vuls.io/
GNU General Public License v3.0

Performance issue #1695

Open 4durban opened 1 year ago

4durban commented 1 year ago

Hello,

We have deployed Vuls in a client/server setup with separate databases. Let me explain:

In our architecture, the clients connect to the Vuls server over HTTP by sending the following curl request:

device:/tmp$ curl --max-time 900 --connect-timeout 900 -X POST -H "Content-Type: text/plain" -H "X-Vuls-OS-Family: `lsb_release -si | awk '{print tolower($1)}'`" -H "X-Vuls-OS-Release: `lsb_release -sr | awk '{print $1}'`" -H "X-Vuls-Kernel-Release: `uname -r`" -H "X-Vuls-Server-Name: `hostname`" --data-binary @packages.txt https://vuls-server.domain.com/vuls

We see this error in the vuls-server:

user/vuls-server-787cb77b88-r4l95[vuls-server]: time="Jun 26 13:27:48" level=debug msg="HTTP Request to http://vuls-gost:1325/ubuntu/2204/pkgs/libfastjson/fixed-cves" 
user/vuls-server-787cb77b88-r4l95[vuls-server]: time="Jun 26 13:27:48" level=warning msg="Failed to HTTP GET. retrying in 512.518063ms seconds. err: HTTP GET error, url: http://vuls-gost:1325/ubuntu/2204/pkgs/firefox/fixed-cves, resp: <nil>, err: [Get \"http://vuls-gost:1325/ubuntu/2204/pkgs/firefox/fixed-cves\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)]:\n    github.com/future-architect/vuls/gost.httpGet.func1\n        /go/src/github.com/future-architect/vuls/gost/util.go:164" 
user/vuls-server-787cb77b88-r4l95[vuls-server]: time="Jun 26 13:27:48" level=debug msg="HTTP Request to http://vuls-gost:1325/ubuntu/2204/pkgs/linux-signed-hwe-5.19/fixed-cves" 

After that message has appeared 3 times (MAX_RETRIES, I guess), we see this error in the vuls-server:

vuls/vuls-server-76774d9b7f-lcrtt[vuls-server]: time="Jun 27 14:41:00" level=error msg="Failed to detect Pkg CVE: Failed to detect CVE with gost:\n    github.com/future-architect/vuls/detector.DetectPkgCves\n        /go/src/github.com/future-architect/vuls/detector/detector.go:228\n  - Failed to detect CVEs with gost:\n    github.com/future-architect/vuls/detector.detectPkgsCvesWithGost\n        /go/src/github.com/future-architect/vuls/detector/detector.go:477\n  - Failed to detect fixed CVEs. err:\n    github.com/future-architect/vuls/gost.Ubuntu.DetectCVEs\n        /go/src/github.com/future-architect/vuls/gost/ubuntu.go:88\n  - Failed to get fixed CVEs via HTTP. err:\n    github.com/future-architect/vuls/gost.Ubuntu.detectCVEsWithFixState\n        /go/src/github.com/future-architect/vuls/gost/ubuntu.go:112\n  - Failed to fetch Gost. err: %!w([]error=[0xc002d341b0 0xc002209ad0 0xc002d534a0 0xc0034800f0 0xc000a96060 0xc000f3a240 0xc002c03140 0xc003c19500 0xc002682600 0xc0027d7b60 0xc0027d7bc0]):\n    github.com/future-architect/vuls/gost.getCvesWithFixStateViaHTTP\n        /go/src/github.com/future-architect/vuls/gost/util.go:146" 

And if we check the curl output we see the following error:

device:/tmp$ curl --max-time 900 --connect-timeout 900 -X POST -H "Content-Type: text/plain" -H "X-Vuls-OS-Family: `lsb_release -si | awk '{print tolower($1)}'`" -H "X-Vuls-OS-Release: `lsb_release -sr | awk '{print $1}'`" -H "X-Vuls-Kernel-Release: `uname -r`" -H "X-Vuls-Server-Name: `hostname`" --data-binary @packages.txt https://${VULS_SERVER}/vuls
Failed to detect CVE with gost: Failed to detect CVEs with gost: Failed to detect unfixed CVEs. err: Failed to get fixed CVEs via HTTP. err: Failed to fetch Gost. err: %!w([]error=[0xc000aaa00

In our architecture, the Vuls server runs in one k8s pod and each database runs in its own pod.

The user connects to the server over HTTP with curl, and the server also connects to the databases over HTTP.
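For readers who would rather script the client side than shell out to curl, here is a minimal Go sketch that builds the same POST request. The header names and the URL come from the curl command above. The `hostInfo` type, the `newScanRequest` helper, and the example values are hypothetical and only illustrate the idea.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// hostInfo holds the values the curl command derives from lsb_release,
// uname, and hostname. This type is hypothetical, not part of Vuls.
type hostInfo struct {
	OSFamily      string
	OSRelease     string
	KernelRelease string
	ServerName    string
}

// newScanRequest builds a POST request carrying the package list as a
// text/plain body plus the X-Vuls-* headers used in the curl command.
func newScanRequest(serverURL string, h hostInfo, packages string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, serverURL, strings.NewReader(packages))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "text/plain")
	// The curl command lowercases the OS family with awk's tolower.
	req.Header.Set("X-Vuls-OS-Family", strings.ToLower(h.OSFamily))
	req.Header.Set("X-Vuls-OS-Release", h.OSRelease)
	req.Header.Set("X-Vuls-Kernel-Release", h.KernelRelease)
	req.Header.Set("X-Vuls-Server-Name", h.ServerName)
	return req, nil
}

func main() {
	// Illustrative values only; a real client would read them from the host.
	h := hostInfo{
		OSFamily:      "Ubuntu",
		OSRelease:     "22.04",
		KernelRelease: "5.19.0-generic",
		ServerName:    "device",
	}
	// The body stands in for the contents of packages.txt.
	req, err := newScanRequest("https://vuls-server.domain.com/vuls", h, "placeholder package list\n")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	// To send it, use an http.Client whose Timeout plays the role of
	// curl's --max-time, then call client.Do(req).
}
```

The sketch only builds the request and does not send it, so it runs without network access.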

We think this is a performance issue. Increasing the pods' resources seems to partially solve the problem for a few endpoints, but as soon as we scale to more than 3 endpoints we seem to hit a performance ceiling and the errors reappear. Fetching the CVEs for a single endpoint works fine, but the more endpoints there are, the more errors appear.

Do you know what could be causing the problem? Is there a parameter we need to change to improve performance, for example one that spawns child threads?

Thank you for your time!

MaineK00n commented 1 year ago

I am writing this on the assumption that the GET requests between Vuls and the DB (in this case, the gost DB) are failing due to a timeout.

Making the gost timeout adjustable might solve this. The timeout is currently set here: https://github.com/future-architect/vuls/blob/4253550c999d27fac802f616dbe50dd884e93f51/gost/util.go#L133 and https://github.com/future-architect/vuls/blob/4253550c999d27fac802f616dbe50dd884e93f51/gost/util.go#L158

If you don't mind, could you please set the hard-coded timeout to a longer value and check whether the error still occurs?

Alternatively, it may be possible to switch to a DB type that responds a little faster than the one you currently use. Which DB type do you use?

If the timeout is not the cause, we will need to consider a different countermeasure.