acouvreur / ssh-log-to-influx

Send SSH authentication logs to influxdb with geohashing IP
GNU General Public License v3.0
101 stars · 25 forks

Ram usage so big that Grafana won't load #156

Closed · williambcra closed this issue 3 years ago

williambcra commented 3 years ago

Hello! 👋

I've been having fun with this for about a week, and I've noticed that, because I receive an insane number of attacks per day (more than 30k in a few days), the Influx database must have grown too big (I haven't checked).

The thing is, because of that, Influx is now taking pretty much all the memory on my dedicated server, and I'm unable to access Grafana. (I know, I only have 2 GB of RAM; if there's no solution I'll unfortunately have to shut it down, but if it's fixable I would love to keep it!)

[screenshot: Influx memory usage]

These are the kinds of errors I get when I manage to reach Grafana (I also got a 503 from Influx once):

[screenshot: Grafana error]

I tried reverting the docker-compose file to version "2.4" so I could limit the memory available to the containers, but then Grafana wasn't able to retrieve any data from Influx...

If I remember correctly, Grafana cannot connect because Influx times out.

I know it's not really an issue with this project, since I'm pretty sure it's just because I lack available memory, but I would gladly take any tips for a fix; otherwise I'm afraid I'll have to drop my SSH monitoring 😭

acouvreur commented 3 years ago

Hi, I've had no problems with InfluxDB on a Raspberry Pi 3, which has 1 GB of RAM.

> I tried reverting the docker-compose file to version "2.4" so I could limit the memory available to the containers, but then Grafana wasn't able to retrieve any data from Influx...

Memory limitation works out of the box, which means Docker will kill the container if it needs more RAM than granted. The influxdb daemon itself is not aware of this limit.
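For reference, under Compose file version 2.4 the limit is set per service. A minimal sketch (image tag and values are assumptions; adjust to the actual compose file):

```yaml
version: "2.4"
services:
  influxdb:
    image: influxdb:1.6
    # Hard cap: Docker kills the container if it exceeds this.
    mem_limit: 512m
    # Soft reservation, only enforced under host memory pressure.
    mem_reservation: 256m
```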

> I know it's not really an issue, because I'm pretty sure it's just because I lack available memory, but I would gladly take any tips for a fix, or else I'm afraid I will have to drop my ssh monitoring 😭

For further investigation, you should probably take a look at the InfluxDB documentation and its troubleshooting guides.
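One thing that usually helps on low-memory hosts is capping how long data is kept, so the database stops growing without bound. A sketch, assuming InfluxDB 1.x and the `telegraf` database that appears in your write logs (not tested against your setup):

```sql
-- Keep only 30 days of data and make this the default policy for new writes.
CREATE RETENTION POLICY "thirty_days" ON "telegraf" DURATION 30d REPLICATION 1 DEFAULT

-- Inspect what policies currently exist:
-- SHOW RETENTION POLICIES ON "telegraf"
```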

Let me know if you find any improvement!

williambcra commented 3 years ago

> Memory limitation works out of the box, which means Docker will kill the container if it needs more RAM than granted. The influxdb daemon itself is not aware of this limit.

Well, on my side it doesn't kill anything; if I leave it running, it never seems to reach that point, but my server just gets extremely slow (it takes 1-3 minutes to log in over SSH, even typing in the terminal takes ages, and there's huge input lag).

> Do you have any logs from the influx container that seem strange?

I've got some errors in the server logs like these:

[2020-12-27T16:33:29.540] [INFO] default - TCP Server is running on port 7070.
[2020-12-27T16:33:31.937] [ERROR] default - connect ECONNREFUSED 172.21.0.4:8086
[2020-12-27T16:34:07.032] [INFO] default - CONNECTED: ::ffff:172.21.0.1:60492
Disconnected from authenticating user root ********* port **** [preauth]
[2020-12-27T16:34:34.007] [INFO] default - CONNECTED: ::ffff:172.21.0.1:60578
Invalid user maintain from ********* port ****
Invalid user huawei from ********* port ****
Invalid user silvia from ********* port ****
Invalid user toor from ********* port ****
[2020-12-27T16:37:04.832] [ERROR] default - No data retrieved, cannot continue
Invalid user chandra from ********* port ****
Invalid user kali from ********* port ****
 Invalid user admin from ********* port ****
 Invalid user pedro from ********* port ****
 Invalid user test2 from ********* port ****
 Invalid user elena from ********* port ****
Invalid user shubham from ********* port ****
Disconnected from authenticating user root ********* port **** [preauth]
Invalid user emily from ********* port ****
[2020-12-27T16:37:48.514] [ERROR] default - No data retrieved, cannot continue
Disconnected from authenticating user root ********* port **** [preauth]
Disconnected from authenticating user root ********* port **** [preauth]
Invalid user user from ********* port ****
Invalid user www from ********* port ****
(node:1) UnhandledPromiseRejectionWarning: Error: Internal Server Error
    at once (/app/node_modules/influx/lib/src/pool.js:243:49)
    at ClientRequest.<anonymous> (/app/node_modules/influx/lib/src/pool.js:66:13)
    at Object.onceWrapper (events.js:286:20)
    at ClientRequest.emit (events.js:203:15)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:565:21)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:111:17)
    at Socket.socketOnData (_http_client.js:451:20)
    at Socket.emit (events.js:198:13)
    at addChunk (_stream_readable.js:288:12)
    at readableAddChunk (_stream_readable.js:269:11)
(node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 3)
(node:1) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
(node:1) UnhandledPromiseRejectionWarning: Error: Internal Server Error
    at once (/app/node_modules/influx/lib/src/pool.js:243:49)
    at ClientRequest.<anonymous> (/app/node_modules/influx/lib/src/pool.js:66:13)
    at Object.onceWrapper (events.js:286:20)
    at ClientRequest.emit (events.js:203:15)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:565:21)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:111:17)
    at Socket.socketOnData (_http_client.js:451:20)
    at Socket.emit (events.js:198:13)
    at addChunk (_stream_readable.js:288:12)
    at readableAddChunk (_stream_readable.js:269:11)
(node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 4)
Invalid user mcserver from ********* port ****
(node:1) UnhandledPromiseRejectionWarning: Error: No host available
    at Pool.stream (/app/node_modules/influx/lib/src/pool.js:230:29)
    at Promise (/app/node_modules/influx/lib/src/pool.js:166:18)
    at new Promise (<anonymous>)
    at Pool.discard (/app/node_modules/influx/lib/src/pool.js:165:16)
    at InfluxDB.writePoints (/app/node_modules/influx/lib/src/index.js:859:27)
    at Socket.socket.on (/app/dist/index.js:71:14)
    at process._tickCallback (internal/process/next_tick.js:68:7)
(node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 5)
(node:1) UnhandledPromiseRejectionWarning: Error: Internal Server Error
    at once (/app/node_modules/influx/lib/src/pool.js:243:49)
    at ClientRequest.<anonymous> (/app/node_modules/influx/lib/src/pool.js:66:13)
    at Object.onceWrapper (events.js:286:20)
    at ClientRequest.emit (events.js:203:15)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:565:21)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:111:17)
    at Socket.socketOnData (_http_client.js:451:20)
    at Socket.emit (events.js:198:13)
    at addChunk (_stream_readable.js:288:12)
    at readableAddChunk (_stream_readable.js:269:11)
(node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 6)
Disconnected from authenticating user root ********* port **** [preauth]
(node:1) UnhandledPromiseRejectionWarning: Error: Internal Server Error
    at once (/app/node_modules/influx/lib/src/pool.js:243:49)
    at ClientRequest.<anonymous> (/app/node_modules/influx/lib/src/pool.js:66:13)
    at Object.onceWrapper (events.js:286:20)
    at ClientRequest.emit (events.js:203:15)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:565:21)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:111:17)
    at Socket.socketOnData (_http_client.js:451:20)
    at Socket.emit (events.js:198:13)
    at addChunk (_stream_readable.js:288:12)
    at readableAddChunk (_stream_readable.js:269:11)
(node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 7)
(node:1) UnhandledPromiseRejectionWarning: Error: Internal Server Error
    at once (/app/node_modules/influx/lib/src/pool.js:243:49)
    at ClientRequest.<anonymous> (/app/node_modules/influx/lib/src/pool.js:66:13)
    at Object.onceWrapper (events.js:286:20)
    at ClientRequest.emit (events.js:203:15)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:565:21)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:111:17)
    at Socket.socketOnData (_http_client.js:451:20)
    at Socket.emit (events.js:198:13)
    at addChunk (_stream_readable.js:288:12)
    at readableAddChunk (_stream_readable.js:269:11)
(node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 8)
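Side note on the UnhandledPromiseRejectionWarning entries above: they mean the promise returned by the InfluxDB write call rejects (the server answers 500) and nothing catches it. A minimal sketch of the catching pattern; `writePoints` here is a stand-in that simulates an overloaded InfluxDB, not the project's actual code:

```javascript
// Stand-in for influx.writePoints(): simulates InfluxDB answering HTTP 500.
function writePoints(points) {
  return Promise.reject(new Error('Internal Server Error'));
}

// Wrap the write so a failed point is logged and dropped instead of
// surfacing as an unhandled promise rejection.
function safeWrite(points) {
  return writePoints(points)
    .then(() => ({ ok: true }))
    .catch((err) => {
      console.error('influx write failed:', err.message);
      return { ok: false, error: err.message };
    });
}

safeWrite([{ measurement: 'ssh', fields: { count: 1 } }]).then((result) => {
  console.log('write ok:', result.ok); // prints "write ok: false"
});
```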

From the influx container I have stuff like this about the timeouts (probably from when I load the page and it tries to load the widgets):

[httpd] ******* - **** [27/Dec/2020:16:37:49 +0000] "POST /write?db=telegraf&p=%5BREDACTED%5D&precision=n&rp=&u=root HTTP/1.1" 500 20 "-" "-" db99548a-4861-11eb-800e-0242ac150004 11226436
ts=2020-12-27T16:38:01.071432Z lvl=error msg="[500] - \"timeout\"" log_id=0RLGbXi0000 service=httpd
[httpd] ******* - **** [27/Dec/2020:16:37:58 +0000] "POST /write?db=telegraf&p=%5BREDACTED%5D&precision=n&rp=&u=root HTTP/1.1" 500 20 "-" "-" e0e8d199-4861-11eb-8012-0242ac150004 10000704
ts=2020-12-27T16:38:08.732184Z lvl=error msg="[500] - \"timeout\"" log_id=0RLGbXi0000 service=httpd
[httpd] ******* - **** [27/Dec/2020:16:37:59 +0000] "POST /write?db=telegraf&p=%5BREDACTED%5D&precision=n&rp=&u=root HTTP/1.1" 500 20 "-" "-" e14c03fa-4861-11eb-8014-0242ac150004 10000775
ts=2020-12-27T16:38:09.382347Z lvl=error msg="[500] - \"timeout\"" log_id=0RLGbXi0000 service=httpd
[httpd] ******* - **** [27/Dec/2020:16:38:06 +0000] "POST /write?db=telegraf&p=%5BREDACTED%5D&precision=n&rp=&u=root HTTP/1.1" 500 20 "-" "-" e5861d83-4861-11eb-8015-0242ac150004 10078885
ts=2020-12-27T16:38:16.644645Z lvl=error msg="[500] - \"timeout\"" log_id=0RLGbXi0000 service=httpd
[httpd] ******* - **** [27/Dec/2020:16:38:14 +0000] "POST /write?db=telegraf&p=%5BREDACTED%5D&precision=n&rp=&u=root HTTP/1.1" 500 20 "-" "-" ea9071af-4861-11eb-8016-0242ac150004 10066042
ts=2020-12-27T16:38:24.995571Z lvl=error msg="[500] - \"timeout\"" log_id=0RLGbXi0000 service=httpd

> What is the influxdb version?

X-Influxdb-Version: 1.6.4

Maybe I'm doing something wrong when running the tool? I usually just launch the standalone docker-compose in daemon mode and that's all... Thanks for answering so fast!

williambcra commented 3 years ago

I opened htop while trying to load my Grafana page; I'm pretty sure it comes from Influx now...

[screenshot: htop output]

The only widget that manages to load is this one: [screenshot]

How is the memory usage on your Raspberry Pi?

acouvreur commented 3 years ago

You definitely have an issue with your Influx instance. Do you use an SSD? Time-series databases need a high-speed storage device in most cases.

You're getting 500 timeouts on /write, which handles writing data points. Maybe you have an issue with your storage drive?
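A quick way to sanity-check the drive is a synchronous sequential write. The `/tmp` path is just an example; point it at the filesystem that holds the InfluxDB data directory:

```shell
# Write 64 MB and force it to disk before dd reports throughput.
dd if=/dev/zero of=/tmp/influx-disk-test bs=1M count=64 conv=fdatasync
rm -f /tmp/influx-disk-test
```

On a healthy SSD this should report well over 100 MB/s; a few MB/s would explain the write timeouts.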

williambcra commented 3 years ago

> Do you use an SSD?

Unfortunately not, it's a pretty cheap server. But since you said it was working on your Raspberry Pi, I'm kind of lost... I'm going to see if I can extend the timeout so Influx has time to write the data; that might fix the issue...
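For the record, the failed writes in the Influx logs each last right around 10 seconds, which matches the InfluxDB 1.x default write timeout. Assuming that's the timeout to extend, it lives in influxdb.conf (a sketch; whether this helps depends on why writes are slow in the first place):

```toml
[coordinator]
  # Default is "10s"; the 500 "timeout" errors above fire right at that mark.
  write-timeout = "30s"
```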

(sorry for the late reply, lots of stuff to do right now 😞 )