ait-aecid / logdata-anomaly-miner

This tool parses log data and allows to define analysis pipelines for anomaly detection. It was designed to run the analysis with limited resources and lowest possible permissions to make it suitable for production server use.
GNU General Public License v3.0
63 stars 22 forks source link

stop_learning_time does not work #1323

Open landauermax opened 1 month ago

landauermax commented 1 month ago

Original issue: https://github.com/ait-aecid/logdata-anomaly-miner/issues/831 Fix: https://github.com/ait-aecid/logdata-anomaly-miner/issues/1230

Now this feature is not working anymore.

Set learning to stop after 1 hour:

LearnMode: True

LogResourceList:
        - 'file:///home/landauerm/test.log'

Parser:
        - id: 'START'
          start: True
          type: ApacheAccessModel
          name: 'apache'

Input:
        timestamp_paths: "/accesslog/time"

Analysis:
        - type: "NewMatchPathValueDetector"
          paths: ["/accesslog/status"]
          output_logline: True
          stop_learning_time: 3600

EventHandlers:
        - id: "stpe"
          type: "StreamPrinterEventHandler"

For the first two lines, learning should be active, then switched off:

127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"
127.0.0.1 - - [17/May/2021:12:21:16 +0000] "GET /doesntexist HTTP/1.1" 404 488 "-" "Wget/1.20.3 (linux-gnu)"
127.0.0.1 - - [17/May/2021:13:23:48 +0000] "GET / HTTP/1.1" 300 11229 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"
127.0.0.1 - - [17/May/2021:13:23:55 +0000] "GET / HTTP/1.1" 300 11229 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3) Gecko/20090305 Firefox/3.1b3 GTB5"
127.0.0.1 - - [17/May/2021:14:21:16 +0000] "GET /doesntexist HTTP/1.1" 404 488 "-" "Wget/1.20.3 (linux-gnu)"

However, learning is never switched off (300 should be detected twice as new value):

2021-05-17 13:25:14 New path(es) detected
NewMatchPathDetector: "DefaultNewMatchPathDetector" (1 lines)
  /accesslog: 127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"
  /accesslog/host: 127.0.0.1
  /accesslog/sp0:
  /accesslog/ident: -
  /accesslog/sp1:
  /accesslog/user: -
  /accesslog/sp2:
  /accesslog/time: 1621250714
  /accesslog/sp3: ] "
  /accesslog/fm/request: GET / HTTP/1.1
    /accesslog/fm/request/method: 0
    /accesslog/fm/request/sp5:
    /accesslog/fm/request/request: /
    /accesslog/fm/request/sp6:
    /accesslog/fm/request/version: HTTP/1.1
  /accesslog/sp6: "
  /accesslog/status: 200
  /accesslog/sp7:
  /accesslog/size: 11229
  /accesslog/combined:  "-" "Wget/1.20.3 (linux-gnu)"
    /accesslog/combined/combined:  "-" "Wget/1.20.3 (linux-gnu)"
      /accesslog/combined/combined/sp9:  "
      /accesslog/combined/combined/referer: -
      /accesslog/combined/combined/sp10: " "
      /accesslog/combined/combined/user_agent: Wget/1.20.3 (linux-gnu)
      /accesslog/combined/combined/sp11: "
['/accesslog', '/accesslog/host', '/accesslog/sp0', '/accesslog/ident', '/accesslog/sp1', '/accesslog/user', '/accesslog/sp2', '/accesslog/time', '/accesslog/sp3', '/accesslog/fm/request', '/accesslog/sp6', '/accesslog/status', '/accesslog/sp7', '/accesslog/size', '/accesslog/combined', '/accesslog/combined/combined', '/accesslog/combined/combined/sp9', '/accesslog/combined/combined/referer', '/accesslog/combined/combined/sp10', '/accesslog/combined/combined/user_agent', '/accesslog/combined/combined/sp11', '/accesslog/fm/request/method', '/accesslog/fm/request/sp5', '/accesslog/fm/request/request', '/accesslog/fm/request/sp6', '/accesslog/fm/request/version']
127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"

2021-05-17 13:25:14 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector2" (1 lines)
  {'/accesslog/status': 200}
127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"

2021-05-17 14:21:16 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector2" (1 lines)
  {'/accesslog/status': 404}
127.0.0.1 - - [17/May/2021:12:21:16 +0000] "GET /doesntexist HTTP/1.1" 404 488 "-" "Wget/1.20.3 (linux-gnu)"

2021-05-17 15:23:48 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector2" (1 lines)
  {'/accesslog/status': 300}
127.0.0.1 - - [17/May/2021:13:23:48 +0000] "GET / HTTP/1.1" 300 11229 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"

^Caminer: caught signal, shutting down
aminer: caught signal, shutting down
ernstleierzopf commented 1 month ago

could you please extend the test by having a line with a new status code (for example as the last one)? The output seems to be fine, because you should get an anomaly for the third line. To verify if there really is a bug, please also post the output of the persisted values for the NMPVD - if that file only contains values for the status codes 200 and 404, then there is no bug.

landauermax commented 1 month ago

I added a line with status code 999:

2021-05-17 13:25:14 New path(es) detected
NewMatchPathDetector: "DefaultNewMatchPathDetector" (1 lines)
  /accesslog: 127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"
  /accesslog/host: 127.0.0.1
  /accesslog/sp0:
  /accesslog/ident: -
  /accesslog/sp1:
  /accesslog/user: -
  /accesslog/sp2:
  /accesslog/time: 1621250714
  /accesslog/sp3: ] "
  /accesslog/fm/request: GET / HTTP/1.1
    /accesslog/fm/request/method: 0
    /accesslog/fm/request/sp5:
    /accesslog/fm/request/request: /
    /accesslog/fm/request/sp6:
    /accesslog/fm/request/version: HTTP/1.1
  /accesslog/sp6: "
  /accesslog/status: 200
  /accesslog/sp7:
  /accesslog/size: 11229
  /accesslog/combined:  "-" "Wget/1.20.3 (linux-gnu)"
    /accesslog/combined/combined:  "-" "Wget/1.20.3 (linux-gnu)"
      /accesslog/combined/combined/sp9:  "
      /accesslog/combined/combined/referer: -
      /accesslog/combined/combined/sp10: " "
      /accesslog/combined/combined/user_agent: Wget/1.20.3 (linux-gnu)
      /accesslog/combined/combined/sp11: "
['/accesslog', '/accesslog/host', '/accesslog/sp0', '/accesslog/ident', '/accesslog/sp1', '/accesslog/user', '/accesslog/sp2', '/accesslog/time', '/accesslog/sp3', '/accesslog/fm/request', '/accesslog/sp6', '/accesslog/status', '/accesslog/sp7', '/accesslog/size', '/accesslog/combined', '/accesslog/combined/combined', '/accesslog/combined/combined/sp9', '/accesslog/combined/combined/referer', '/accesslog/combined/combined/sp10', '/accesslog/combined/combined/user_agent', '/accesslog/combined/combined/sp11', '/accesslog/fm/request/method', '/accesslog/fm/request/sp5', '/accesslog/fm/request/request', '/accesslog/fm/request/sp6', '/accesslog/fm/request/version']
127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"

2021-05-17 13:25:14 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector2" (1 lines)
  {'/accesslog/status': 200}
127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"

2021-05-17 14:21:16 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector2" (1 lines)
  {'/accesslog/status': 404}
127.0.0.1 - - [17/May/2021:12:21:16 +0000] "GET /doesntexist HTTP/1.1" 404 488 "-" "Wget/1.20.3 (linux-gnu)"

2021-05-17 15:23:48 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector2" (1 lines)
  {'/accesslog/status': 300}
127.0.0.1 - - [17/May/2021:13:23:48 +0000] "GET / HTTP/1.1" 300 11229 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"

2021-05-17 16:21:16 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector2" (1 lines)
  {'/accesslog/status': 999}
127.0.0.1 - - [17/May/2021:14:21:16 +0000] "GET /doesntexist HTTP/1.1" 999 488 "-" "Wget/1.20.3 (linux-gnu)"

Persisted values:

cat /var/lib/aminer/NewMatchPathValueDetector/Default
[200, 300, 404, 999]
ernstleierzopf commented 1 month ago

The issue is that stop_learning_time does not work on historical data, because it is defined in InputInterfaces.AtomHandlerInterface as self.stop_learning_timestamp = time.time() + stop_learning_time for each analysis component.

The issue why this can not be defined within each analysis component is that the learning should stop exactly after 3600 seconds, which would only work for the analysis component that has seen the first log atom.

So the feature works as intended, because we do not have any data at the initialization stage of the aminer.

The same bug also happens for the next_persist_time of each analysis component.

To at least provide a solution for this issue, I introduce the AminerStartTimestamp config property. This timestamp also needs to be in the same timezone as the data (by default UTC and in this example also UTC). If it is None, the value is set to time.time() (same as it is right now) The solution is not great, but it can be worked with.

Output after fix:

sudo aminer -c config.yml -C -o
/etc/aminer/conf-enabled/cerberus/__init__.py:13: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import get_distribution, DistributionNotFound
/etc/aminer/conf-enabled/cerberus/__init__.py:13: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import get_distribution, DistributionNotFound
2021-05-17 13:25:14 New path(es) detected
NewMatchPathDetector: "DefaultNewMatchPathDetector" (1 lines)
  /accesslog: 127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"
  /accesslog/host: 127.0.0.1
  /accesslog/sp0:  
  /accesslog/ident: -
  /accesslog/sp1:  
  /accesslog/user: -
  /accesslog/sp2:  
  /accesslog/time: 1621250714
  /accesslog/sp3: ] "
  /accesslog/fm/request: GET / HTTP/1.1
    /accesslog/fm/request/method: 0
    /accesslog/fm/request/sp5:  
    /accesslog/fm/request/request: /
    /accesslog/fm/request/sp6:  
    /accesslog/fm/request/version: HTTP/1.1
  /accesslog/sp6: " 
  /accesslog/status: 200
  /accesslog/sp7:  
  /accesslog/size: 11229
  /accesslog/combined:  "-" "Wget/1.20.3 (linux-gnu)"
    /accesslog/combined/combined:  "-" "Wget/1.20.3 (linux-gnu)"
      /accesslog/combined/combined/sp9:  "
      /accesslog/combined/combined/referer: -
      /accesslog/combined/combined/sp10: " "
      /accesslog/combined/combined/user_agent: Wget/1.20.3 (linux-gnu)
      /accesslog/combined/combined/sp11: "
['/accesslog', '/accesslog/host', '/accesslog/sp0', '/accesslog/ident', '/accesslog/sp1', '/accesslog/user', '/accesslog/sp2', '/accesslog/time', '/accesslog/sp3', '/accesslog/fm/request', '/accesslog/sp6', '/accesslog/status', '/accesslog/sp7', '/accesslog/size', '/accesslog/combined', '/accesslog/combined/combined', '/accesslog/combined/combined/sp9', '/accesslog/combined/combined/referer', '/accesslog/combined/combined/sp10', '/accesslog/combined/combined/user_agent', '/accesslog/combined/combined/sp11', '/accesslog/fm/request/method', '/accesslog/fm/request/sp5', '/accesslog/fm/request/request', '/accesslog/fm/request/sp6', '/accesslog/fm/request/version']
127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"

2021-05-17 13:25:14 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector2" (1 lines)
  {'/accesslog/status': 200}
127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"

2021-05-17 14:21:16 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector2" (1 lines)
  {'/accesslog/status': 404}
127.0.0.1 - - [17/May/2021:12:21:16 +0000] "GET /doesntexist HTTP/1.1" 404 488 "-" "Wget/1.20.3 (linux-gnu)"

2021-05-17 15:23:48 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector2" (1 lines)
  {'/accesslog/status': 300}
127.0.0.1 - - [17/May/2021:13:23:48 +0000] "GET / HTTP/1.1" 300 11229 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"

2021-05-17 15:23:55 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector2" (1 lines)
  {'/accesslog/status': 300}
127.0.0.1 - - [17/May/2021:13:23:55 +0000] "GET / HTTP/1.1" 300 11229 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3) Gecko/20090305 Firefox/3.1b3 GTB5"

2021-05-17 16:21:16 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector2" (1 lines)
  {'/accesslog/status': 999}
127.0.0.1 - - [17/May/2021:14:21:16 +0000] "GET /doesntexist HTTP/1.1" 999 488 "-" "Wget/1.20.3 (linux-gnu)"
sudo cat /var/lib/aminer/log/aminer.log
22/May/2024:23:34:14 +0200 INFO aminer started.
22/May/2024:23:34:14 +0200 DEBUG Called add_time_triggered_component for the component AnalysisChild
22/May/2024:23:34:14 +0200 DEBUG Started with build_analysis_pipeline.
22/May/2024:23:34:14 +0200 DEBUG Registered component SubhandlerFilter with the id 0 and component_name 'AtomFilter'.
22/May/2024:23:34:14 +0200 ERROR [Errno 2] No such file or directory: b'NewMatchPathDetector/Default'
22/May/2024:23:34:14 +0200 DEBUG Called add_time_triggered_component for the component NewMatchPathDetector
22/May/2024:23:34:14 +0200 DEBUG Registered component NewMatchPathDetector with the id 1 and component_name 'DefaultNewMatchPathDetector'.
22/May/2024:23:34:14 +0200 ERROR [Errno 2] No such file or directory: b'NewMatchPathValueDetector/Default'
22/May/2024:23:34:14 +0200 DEBUG Called add_time_triggered_component for the component NewMatchPathValueDetector
22/May/2024:23:34:14 +0200 DEBUG Registered component NewMatchPathValueDetector with the id 2 and component_name 'NewMatchPathValueDetector2'.
22/May/2024:23:34:14 +0200 ERROR [Errno 2] No such file or directory: b'AnalysisChild/RepositioningData'
22/May/2024:23:34:14 +0200 DEBUG Repositioning data was persisted.
22/May/2024:23:34:14 +0200 DEBUG NewMatchPathDetector persisted data.
22/May/2024:23:34:14 +0200 DEBUG NewMatchPathValueDetector persisted data.
22/May/2024:23:34:14 +0200 DEBUG Statistics logs are written..
22/May/2024:23:34:14 +0200 INFO Stopping learning in the NewMatchPathValueDetector.
22/May/2024:23:34:14 +0200 DEBUG NewMatchPathDetector persisted data.
22/May/2024:23:34:14 +0200 DEBUG NewMatchPathValueDetector persisted data.
sudo cat /var/lib/aminer/NewMatchPathValueDetector/Default
[200, 404]
landauermax commented 1 month ago

I do not understand why time.time() is used - as discussed in issue https://github.com/ait-aecid/logdata-anomaly-miner/issues/1230 the stop_learning_time should be based on the time from the logs, and not the real time. Also, I think it is fine if the stop_learning_time is set using the first log atom that is processed by any of the analysis components.

ernstleierzopf commented 1 month ago

Sorry, I don't really know what happened in #1230 - this was a year ago. We both just know that it is not working in the desired way at the moment.

Please help me clarifying the desired functionality: do you want to have one time at which all of the anomaly detectors stop (for example first seen log atom + 3600 seconds) or should each anomaly detector have it's own time when to stop (this means that one detector might stop after a hour and another detector might stop after 2 days). I was working towards the former one, because the latter one seems too complicated to follow in the real world. Because the detectors do not have access to other detectors, we needs to define the stopping time beforehand and do not have any timestamp from log data. That's why I would define the mentioned AminerStartTimestamp config property.

If we implement the second example where the stop time is really managed by the detector, this is no issue for me (besides a lot of refactoring).

landauermax commented 1 month ago

If we use the AminerStartTimestamp, we could just as well set the time where the AMiner switches the training mode in absolute timestamps. However, we introduced the stop_learning_time because we did not want to make it necessary to state an absolute timestamp when the AMiner should switch off the training mode - especially when the config is reused on several data sets, this will just be a big source of error.

I would pursue the second case, since this is how it was originally implemented and how the functionality worked after https://github.com/ait-aecid/logdata-anomaly-miner/issues/1230. Since each detector receives every log line (unless one starts using atom filters), it is usually the case that all detectors will switch learn mode at the same time as they all receive the same first line.