crowdsecurity / crowdsec

CrowdSec - the open-source and participative security solution offering crowdsourced protection against malicious IPs and access to the most advanced real-world CTI.
https://crowdsec.net
MIT License

Allow setting startTime and endTime in CloudWatch (streaming mode) datasource #1953

Open lucgiang-novobi opened 1 year ago

lucgiang-novobi commented 1 year ago

Add time range limitation in CloudWatch log stream datasource

/kind enhancement

When CrowdSec starts reading from a CloudWatch log stream, it reads all log events, starting from the earliest one. This can produce duplicate events when we restart a CrowdSec container with the same configuration. Should we have startTime and endTime parameters in GetLogEventsPagesWithContext?

  err := cw.cwClient.GetLogEventsPagesWithContext(ctx,
      &cloudwatchlogs.GetLogEventsInput{
          Limit:         aws.Int64(cfg.GetLogEventsPagesLimit),
          LogGroupName:  aws.String(cfg.GroupName),
          LogStreamName: aws.String(cfg.StreamName),
          NextToken:     startFrom,
          StartFromHead: aws.Bool(true),
      },

Why is this needed?

To avoid reading an entire, large CloudWatch log stream when using the CloudWatch datasource.

github-actions[bot] commented 1 year ago

@lucgiang-novobi: Thanks for opening an issue, it is currently awaiting triage.

In the meantime, you can:

  1. Check Crowdsec Documentation to see if your issue can be self resolved.
  2. You can also join our Discord.
  3. Check Releases to make sure your agent is on the latest version.
I am a bot created to help the [crowdsecurity](https://github.com/crowdsecurity) developers manage community feedback and contributions. You can check out my [manifest file](https://github.com/crowdsecurity/crowdsec/blob/master/.github/governance.yml) to understand my behavior and what I can do. If you want to use this for your project, you can check out the [BirthdayResearch/oss-governance-bot](https://github.com/BirthdayResearch/oss-governance-bot) repository.

github-actions[bot] commented 1 year ago

@lucgiang-novobi: There is no 'kind' label on this issue. You need a 'kind' label to start the triage process.

lucgiang-novobi commented 1 year ago

Another solution: could we store the log stream reading token in a file, and load this file before starting the acquisition modules? The CloudWatch stream reading token variable:

    streamIndexMutex.Lock()
    v := cw.streamIndexes[cfg.GroupName+"+"+cfg.StreamName] // store this variable in a file
    streamIndexMutex.Unlock()
buixor commented 1 year ago

Hello @lucgiang-novobi !

Thanks for the report. Storing a token right now wouldn't be easy on the agent (no other datasource does it). Would "simply" not starting from head every time do the trick? (i.e., setting StartFromHead to false?)

lucgiang-novobi commented 1 year ago

> Hello @lucgiang-novobi !
>
> Thanks for the report. Storing a token right now wouldn't be easy on the agent (no other datasource does it). Would "simply" not starting from head every time do the trick? (i.e., setting StartFromHead to false?)

Yes, we need a configuration parameter to decide whether CrowdSec reads logs from the first event or only from a recent window (e.g., the last 30 minutes). Reading all of the log events at startup is not effective, because CrowdSec should work with log events in real time (or near real time).

[Image: cloudwatch-log-stream-problem]

lucgiang-novobi commented 1 year ago

Hi @buixor, do you have any solution for this issue? Please let me know. Thank you!