awslabs / cdk-serverless-clamscan

Feature Request: Filtering Options for Scanning (Tags, Extensions, Paths, Size) #1205

Open pgagnidze opened 3 months ago

pgagnidze commented 3 months ago

Add a filter property to the cdk-serverless-clamscan construct so that S3 objects can be selected for scanning by tag, file extension, S3 path, and object size. Filters are set per bucket, with a configurable logic operator both for combining the filter sections and for matching tags. The same filters should be configurable when adding buckets dynamically with the addSourceBucket method.

Proposed filter Property

The filter property will be an object applied per bucket, with the following sections:

  1. Tags: Check if the object is tagged with specific key-value pairs, with a configurable logic operator to determine matching criteria.
  2. File Extensions: Specific file types to scan.
  3. S3 Paths: Targeted S3 prefixes or paths.
  4. Object Size: Conditions to scan objects larger or smaller than specified sizes.
  5. Logic Operator: Defines the overall logic to combine the specified filters (default: ALL).
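
The sections above could be typed roughly as follows. This is only a sketch of the proposal: `LogicOperator`, `TagFilter`, `ObjectSizeFilter`, `ScanFilter`, and `withDefaults` are hypothetical names and do not exist in cdk-serverless-clamscan today. The defaults are the ones stated in this request (ALL for the overall operator, ANY for tags).

```typescript
// Hypothetical types for the proposed `filter` property (not part of the library).
type LogicOperator = 'ANY' | 'ALL';

interface TagFilter {
  criteria: Record<string, string>; // tag key -> required tag value
  logicOperator?: LogicOperator;    // proposed default: 'ANY'
}

interface ObjectSizeFilter {
  greaterThanBytes?: number;
  lessThanBytes?: number;
}

interface ScanFilter {
  tags?: TagFilter;
  extensions?: string[];
  paths?: string[];
  objectSize?: ObjectSizeFilter;
  logicOperator?: LogicOperator;    // proposed default: 'ALL'
}

// Fill in the proposed defaults so downstream code never sees an undefined operator.
function withDefaults(filter: ScanFilter): ScanFilter {
  const resolved: ScanFilter = {
    ...filter,
    logicOperator: filter.logicOperator ?? 'ALL',
  };
  if (filter.tags) {
    resolved.tags = {
      ...filter.tags,
      logicOperator: filter.tags.logicOperator ?? 'ANY',
    };
  }
  return resolved;
}
```

Every section is optional, so a bucket that only needs extension filtering can pass a single `extensions` array.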

Configuration Example

The following example shows the filter property set per bucket:

new ServerlessClamscan(this, 'rClamscan', {
  buckets: [
    {
      bucket: bucket_1,
      filter: {
        tags: {
          criteria: { 
            "ScanRequired": "true",
            "Priority": "high"
          },
          logicOperator: 'ANY' // Can be 'ANY' or 'ALL' (default: ANY)
        },
        extensions: ['.mp4', '.jpeg', '.png'],
        paths: ['uploads/images/', 'uploads/videos/'],
        objectSize: {
          greaterThanBytes: 1024, // 1 KB, optional
          lessThanBytes: 10485760 // 10 MB, optional
        },
        logicOperator: 'ALL' // Can be 'ANY' or 'ALL' (default: ALL)
      }
    },
    {
      bucket: bucket_2,
      filter: {
        extensions: ['.exe', '.zip'],
        logicOperator: 'ALL' // Can be 'ANY' or 'ALL' (default: ALL)
      }
    }
  ]
});

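// Adding a source bucket with filters dynamically.
// Shown as a separate snippet: a construct ID such as 'rClamscan' must be unique within its scope.
const sc = new ServerlessClamscan(this, 'rClamscan', { /* initial configuration */ });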
sc.addSourceBucket(bucket_3, {
  filter: {
    tags: {
      criteria: { 
        "ScanRequired": "true"
      },
      logicOperator: 'ANY' // Can be 'ANY' or 'ALL' (default: ANY)
    },
    extensions: ['.docx', '.pdf'],
    paths: ['uploads/documents/'],
    objectSize: {
      lessThanBytes: 5242880 // 5 MB, optional
    },
    logicOperator: 'ALL' // Can be 'ANY' or 'ALL' (default: ALL)
  }
});

Scanning Behavior

The feature is backward compatible: if no filter is specified for a bucket, all of its objects are scanned.
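
A minimal sketch of how the scan decision could be evaluated, assuming the filter shape proposed in this request. `ObjectInfo`, `combine`, and `shouldScan` are hypothetical names, and the matching rules (exact, case-sensitive suffix and prefix checks, strict size comparisons) are assumptions rather than agreed behavior.

```typescript
type LogicOperator = 'ANY' | 'ALL';

// Hypothetical filter shape, mirroring the proposal in this request.
interface ScanFilter {
  tags?: { criteria: Record<string, string>; logicOperator?: LogicOperator };
  extensions?: string[];
  paths?: string[];
  objectSize?: { greaterThanBytes?: number; lessThanBytes?: number };
  logicOperator?: LogicOperator;
}

// Hypothetical summary of the object being considered for scanning.
interface ObjectInfo {
  key: string;
  sizeBytes: number;
  tags: Record<string, string>;
}

function combine(results: boolean[], op: LogicOperator): boolean {
  return op === 'ANY' ? results.some(Boolean) : results.every(Boolean);
}

function shouldScan(obj: ObjectInfo, filter?: ScanFilter): boolean {
  if (!filter) return true; // backward compatible: no filter means scan everything

  const checks: boolean[] = [];
  if (filter.tags) {
    const matches = Object.entries(filter.tags.criteria).map(
      ([tagKey, tagValue]) => obj.tags[tagKey] === tagValue,
    );
    checks.push(combine(matches, filter.tags.logicOperator ?? 'ANY'));
  }
  if (filter.extensions) {
    checks.push(filter.extensions.some((ext) => obj.key.endsWith(ext)));
  }
  if (filter.paths) {
    checks.push(filter.paths.some((prefix) => obj.key.startsWith(prefix)));
  }
  if (filter.objectSize) {
    const { greaterThanBytes, lessThanBytes } = filter.objectSize;
    checks.push(
      (greaterThanBytes === undefined || obj.sizeBytes > greaterThanBytes) &&
        (lessThanBytes === undefined || obj.sizeBytes < lessThanBytes),
    );
  }

  // A filter object with no sections constrains nothing, so the object is scanned.
  if (checks.length === 0) return true;
  return combine(checks, filter.logicOperator ?? 'ALL');
}
```

Only the sections that are present contribute a check, so a filter with just `extensions` behaves the same under ANY and ALL.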

Benefits

Looking forward to your feedback and thank you for considering this feature request!

pgagnidze commented 3 months ago

The addEventNotification method on the bucket already supports prefix and suffix filters, which can be used for S3 path and file extension filtering. This setup ensures that the Lambda function is triggered only for relevant objects. The Lambda function can then handle additional checks for object size and tags.
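
As a rough sketch of that split, the paths and extensions from a filter could be expanded into prefix/suffix pairs, one per notification. `KeyFilter` and `toKeyFilters` are hypothetical names; the `prefix`/`suffix` shape is meant to mirror the key filter that `addEventNotification` accepts, and the sketch assumes each notification carries at most one prefix and one suffix rule, so each pair would go into its own `addEventNotification` call.

```typescript
// Hypothetical shape mirroring an S3 notification key filter (prefix and/or suffix).
interface KeyFilter {
  prefix?: string;
  suffix?: string;
}

// Expand paths x extensions into one prefix/suffix pair per combination.
function toKeyFilters(paths: string[] = [], extensions: string[] = []): KeyFilter[] {
  if (paths.length === 0) return extensions.map((suffix) => ({ suffix }));
  if (extensions.length === 0) return paths.map((prefix) => ({ prefix }));

  const filters: KeyFilter[] = [];
  for (const prefix of paths) {
    for (const suffix of extensions) {
      filters.push({ prefix, suffix });
    }
  }
  return filters;
}

// Pairs for the bucket_1 example: 2 paths x 3 extensions.
const bucket1KeyFilters = toKeyFilters(
  ['uploads/images/', 'uploads/videos/'],
  ['.mp4', '.jpeg', '.png'],
);
```

An empty result means no key filtering, in which case a single unfiltered notification would be registered. Tag and size checks cannot be expressed this way and would remain in the Lambda function.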

dontirun commented 3 months ago

I like the idea. A few initial comments

pgagnidze commented 2 months ago

FYI https://aws.amazon.com/blogs/aws/introducing-amazon-guardduty-malware-protection-for-amazon-s3/