awslabs / cdk-serverless-clamscan

Apache License 2.0
237 stars 67 forks

scanning large file failed. No such file or directory: '7za' #1130

Closed vincentelbrachtbtc closed 7 months ago

vincentelbrachtbtc commented 7 months ago

I need to scan relatively large files (6-60 GB) after they are uploaded to S3. ClamScan adds the "IN PROGRESS" tag to the S3 object successfully.

Then the scan Lambda throws the following error:

[ERROR] FileNotFoundError: [Errno 2] No such file or directory: '7za'
Traceback (most recent call last):
  File "/var/lang/lib/python3.11/site-packages/aws_lambda_powertools/metrics/provider/base.py", line 205, in decorate
    response = lambda_handler(event, context, *args, **kwargs)
  File "/var/lang/lib/python3.11/site-packages/aws_lambda_powertools/logging/logger.py", line 449, in decorate
    return lambda_handler(event, context, *args, **kwargs)
  File "/var/task/lambda.py", line 80, in lambda_handler
    expand_if_large_archive(
  File "/var/task/lambda.py", line 164, in expand_if_large_archive
    archive_summary = subprocess.run(
  File "/var/lang/lib/python3.11/subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/var/lang/lib/python3.11/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/var/lang/lib/python3.11/subprocess.py", line 1950, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)

The files that I am trying to scan are .mbtile files. The scan runs successfully on .mbtile files smaller than the 4 GB threshold.
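For readers hitting the same traceback: `subprocess.run` raises `FileNotFoundError` when the executable it is asked to start cannot be found on `PATH`, which is what the `'7za'` in the message points to. The sketch below reproduces that check in isolation. The helper names `tool_available` and `run_tool` are hypothetical and are not part of cdk-serverless-clamscan; this is only an illustration of the failure mode, not the project's fix.

```python
import shutil
import subprocess


def tool_available(name: str) -> bool:
    """Return True if an executable called `name` can be found on PATH."""
    return shutil.which(name) is not None


def run_tool(args: list[str]) -> subprocess.CompletedProcess:
    """Run an external tool, failing with a clearer message if it is missing.

    Hypothetical helper for illustration only.
    """
    if not tool_available(args[0]):
        # Without this guard, subprocess.run would raise FileNotFoundError,
        # as seen in the traceback above.
        raise RuntimeError(f"required executable {args[0]!r} was not found on PATH")
    return subprocess.run(args, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    print("7za available:", tool_available("7za"))
```

Running this inside the same container image as the scan function would show whether `7za` is actually installed there.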

My package.json:

"devDependencies": {
    "@types/dot-object": "^2.1.6",
    "@types/jest": "^29.5.11",
    "@types/node": "^18",
    "@typescript-eslint/eslint-plugin": "^6",
    "@typescript-eslint/parser": "^6",
    "aws-cdk": "2.115.0",
    "esbuild": "^0.19.10",
    "eslint": "^8",
    "eslint-import-resolver-typescript": "^3.6.1",
    "eslint-plugin-import": "^2.29.1",
    "jest": "^29.7.0",
    "jest-junit": "^15",
    "projen": "^0.78.0",
    "ts-jest": "^29.1.1",
    "ts-node": "^10.9.2",
    "typescript": "^5.3.3"
  },
  "dependencies": {
    "@aws-quickstart/eks-blueprints": "1.13.1",
    "aws-cdk-lib": "2.115.0",
    "cdk-serverless-clamscan": "^2.6.116",
    "constructs": "^10.0.5",
    "dot-object": "^2.1.4",
    "source-map-support": "^0.5.13",
    "ts-deepmerge": "^6.2.0"
  },

This is how I implemented ClamScan:

const bucket = new cdk.aws_s3.Bucket(this, '...', {
  bucketName: '...',
  encryption: cdk.aws_s3.BucketEncryption.KMS,
  encryptionKey: props.kmsKey,
  versioned: true,
  removalPolicy: cdk.RemovalPolicy.RETAIN,
  enforceSSL: true,
});
const sc = new ServerlessClamscan(this, 'rClamscan', {});
sc.addSourceBucket(bucket);

dontirun commented 7 months ago

While this is indeed a bug, this solution won't support files that large, for a few reasons:

  1. ClamAV only supports file sizes up to 4 GB

  2. The Lambda function that performs the scan will likely time out on files that are larger than 2.5 GB
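To make the two limits concrete, here is a minimal sketch of a size pre-check that could run before a scan is attempted. It assumes the figures quoted above mean decimal gigabytes, and the function name `scan_feasibility` and its return labels are hypothetical, not something the construct provides.

```python
# Limits quoted in the comment above, read as decimal gigabytes (an assumption).
CLAMAV_MAX_BYTES = 4_000_000_000
LAMBDA_TIMEOUT_RISK_BYTES = 2_500_000_000


def scan_feasibility(size_bytes: int) -> str:
    """Classify an object by size against the two limits (hypothetical helper)."""
    if size_bytes > CLAMAV_MAX_BYTES:
        return "too_large"      # beyond what ClamAV is said to support
    if size_bytes > LAMBDA_TIMEOUT_RISK_BYTES:
        return "may_time_out"   # scannable in principle, but the Lambda may time out
    return "ok"


if __name__ == "__main__":
    for size in (1_000_000_000, 3_000_000_000, 6_000_000_000, 60_000_000_000):
        print(size, scan_feasibility(size))
```

Under these assumptions, every file in the 6-60 GB range from the original report falls into the "too_large" bucket.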

vincentelbrachtbtc commented 7 months ago

Is there a planned fix for this bug? Is this solution able to scan large files in chunks?

dontirun commented 7 months ago

The 7za error needs to be fixed. However, I don't think this solution will work for your use case even once the bug is addressed, given the limitations I mentioned in my previous comment.