code-memento opened this issue 4 years ago
I've seen the same behaviour since last Friday (3rd July).
I don't know if it's related or not but on that same day I started having trouble when building the Docker container, with this error:
Trying other mirror.
One of the configured repositories failed (Extra Packages for Enterprise Linux 7 - x86_64),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working
upstream. This is most often useful if you are using a newer
distribution release than is supported by the repository (and the
packages for the previous distribution release still work).
3. Run the command with the repository temporarily disabled
yum --disablerepo=epel ...
4. Disable the repository permanently, so yum won't use it by default. Yum
will then just ignore the repository until you permanently enable it
again or use --enablerepo for temporary usage:
yum-config-manager --disable epel
or
subscription-manager repos --disable=epel
5. Configure the failing repository to be skipped, if it is unavailable.
Note that yum will try to contact the repo. when it runs most commands,
so will have to try and fail each time (and thus. yum will be be much
slower). If it is a very temporary problem though, this is often a nice
compromise:
yum-config-manager --save --setopt=epel.skip_if_unavailable=true
failure: repodata/repomd.xml from epel: [Errno 256] No more mirrors to try.
Something changed last week and I can't work out what, why, or how to fix it so that everything starts working again.
Hi @mogusbi
Indeed, something has changed; in my case it's in the execution of the Lambda.
Your issue is related to the build phase; it looks like it cannot pull epel.
Maybe you should use another repository.
@code-memento Sorry, I should have made myself clearer - the issue pulling down epel is intermittent, so it will eventually work after a few retries. I only mentioned it because I started seeing it on the same day I subsequently started seeing problems with my Lambda.
When it does eventually build and deploy, I then see the same issue as you, with the Lambda function timing out when I try to scan a file.
Hi @mogusbi
Okay, so we're in the same boat 😆.
In my case, we did not change the Lambda zip. It used to work like a charm until a week or so ago. When I cleaned the _clamav_defs_ bucket it seemed to work for a moment, but it started to time out again. Even with the timeout set to 15 minutes, it hangs until the end.
If the Lambda did not change and the defs are not the cause, is it related to the AWS runtime :sweat_smile:?
It could well be; the Amazon Linux OS was updated 8 days ago: https://hub.docker.com/_/amazonlinux?tab=tags
Although the release notes say it was updated last month: https://aws.amazon.com/amazon-linux-2/release-notes/
@mogusbi Do you think building with the latest amazonlinux image could solve this issue? I think it's more related to the runtime. Moreover, I think the Lambda behaves differently when the Lambda container is reused (in this case the defs are not downloaded). Did you notice anything about this?
It hasn't fixed the problem for me
@code-memento Yes, it looks like the issue only appears on a cold start. Subsequent requests to scan work fine once the Lambda is warm.
@code-memento I've upped the memory of my functions from 1024 to 2048 and that appears to have fixed the issue (for now).
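In case it helps anyone automate this, here's a minimal sketch of bumping the memory with boto3 (the function name is a placeholder; you'd repeat this for the scan and update functions you actually deployed):

```python
import boto3

lambda_client = boto3.client("lambda")

# Placeholder name - use your own scan/update function names.
lambda_client.update_function_configuration(
    FunctionName="bucket-antivirus-function",
    MemorySize=2048,  # CPU share scales with memory, so the scan also finishes faster
)
```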
It seems to work: the Lambda needs approx. 1290MB. I'll do more tests to make sure that all the cases are covered. Thanks @mogusbi
I did many tests and it seems to do the trick; no problems so far. Thanks @mogusbi
That's good to hear!
I'm still slightly concerned as to why it all of a sudden needs more memory. It would be good to get to the bottom of that, as throwing more memory at it is treating the symptom but not the disease.
You can say that again! The only explanation I've found is that the clamav_defs have been updated, and thus the Lambda needs more resources for the scan.
I found this error in the update Lambda, maybe it's related:
ClamAV update process started at Fri Jul 10 08:32:26 2020
daily database available for update (local version: 25863, remote version: 25868)
ERROR: buildcld: Can't add daily.hsb to new daily.cld - please check if there is enough disk space available
ERROR: buildcld: gzclose() failed for /tmp/clamav_defs/tmp.6bd07/clamav-e2595ffff6f8a72f6094fc40802f8921.tmp
ERROR: updatedb: Incremental update failed. Failed to build CLD.
ERROR: Unexpected error when attempting to update database: daily
WARNING: fc_update_databases: fc_update_database failed: Failed to update database (14)
ERROR: Database update process failed: Failed to update database (14)
ERROR: Update failed.
@code-memento - you're right, the definition-update Lambda is failing with this error and it is impacting the scan. Did you find a fix for it?
@Muthuveerappanv The error disappears if you delete clamav_defs. AFAIK the /tmp folder is limited to 512MB.
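If it helps with debugging, one way to confirm /tmp is the bottleneck is to log its free space around the freshclam call; a rough sketch (the helper name and log format are just illustrative):

```python
import shutil

def log_tmp_usage(label, path="/tmp"):
    # shutil.disk_usage reports the filesystem backing `path`, i.e. the
    # ~512MB Lambda scratch space when called on /tmp.
    total, used, free = shutil.disk_usage(path)
    print("%s: /tmp total=%dMB used=%dMB free=%dMB"
          % (label, total // 2**20, used // 2**20, free // 2**20))

# e.g. call log_tmp_usage("before freshclam") and log_tmp_usage("after freshclam")
# around clamav.update_defs_from_freshclam(...) in update.py
```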
You mean delete the clamav_defs in the definitions S3 bucket?
@Muthuveerappanv yes the definition bucket
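For anyone else landing here: "deleting clamav_defs" just means emptying that prefix in the definitions bucket so the next run of the update Lambda rebuilds the defs from scratch. A rough boto3 sketch (bucket name and prefix below are placeholders for your own AV_DEFINITION_S3_BUCKET / AV_DEFINITION_S3_PREFIX values):

```python
import boto3

# Placeholders - substitute your AV_DEFINITION_S3_BUCKET / AV_DEFINITION_S3_PREFIX values.
BUCKET = "my-antivirus-definitions-bucket"
PREFIX = "clamav_defs/"

s3 = boto3.resource("s3")

# Remove every object under the definitions prefix; the next run of the
# update Lambda then rebuilds and re-uploads fresh definitions.
s3.Bucket(BUCKET).objects.filter(Prefix=PREFIX).delete()
```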
Just wondering if there are alternative solutions other than deleting clamav_defs in the S3 bucket?
I would also like to know this; I dove deep into trying to figure this out last weekend.
I read somewhere that somebody mentioned loading the definitions into memory directly after download to free up /tmp, but I have no idea how to do this.
@culshaw I don't see how it can be done, as the ClamAV scan is in the end a command-line execution with different parameters. @DimitrijeManic It's just speculation; the true issue is that the scan needs more memory (> 1024MB). As the code didn't change for any of us, I suspect it might be caused by the defs.
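For context, the scan step boils down to shelling out to clamscan against the downloaded defs, something like the sketch below (binary path and flags are illustrative, not copied from the project's scan.py):

```python
import subprocess

def scan_file(path, defs_path="/tmp/clamav_defs"):
    # Roughly what the scan Lambda does in the end: run the bundled clamscan
    # binary against the downloaded definitions. Binary location and flags
    # here are illustrative, not copied from the project.
    result = subprocess.run(
        ["./bin/clamscan", "-d", defs_path, path],
        capture_output=True,
        text=True,
    )
    # clamscan exit codes: 0 = clean, 1 = virus found, anything else = error
    return result.returncode, result.stdout
```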
@code-memento I have set my lambda to 2048 but I believe the issue comes from the hard limit in the /tmp dir.
Possible solutions?
Thoughts?
This can be reproduced by adding a volume with a size limit in scripts/run-update-lambda:
#! /usr/bin/env bash
set -eu -o pipefail
#
# Run the update.lambda_handler locally in a docker container
#
rm -rf tmp/
unzip -qq -d ./tmp build/lambda.zip
NAME="antivirus-update"
# Simulate /tmp/ dir with a 512m size restriction
docker volume create --driver local --opt type=tmpfs --opt device=tmpfs --opt o=size=512m,uid=496 clamav_defs
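# NOTE: MEM and CPUS must be set in the caller's environment (e.g. MEM=1024m CPUS=1);
# because of `set -u`, the docker run below fails if they are unset.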
docker run --rm \
    -v "$(pwd)/tmp/:/var/task" \
    -v clamav_defs:/tmp \
    -e AV_DEFINITION_PATH \
    -e AV_DEFINITION_S3_BUCKET \
    -e AV_DEFINITION_S3_PREFIX \
    -e AWS_ACCESS_KEY_ID \
    -e AWS_DEFAULT_REGION \
    -e AWS_REGION \
    -e AWS_SECRET_ACCESS_KEY \
    -e AWS_SESSION_TOKEN \
    -e CLAMAVLIB_PATH \
    --memory="${MEM}" \
    --memory-swap="${MEM}" \
    --cpus="${CPUS}" \
    --name="${NAME}" \
    lambci/lambda:python3.7 update.lambda_handler
Hacky workaround to not re-download the existing ClamAV defs in update.py:
# -*- coding: utf-8 -*-
# Upside Travel, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import shutil

import boto3

import clamav
from common import AV_DEFINITION_PATH
from common import AV_DEFINITION_S3_BUCKET
from common import AV_DEFINITION_S3_PREFIX
from common import CLAMAVLIB_PATH
from common import get_timestamp


def lambda_handler(event, context):
    s3 = boto3.resource("s3")
    s3_client = boto3.client("s3")

    print("Script starting at %s\n" % (get_timestamp()))

    # Clear out the definitions directory before doing anything else, so the
    # 512MB /tmp scratch space is not exhausted by leftovers from a previous
    # (warm) invocation.
    for root, dirs, files in os.walk(AV_DEFINITION_PATH):
        for f in files:
            os.unlink(os.path.join(root, f))
        for d in dirs:
            shutil.rmtree(os.path.join(root, d))

    to_download = clamav.update_defs_from_s3(
        s3_client, AV_DEFINITION_S3_BUCKET, AV_DEFINITION_S3_PREFIX
    )
    print("Skipping clamav definition download %s\n" % (get_timestamp()))
    # for download in to_download.values():
    #     s3_path = download["s3_path"]
    #     local_path = download["local_path"]
    #     print("Downloading definition file %s from s3://%s" % (local_path, s3_path))
    #     s3.Bucket(AV_DEFINITION_S3_BUCKET).download_file(s3_path, local_path)
    #     print("Downloading definition file %s complete!" % (local_path))
    clamav.update_defs_from_freshclam(AV_DEFINITION_PATH, CLAMAVLIB_PATH)
    # If main.cvd gets updated (very rare), we will need to force freshclam
    # to download the compressed version to keep file sizes down.
    # The existence of main.cud is the trigger to know this has happened.
    if os.path.exists(os.path.join(AV_DEFINITION_PATH, "main.cud")):
        os.remove(os.path.join(AV_DEFINITION_PATH, "main.cud"))
        if os.path.exists(os.path.join(AV_DEFINITION_PATH, "main.cvd")):
            os.remove(os.path.join(AV_DEFINITION_PATH, "main.cvd"))
        clamav.update_defs_from_freshclam(AV_DEFINITION_PATH, CLAMAVLIB_PATH)
    clamav.upload_defs_to_s3(
        s3_client, AV_DEFINITION_S3_BUCKET, AV_DEFINITION_S3_PREFIX, AV_DEFINITION_PATH
    )
    print("Script finished at %s\n" % get_timestamp())
@DimitrijeManic Does this solution fix the Lambda timeout issue?
Have seen a similar issue recently. Try this (file size: less than 1MB):
Case 1: Lambda MEM: 1024MB, timeout: 10 minutes. Result: timeout after 10 minutes.
Case 2: Lambda MEM: 2048MB, timeout: 3 minutes. Result: succeeded after 21 seconds with 1299MB MEM used.
So I suggest using 2048MB instead; it can reduce the Lambda timeout significantly.
@wangcarlton After some digging, it seems that clamscan is a well-known memory beast. The recent issues are without doubt caused by the increase in the number of virus definitions.
Increasing Lambda MEM to 2048 has resolved the timeout issue; however, the next problem is disk space in /tmp.
The Lambda will complete successfully, but this error message will be in the logs:
ClamAV update process started at Fri Jul 10 08:32:26 2020
daily database available for update (local version: 25863, remote version: 25868)
ERROR: buildcld: Can't add daily.hsb to new daily.cld - please check if there is enough disk space available
ERROR: buildcld: gzclose() failed for /tmp/clamav_defs/tmp.6bd07/clamav-e2595ffff6f8a72f6094fc40802f8921.tmp
ERROR: updatedb: Incremental update failed. Failed to build CLD.
ERROR: Unexpected error when attempting to update database: daily
WARNING: fc_update_databases: fc_update_database failed: Failed to update database (14)
ERROR: Database update process failed: Failed to update database (14)
ERROR: Update failed.
So maybe this issue is resolved and we can continue the discussion in https://github.com/upsidetravel/bucket-antivirus-function/issues/128 ?
I am using this: https://github.com/upsidetravel/bucket-antivirus-function
I guess this new issue comes after the upgrade from 0.102.2 to 0.102.3. I was trying to solve it today, but it seems using another directory (such as /var/task) is prohibited by AWS. Lambda has a fixed 500MB of storage which can't be changed: https://aws.amazon.com/lambda/faqs/
Q: What if I need scratch space on disk for my AWS Lambda function?
Each Lambda function receives 500MB of non-persistent disk space in its own /tmp directory.
It also took me the whole afternoon to figure out that some libs (such as libprelude, etc.) need to be installed and the env path needs to be updated to run freshclam after the upgrade from 0.102.2 to 0.102.3. I am going to migrate the Lambda to an EC2 instance (more stable and under control) to update the definition files.
Thanks @DimitrijeManic, your snippet fixed my update.py issues (running out of space).
Increasing the memory worked for me
Same here, increasing the memory did the trick.
Increasing Lambda MEM to 2048 has resolved the timeout issue; however, the next problem is disk space in /tmp. Lambda has a fixed 500MB of storage which can't be changed.
Guys, I know it's been a long time since you wrote this. I just want to mention that it is possible to attach an EFS (Elastic File System) to a Lambda, and then you have nearly unlimited storage available.
Just make sure to avoid this error: https://github.com/aws/serverless-application-model/issues/1631#issuecomment-648049879
And note that you have to delete the files yourself after scanning.
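If anyone wants to try the EFS route, here is a minimal sketch of attaching an existing EFS access point with boto3 (function name, access point ARN and mount path are placeholders; the function also has to be in a VPC that can reach the file system):

```python
import boto3

lambda_client = boto3.client("lambda")

# Placeholders - use your own function name and EFS access point ARN.
lambda_client.update_function_configuration(
    FunctionName="bucket-antivirus-function",
    FileSystemConfigs=[
        {
            "Arn": "arn:aws:elasticfilesystem:eu-west-1:123456789012:access-point/fsap-0123456789abcdef0",
            # Lambda requires the mount path to be under /mnt/
            "LocalMountPath": "/mnt/clamav",
        }
    ],
)
```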
The best solution for this problem is to increase the memory to 2048MB. Thanks folks.
Hi,
We have some weird behaviour: for a year or more the Lambda functions worked without any issues.
But recently the scan takes forever and is stopped by the Lambda timeout.
Any ideas?
Thanks and regards