Closed: 0xdabbad00 closed this issue 6 years ago
Tracing this back up in the logs, it looks like everything is running smoothly with the lambda, and then it tries to download an S3 object and errors with this:
[Errno 24] Too many open files: SSLError
Traceback (most recent call last):
  File "/var/task/stream_alert/rule_processor/main.py", line 39, in handler
    StreamAlert(context).run(event)
  File "/var/task/stream_alert/rule_processor/handler.py", line 148, in run
    self._process_alerts(payload)
  File "/var/task/stream_alert/rule_processor/handler.py", line 356, in _process_alerts
    for record in payload.pre_parse():
  File "/var/task/stream_alert/rule_processor/payload.py", line 155, in pre_parse
    s3_file = self._get_object()
  File "/var/task/stream_alert/rule_processor/payload.py", line 245, in _get_object
    return self._download_object(region, bucket, key)
  File "/var/task/stream_alert/rule_processor/payload.py", line 217, in _download_object
    client.download_fileobj(bucket, key, data)
  File "/var/runtime/boto3/s3/inject.py", line 570, in download_fileobj
    return future.result()
  File "/var/runtime/s3transfer/futures.py", line 73, in result
    return self._coordinator.result()
  File "/var/runtime/s3transfer/futures.py", line 233, in result
    raise self._exception
SSLError: [Errno 24] Too many open files
Then it just keeps erroring with that OSError from the initial issue comment, a few hundred times.
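For context on why a file-descriptor problem surfaces as an SSLError: errno 24 (EMFILE) is raised when the process tries to allocate any new descriptor, and boto3's HTTPS connection needs a socket, so the failure shows up at the next S3 download. A minimal sketch (not StreamAlert code) of how handles leaked across warm invocations eventually hit the limit:
import resource

soft_limit, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
print('descriptor soft limit:', soft_limit)  # 1024 on Lambda

leaked = []
try:
    while True:
        # simulates an invocation that opens a file and never closes it
        leaked.append(open('/tmp/leak_demo', 'w'))
except (IOError, OSError) as err:
    # the same errno 24 that boto3 surfaces as an SSLError
    print('failed after leaking %d handles: %s' % (len(leaked), err))
finally:
    for handle in leaked:
        handle.close()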
Thanks for reporting @0xdabbad00 !!
I'm digging into this to find the root cause. I'll update this as I figure things out, but I suspect the issue might be isolated to my environment.
hey @0xdabbad00, I'm fairly certain I know what the issue is. Taking a look at this code:
Those open calls are not using context managers, which leaves the file handles open indefinitely (for the entire life of the container). If a container is being reused a ton of times, this could result in the error you're seeing. I was planning a fix similar to:
if extension == '.gz':
    with gzip.open(s3_object, 'r') as s3_file:
        for num, line in enumerate(s3_file, start=1):
            yield num, line.rstrip()
else:
    with open(s3_object, 'r') as s3_file:
        for num, line in enumerate(s3_file, start=1):
            yield num, line.rstrip()

# AWS Lambda apparently does not reallocate disk space when files are
# removed using os.remove(), so we must truncate them before removal
with open(s3_object, 'w'):
    pass
You're welcome to make these changes and put a PR up, since I'm not in a state to do so right now. If you have other thoughts let me know!
Unfortunately, this did not resolve the issue. Still digging in.
I had also tried to see what files are open by adding the code:
import psutil

for proc in psutil.process_iter():
    try:
        if "python" in str(proc.exe()):
            LOGGER.info(proc.exe())
            LOGGER.info(proc.open_files())
    except:
        pass
Unfortunately, this only printed [] for the open_files() call, which I think means Lambda may be doing something fishy that inhibits my ability to debug.
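If it helps anyone else digging into this, psutil's open_files() only reports regular files (sockets and pipes don't show up there), so an empty list doesn't necessarily mean nothing is leaking. A sketch of a more direct check that reads /proc/self/fd for the handler process itself, assuming a LOGGER like the one above:
import os

def log_open_fds(logger):
    """Log every descriptor the current process has open and what it points at."""
    fd_dir = '/proc/self/fd'
    fds = os.listdir(fd_dir)
    logger.info('open descriptors: %d', len(fds))
    for fd in sorted(fds, key=int):
        try:
            # readlink shows the target: a file path, socket:[...], pipe:[...], etc.
            logger.info('fd %s -> %s', fd, os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            # the descriptor used for the directory listing can vanish mid-loop
            continue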
We disabled our firehose configuration and are no longer experiencing these errors. I believe these errors started around the time I enabled the firehose, which is what led me to try disabling it. Therefore, I believe something in the firehose configuration is holding resource handles open. We'll continue investigating.
We were behind on some commits and it looks like this resolved our problem: https://github.com/airbnb/streamalert/blob/347c64b2e77df22b44f6caae334f9748e3060cea/stream_alert/rule_processor/firehose.py#L292
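For anyone else chasing this, one generic way Firehose usage can hold descriptors open (illustrative only, I don't know whether this is what the linked commit changes) is constructing a new boto3 client per batch, since each client keeps its own connection pool alive for the life of the warm container. Reusing a single module-level client keeps the pool bounded:
import boto3
from botocore.config import Config

# One client per container, created at import time and shared by every invocation.
# max_pool_connections caps how many sockets the client will keep open.
FIREHOSE_CLIENT = boto3.client('firehose', config=Config(max_pool_connections=10))

def send_batch(stream_name, records):
    """Send a batch (max 500) of already-encoded records to a delivery stream."""
    return FIREHOSE_CLIENT.put_record_batch(
        DeliveryStreamName=stream_name,
        Records=[{'Data': record} for record in records]
    )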
We've seen this internally and can confirm this is not fixed.
When running in debug mode, the call to df -h /tmp will be run from https://github.com/airbnb/streamalert/blob/cf12209aa071cd4b2f85bfc5b77446bbb696d451/stream_alert/rule_processor/payload.py#L207 and this may result in an exception being thrown.
This will only present itself if debug output is on, and it may be masking another issue on my end, but perhaps this case should be checked somehow (where you may have too many files in /tmp?). I also don't know why my /tmp would have too many files in it, or if I should clear that out somehow. Based on the Lambda limits, the /tmp directory should be limited to 512 MB anyway (https://docs.aws.amazon.com/lambda/latest/dg/limits.html), and the max file descriptors should be set to 1024.
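A side note on that df -h /tmp call: spawning the child process needs a few descriptors of its own for the pipes, so (just a guess) it can fail with the same errno 24 exactly when the handler is near the limit. If the goal is only visibility into /tmp, an in-process check avoids the subprocess entirely; a sketch, again assuming a LOGGER is available:
import os

def log_tmp_usage(logger):
    # os.statvfs needs no subprocess, and therefore no extra pipes or descriptors
    stats = os.statvfs('/tmp')
    total_mb = stats.f_blocks * stats.f_frsize / 1024.0 / 1024.0
    free_mb = stats.f_bfree * stats.f_frsize / 1024.0 / 1024.0
    logger.debug('/tmp: %.1f MB used of %.1f MB', total_mb - free_mb, total_mb)
    logger.debug('entries in /tmp: %d', len(os.listdir('/tmp')))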