delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Getting error message when running in lambda: message: "Too many open files" #2353

Closed ion-elgreco closed 3 months ago

ion-elgreco commented 4 months ago

Discussed in https://github.com/delta-io/delta-rs/discussions/2352

Originally posted by **rob3000** March 28, 2024

Hi, I'm currently running the delta-rs Rust library in Lambda to do a merge query into Delta Lake. However, I noticed I'm getting the following error:

```
Error: hyper::Error(Connect, ConnectError("tcp open error", Os { code: 24, kind: Uncategorized, message: "Too many open files" }))
```

It looks like it's coming from the DynamoDB lock:

```
{
    "timestamp": "Mar 28 09:06:04.847",
    "level": "ERROR",
    "fields": {
        "message": "dynamodb client failed to write log entry: GenericDynamoDb { source: HttpDispatch(HttpDispatchError { message: \"Error during dispatch: error trying to connect: dns error: Too many open files (os error 24)\" }) }"
    },
    "target": "deltalake_aws::logstore"
}
```

I suspect the logstore is trying to create new connections while running through a batch of records to save to Delta Lake. I'm still quite new to Rust, but when I normally talk to DynamoDB I create a reference outside of the handler and pass that reference through. Do I need to do the same with the logstore? And if so, how?
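The pattern described above (construct the client once, outside the handler, so every invocation reuses its connections) can be sketched with the standard library's `std::sync::OnceLock`. Note this is a minimal illustration of the pattern only: `DynamoClient`, `connect`, `put`, and `handler` are hypothetical stand-ins, not delta-rs or AWS SDK APIs.

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for an expensive client (e.g. a DynamoDB or
// delta-rs logstore client) that would hold network connections.
struct DynamoClient {
    endpoint: String,
}

impl DynamoClient {
    fn connect(endpoint: &str) -> Self {
        // A real client would open TCP connections / a connection pool here.
        DynamoClient { endpoint: endpoint.to_string() }
    }

    fn put(&self, item: &str) -> String {
        format!("PUT {} -> {}", item, self.endpoint)
    }
}

// Initialized at most once per Lambda execution environment; every
// invocation of `handler` reuses the same client instead of opening
// fresh connections (which is what exhausts file descriptors).
static CLIENT: OnceLock<DynamoClient> = OnceLock::new();

fn handler(item: &str) -> String {
    let client = CLIENT.get_or_init(|| DynamoClient::connect("dynamodb.local"));
    client.put(item)
}

fn main() {
    // Two "invocations" share one client.
    println!("{}", handler("a"));
    println!("{}", handler("b"));
}
```

Whether delta-rs exposes a hook to reuse its internal DynamoDB client this way is exactly the open question in this issue; the sketch only shows the general once-per-environment initialization idiom.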
cmackenzie1 commented 3 months ago

There probably isn't much delta-rs can do here. The "Too many open files" error is an operating-system error: most operating systems limit how many file descriptors a process can have open at once, and each network connection counts as an open file descriptor.

On macOS and Debian this can be adjusted with the `ulimit` command or by editing /etc/sysctl.conf.

macOS

```
# see your current limit
ulimit -n

# change the limit
ulimit -n 30000
```
ion-elgreco commented 3 months ago

@rob3000 fyi