Open will-lauer opened 3 years ago
https://github.com/apache/druid/pull/11559 is a proposed solution for this problem. The code is currently undergoing testing in one of our clusters to ensure it adequately addresses the problem, and any additional fixes will be appended to that PR.
With the proposed fix, we now see file descriptor usage on our historical nodes increase by 4 over the course of a query and then return to its baseline. The prior behavior was to increase by 100k file descriptors and not decrease until GC cleaned them up later.
It looks like there are some cases where this approach doesn't work. `SpillingGrouper` uses `CloseableIterators.mergeSorted()` to produce a sorted sequence in several cases. Unfortunately, this still requires opening all of the spill files at once to examine their contents, which still triggers the "too many files" condition.
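For background on why the merge path can't simply be made lazy: a k-way merge like `CloseableIterators.mergeSorted()` has to hold every input open simultaneously, because at each step it compares the current head element of every source. A minimal sketch of the general technique (hypothetical class, not Druid's actual code) shows that with N spill files, all N descriptors stay open for the duration of the merge:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {
    // Pairs a source iterator with its current head element.
    private static final class Head<T extends Comparable<T>> implements Comparable<Head<T>> {
        final T value;
        final Iterator<T> source;
        Head(T value, Iterator<T> source) { this.value = value; this.source = source; }
        @Override public int compareTo(Head<T> other) { return value.compareTo(other.value); }
    }

    public static <T extends Comparable<T>> List<T> mergeSorted(List<Iterator<T>> sources) {
        PriorityQueue<Head<T>> heap = new PriorityQueue<>();
        // Every source must be opened up front so its head can sit in the heap.
        for (Iterator<T> it : sources) {
            if (it.hasNext()) heap.add(new Head<>(it.next(), it));
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head<T> h = heap.poll();
            out.add(h.value);
            if (h.source.hasNext()) heap.add(new Head<>(h.source.next(), h.source));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> sources = List.of(
            List.of(1, 4, 7).iterator(),
            List.of(2, 5, 8).iterator(),
            List.of(3, 6, 9).iterator());
        System.out.println(mergeSorted(sources)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

Since none of the sources can be closed until it is exhausted, lazily opening them one at a time would change the semantics of the merge; that's why this path still hits the descriptor limit even with the PR applied.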
@will-lauer did you try increasing the maximum number of open file descriptors at the system level?
This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.
This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
I think it's worth reopening this issue. Even if you increase the maximum number of open file descriptors at the system level, opening many tmp files can cause your Historical to OOM. We have seen cases where 100k+ instances of MappingIterator, SmileParser, etc. are created, causing our Historical to go OOM.
Affected Version
0.21, but probably all prior versions that support GroupBy v2
Description
We are regularly seeing "Too Many Open Files" errors when running GroupBy queries using GroupBy v2 combined with sketches on some of our larger backend historical nodes. A typical stack trace looks like:
When this error occurs, it typically causes a cascade of similar "Too many open files" errors from HDFS and ZK sockets, as all operations in the process become constrained by the available file descriptors.
Configuration
Debugging
We dug into this and found several things going on that contributed to the final problem:
The spill files are opened in `SpillingGrouper.read(Iterator)` and `SpillingGrouper.iterator(boolean)`. The code that actually writes the files, in `SpillingGrouper.spill(Iterator)`, uses a try-with-resources block to ensure that each spilled file is closed immediately after writing it. Instead of using a similar mechanism in `read()`, `SpillingGrouper` simply opens all the files, builds a series of `MappingIterator`s, and then uses them to construct an overall iterator over the complete results.

Proposed solution
`SpillingGrouper` needs to be changed to open files one at a time, only when it is ready to read from them, and then close them immediately afterwards. We can do this by changing `SpillingGrouper.read()` to return a `Provider<Iterator>` rather than a `MappingIterator`. The `Provider` would be given a lambda at creation time that constructs and opens the file when it is needed, rather than opening the file up front. This `Provider` could then be used in a new `LazyCloseableIterator` to retrieve the underlying iterator (and thus open the associated file) only when it is actually ready to consume the file's contents. I'll attach an implementation of the proposed fix shortly.