ollieglass opened 7 years ago
I had a look at code_gen.py. Perhaps the CodeGenerator class could build a string instead of opening and writing to a file. When the .file method is called, it could write to a file, close it and return the name.
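A minimal sketch of that suggestion, assuming a hypothetical `BufferedCodeGenerator` with illustrative `write`/`indent`/`dedent` methods (these are not the library's actual API). Source accumulates in an `io.StringIO`, and the file is only created, written, and closed when `file()` is called:

```python
import io
import tempfile


class BufferedCodeGenerator(object):
    """Hypothetical variant of CodeGenerator: C++ source is kept in memory
    and only written to disk when file() is called."""

    def __init__(self):
        self._buf = io.StringIO()
        self._indent = 0

    def write(self, line):
        # Four spaces per indentation level (illustrative choice).
        self._buf.write("    " * self._indent + line + "\n")

    def indent(self):
        self._indent += 1

    def dedent(self):
        self._indent -= 1

    def file(self):
        # Write the buffered source to a .cpp file, close it immediately,
        # and return only its path, so no file handle stays open.
        # delete=False: the caller is responsible for removing the file later.
        with tempfile.NamedTemporaryFile(mode="w", prefix="compiledtrees_",
                                         suffix=".cpp", delete=False) as f:
            f.write(self._buf.getvalue())
        return f.name
```

The trade-off is that each tree's full source lives in RAM until `file()` is called, which matters for very large forests.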
On Linux and macOS you simply have to run `ulimit -n 2048` in the shell before starting Python. By design, compiling trees keeps `2 * n_trees + 2` files open.
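The same limit can also be raised from inside a Python process on Unix via the standard `resource` module. A minimal sketch, assuming the `2 * n_trees + 2` figure above; the helper names are hypothetical. The soft limit is capped at the hard limit, and on macOS very large values may still be rejected by the OS:

```python
import resource


def required_open_files(n_trees):
    # Figure quoted in this thread: compiling keeps 2 * n_trees + 2 files open.
    return 2 * n_trees + 2


def raise_nofile_limit(target):
    """Raise the soft RLIMIT_NOFILE toward `target` for this process only.

    The soft limit is capped at the hard limit; the new soft limit is returned.
    Unix-only: the resource module is not available on Windows.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough, leave it alone
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target
```

For example, `raise_nofile_limit(required_open_files(500))` would be called before compiling a 500-tree forest.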
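On Windows there is no way to raise the limit globally, but there is a per-process workaround that you have to include in your script:

```python
import platform

if platform.system() == 'Windows':
    # win32file is provided by the pywin32 package.
    import win32file
    # Raise the C runtime's limit on simultaneously open files for this process.
    win32file._setmaxstdio(2048)
```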
I used to write a single .cpp file, but that didn't work for large forests, especially if you have lots of data and allow full tree growth. For my example this translated to 500 .cpp files of over 100 MB each (50 GB+ of RAM). Keeping all those files in StringIO objects would probably work, although the .o files would still be there, so we would go down to `n_trees + 2` open files (assuming we successfully close/delete the .cpp files after compiling them to .o).
To sum up: I don't regard it as an issue, and working around it would probably cost a lot of RAM, which is ultimately a deal-breaker (at least for me).
I see what you mean. I've fixed the problem for myself; as you say, it isn't hard.
I am concerned that users could be put off by this. How about an informative error for them, like this?
```python
import errno
import sys
import tempfile


class CodeGenerator(object):
    def __init__(self):
        try:
            self._file = tempfile.NamedTemporaryFile(
                prefix='compiledtrees_', suffix='.cpp', delete=True)
        except OSError as e:
            # EMFILE (errno 24): this process has hit its open-file limit.
            if e.errno == errno.EMFILE:
                print("Too many open files. Increase the limit to 2 * n_trees + 2 "
                      "(unix / mac: ulimit -n [limit], windows: http://bit.ly/2fAKnz0)",
                      file=sys.stderr)
            raise  # re-raise with the original traceback
        self._indent = 0
```
edit: added if
That might be a good solution if `e.errno == 24` holds across platforms. If I remember correctly, on Windows I got some kind of "Permission Denied" errors, which were terrible to debug...
I fear we might also catch some false positives, though.
A unit test for that would also be useful (see the hints above on changing the limit on each platform).
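One possible shape for such a test, as a Unix-only sketch (it relies on the standard `resource` module, so it does not cover the Windows "Permission Denied" case). It lowers the soft open-file limit, exhausts it with temporary `.cpp` files, and checks that the failure surfaces as `errno.EMFILE`, which is the condition the proposed error message relies on. The class name and the limit of 32 are illustrative choices:

```python
import errno
import resource
import tempfile
import unittest


class TooManyOpenFilesTest(unittest.TestCase):
    """Unix-only sketch: exhausting the soft open-file limit should raise EMFILE."""

    LOW_LIMIT = 32  # illustrative value

    def setUp(self):
        # Warm up tempfile (temp dir lookup, name generator) before the limit drops.
        tempfile.NamedTemporaryFile(suffix=".cpp").close()
        self._old = resource.getrlimit(resource.RLIMIT_NOFILE)
        hard = self._old[1]
        low = self.LOW_LIMIT
        if hard != resource.RLIM_INFINITY:
            low = min(low, hard)
        # Only the soft limit changes, so tearDown can restore it.
        resource.setrlimit(resource.RLIMIT_NOFILE, (low, hard))

    def tearDown(self):
        resource.setrlimit(resource.RLIMIT_NOFILE, self._old)

    def test_exhausting_limit_raises_emfile(self):
        files = []
        try:
            with self.assertRaises(OSError) as ctx:
                for _ in range(2 * self.LOW_LIMIT):
                    files.append(tempfile.NamedTemporaryFile(suffix=".cpp"))
            self.assertEqual(ctx.exception.errno, errno.EMFILE)
        finally:
            for f in files:
                f.close()
```

It can be run with `python -m unittest` like any other test module.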
Here's a loop that fits and compiles trees, stepping up the number of estimators each time:
It crashes on 140:
This is on macOS.
I haven't looked into workarounds; perhaps I can increase the number of files that can be open at once. But if there's a way to limit the number of open files within the library, that would probably be better.