apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[C++] Allow fine tuning of memory pool from environment variable(s) #27132

Open asfimport opened 3 years ago

asfimport commented 3 years ago

Both jemalloc and mimalloc allow some fine-tuning through environment variables. However, in some cases Arrow overrides the settings supplied through these variables.

For example, the jemalloc docs describe MALLOC_CONF, an environment variable that can be used to tune any jemalloc parameter. However, the global variable malloc_conf (which Arrow uses) takes precedence.

Mimalloc also exposes some tuning parameters (https://microsoft.github.io/mimalloc/environment.html). I have not tested which of these work and which do not, so all of them may already be accessible.

Tuning may be useful in certain environments. For example, someone might want to use jemalloc instead of mimalloc in an environment where overcommit is disabled, and would therefore want to set the retain:true option. It would be good to allow these tuning parameters to be set at runtime. One potential solution is to expose our own environment variable (e.g. ARROW_MALLOC_CONF).
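
A minimal sketch of how such a variable could be resolved, assuming the environment value should win over a built-in default. The function names and the default string are hypothetical and are not taken from Arrow's source; only the variable name ARROW_MALLOC_CONF comes from the proposal above.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical placeholder default; NOT Arrow's actual built-in settings.
const char kDefaultMallocConf[] = "retain:false";

// Picks the option string to hand to the allocator: the environment value if
// it is set and non-empty, otherwise the built-in default.
std::string ResolveMallocConf(const char* env_value,
                              const std::string& default_conf) {
  if (env_value != nullptr && env_value[0] != '\0') {
    return std::string(env_value);
  }
  return default_conf;
}

// Reads the proposed (not yet existing) ARROW_MALLOC_CONF variable.
std::string MallocConfFromEnvironment() {
  return ResolveMallocConf(std::getenv("ARROW_MALLOC_CONF"),
                           kDefaultMallocConf);
}
```

With this shape, running a program with `ARROW_MALLOC_CONF=retain:true` set would yield `retain:true`, and leaving the variable unset would fall back to the default. How the resulting string is then passed to the allocator is left open here.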

Reporter: Weston Pace / @westonpace


Note: This issue was originally created as ARROW-11228. Please see the migration documentation for further details.

asfimport commented 3 years ago

Antoine Pitrou / @pitrou: We may want to expose backend-specific configuration options instead (ARROW_JEMALLOC_CONF, ARROW_MIMALLOC_CONF...) since one may select a different backend at runtime.
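
As a rough illustration of the backend-specific variant, the lookup could depend on whichever backend is selected at runtime. The `Backend` enum and both functions below are hypothetical names for this sketch, not Arrow APIs; only the two variable names come from the comment above.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical enumeration of the selectable memory pool backends.
enum class Backend { kSystem, kJemalloc, kMimalloc };

// Maps a backend to the name of its proposed configuration variable.
// Returns an empty string for backends with nothing to tune.
std::string ConfEnvVarName(Backend backend) {
  switch (backend) {
    case Backend::kJemalloc:
      return "ARROW_JEMALLOC_CONF";
    case Backend::kMimalloc:
      return "ARROW_MIMALLOC_CONF";
    case Backend::kSystem:
      return "";
  }
  return "";
}

// Returns the option string for the given backend, or an empty string when
// the backend has no variable or the variable is unset.
std::string BackendConf(Backend backend) {
  const std::string name = ConfEnvVarName(backend);
  if (name.empty()) {
    return std::string();
  }
  const char* value = std::getenv(name.c_str());
  return value != nullptr ? std::string(value) : std::string();
}
```

Keeping one variable per backend means options written for jemalloc are never handed to mimalloc when a different backend is chosen at runtime.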