Closed by llamahunter 10 years ago
The compilation cache server caches several things.
Could you confirm that the used memory grows with each compilation, even when no code is modified at all?
Could you run the compilation server with "-v" so it prints debug information? Usually, when you don't change any code, you should get only "Reused" lines (no "Typing") when recompiling the second time.
Also, you might want to run the compilation (without the compilation server) with "--times" and report the times spent. The compilation server only reduces time spent in Parsing and Typing, not in Macro Execution or Code Generation.
We will look into adding a way to dump the compiler cache content so we can pinpoint why it is so big in your case. This will require you to test with a Git build of the compiler, which can be found at http://build.haxe.org
I worked on compiler cache leak detection. I noticed that macro modules seem to somehow be able to reach the calling context's modules. We have to fix that first before checking for other things, since currently every module that has a dependency on a macro will leak all other modules of the same context.
@llamahunter if you want to try it, get build 1738a1939f3ce3cda18a7774c3c9b60879dd3183 (or more recent) from http://build.haxe.org, then, after several compilations, run: haxe --connect <port> --display memory
to have the compilation server dump its cache content to its stdout (which you might prefer to redirect to a log file, since it's quite verbose for big projects). I would be interested to know at least the first numbers it reports (Total/Free memory + cache size).
Here's some progress.
The following test file (which includes a simple macro) produces some "leaks". These leaks are defined as modules which can be reached when browsing the memory but are not referenced in the module dependencies.
class Test {
    macro static function foo() {
        return macro {};
    }
    public static function main() {
        #if !macro
        trace(foo());
        #end
    }
}
To get the memory dump: haxe --connect 6000 -js test.js -main Test && haxe --connect 6000 --display memory
Here's the dump:
Total Allocated Memory 0 B
Free Memory 0 B
Total cache size 1.7 MB
haxelib 88 B
parsed ast 1.0 MB (81 files stored)
typed modules 995 KB (49 modules stored)
--- CONFIG 1b8af2ea765866a891cbc1fb60d38094 ----------------------------
EnumValue : 512 B
Class : 696 B
Enum : 760 B
haxe.PosInfos : 1 KB
Test : 1 KB
IntIterator : 3 KB
Math : 16 KB
Reflect : 32 KB
Date : 33 KB
HxOverrides : 33 KB
String : 37 KB
Array : 41 KB
js.Boot : 98 KB
Std : 126 KB
Type : 129 KB
haxe.EnumTools : 139 KB
StdTypes : 169 KB
--- CONFIG feed1889bd90491bf4f42a400429f39c ----------------------------
haxe.io.BytesData : 460 B
EnumValue : 512 B
Class : 696 B
Enum : 760 B
haxe.io.Error : 1 KB
StdTypes : 176 KB
neko.Lib : 179 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
neko.Boot : 179 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
neko.NativeString : 181 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
haxe.ds.WeakMap : 183 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
Std : 185 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
Test : 186 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
neko.NativeArray : 188 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
StringBuf : 190 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
Math : 194 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
Map : 195 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
haxe.macro.Type : 197 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
haxe.ds.HashMap : 199 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
haxe.ds.StringMap : 200 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
haxe.ds.IntMap : 200 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
haxe.macro.Expr : 204 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
haxe.ds.ObjectMap : 205 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
haxe.io.Bytes : 205 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
List : 210 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
String : 220 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
Reflect : 221 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
haxe.ds.BalancedTree : 234 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
Array : 248 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
Type : 295 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
haxe.ds.EnumValueMap : 305 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
LEAK Type(feed1889bd90491bf4f42a400429f39c)
haxe.EnumTools : 305 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK IntIterator(feed1889bd90491bf4f42a400429f39c)
IntIterator : 342 KB
LEAK StdTypes(feed1889bd90491bf4f42a400429f39c)
LEAK String(feed1889bd90491bf4f42a400429f39c)
LEAK Array(feed1889bd90491bf4f42a400429f39c)
LEAK Enum(feed1889bd90491bf4f42a400429f39c)
LEAK neko.NativeArray(feed1889bd90491bf4f42a400429f39c)
LEAK Std(feed1889bd90491bf4f42a400429f39c)
LEAK Class(feed1889bd90491bf4f42a400429f39c)
LEAK neko.Boot(feed1889bd90491bf4f42a400429f39c)
LEAK Math(feed1889bd90491bf4f42a400429f39c)
LEAK neko.Lib(feed1889bd90491bf4f42a400429f39c)
LEAK haxe.io.Bytes(feed1889bd90491bf4f42a400429f39c)
LEAK haxe.io.BytesData(feed1889bd90491bf4f42a400429f39c)
LEAK neko.NativeString(feed1889bd90491bf4f42a400429f39c)
LEAK haxe.io.Error(feed1889bd90491bf4f42a400429f39c)
LEAK StringBuf(feed1889bd90491bf4f42a400429f39c)
Cache dump complete
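The leak check described above (a module counts as a leak when it is reachable by walking the in-memory object graph but is absent from its declared dependency list) can be sketched roughly as follows. This is an illustrative Python model, not the compiler's actual OCaml data structures; all names here are made up.

```python
# Hedged sketch of the leak detection described above: a module "leaks"
# another module when that module is physically reachable in memory but
# is not among the declared dependencies.
def find_leaks(module, declared_deps, references):
    """references: dict mapping module -> set of modules reachable via raw pointers."""
    reachable, stack = set(), [module]
    while stack:
        m = stack.pop()
        for r in references.get(m, ()):  # follow raw memory references
            if r not in reachable:
                reachable.add(r)
                stack.append(r)
    # anything reachable but undeclared gets reported as LEAK
    return sorted(reachable - declared_deps - {module})

refs = {"Test": {"Std", "StdTypes"}, "Std": {"IntIterator"}}
print(find_leaks("Test", {"Std"}, refs))  # → ['IntIterator', 'StdTypes']
```

In the dumps above, the per-module LEAK lines correspond to exactly this kind of reachable-but-undeclared report.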
I wonder what's so special about IntIterator that it becomes reachable by all modules and seems to point to other modules as well (maybe a closure linking to it) in some strange recursive way. The typing context itself should not be reachable, since I have a check for a special memory block for it (actually the Common context, but it's referenced by the Typer context as well).
Could be some central ... operator?
Ok, found what was going wrong and fixed it. We still have some reported leaks; here's the updated log.
Total Allocated Memory 0 B
Free Memory 0 B
Total cache size 1.5 MB
haxelib 88 B
parsed ast 886 KB (81 files stored)
typed modules 863 KB (49 modules stored)
--- CONFIG 1b8af2ea765866a891cbc1fb60d38094 ----------------------------
EnumValue : 548 B
Class : 732 B
Enum : 796 B
haxe.PosInfos : 1 KB
IntIterator : 3 KB
StdTypes : 5 KB
Reflect : 32 KB
Date : 33 KB
HxOverrides : 33 KB
String : 37 KB
Array : 41 KB
Test : 58 KB
LEAK String(1b8af2ea765866a891cbc1fb60d38094)
LEAK Array(1b8af2ea765866a891cbc1fb60d38094)
LEAK HxOverrides(1b8af2ea765866a891cbc1fb60d38094)
LEAK Date(1b8af2ea765866a891cbc1fb60d38094)
Math : 73 KB
LEAK String(1b8af2ea765866a891cbc1fb60d38094)
LEAK Array(1b8af2ea765866a891cbc1fb60d38094)
LEAK HxOverrides(1b8af2ea765866a891cbc1fb60d38094)
LEAK Date(1b8af2ea765866a891cbc1fb60d38094)
js.Boot : 98 KB
Std : 126 KB
Type : 129 KB
haxe.EnumTools : 139 KB
--- CONFIG feed1889bd90491bf4f42a400429f39c ----------------------------
haxe.io.BytesData : 484 B
EnumValue : 548 B
Class : 732 B
Enum : 796 B
haxe.io.Error : 1 KB
neko.NativeString : 2 KB
IntIterator : 3 KB
haxe.ds.WeakMap : 4 KB
StdTypes : 5 KB
Test : 7 KB
neko.NativeArray : 11 KB
Map : 16 KB
haxe.macro.Type : 18 KB
haxe.ds.HashMap : 20 KB
haxe.macro.Expr : 25 KB
neko.Lib : 104 KB
neko.Boot : 104 KB
Std : 110 KB
StringBuf : 116 KB
Math : 119 KB
haxe.ds.StringMap : 126 KB
haxe.ds.IntMap : 126 KB
haxe.ds.ObjectMap : 131 KB
haxe.io.Bytes : 131 KB
List : 135 KB
String : 145 KB
Reflect : 146 KB
haxe.ds.BalancedTree : 160 KB
Array : 174 KB
Type : 220 KB
haxe.ds.EnumValueMap : 230 KB
LEAK Type(feed1889bd90491bf4f42a400429f39c)
haxe.EnumTools : 230 KB
Cache dump complete
Another leak, with a very simple program:
class Test {
    public function toSer() {
        return Std.instance(this, Test);
    }
    public static function main() {
    }
}
This will make Std (and some others which depend on Std) leak Test, while it should not.
PS: that's when targeting Flash.
Here's a new minimal Test:
// Test.hx
class Test {
    public function toSer() {
        return A.instance(this, Test);
    }
    public static function main() {
    }
}

// A.hx
class A {
    public inline static function instance<T>( v : { }, c : Class<T> ) : T {
        return B.as(v, c);
    }
}

// B.hx
import A;
class B {
    public inline static function as<T>( v : Dynamic, c : Class<T> ) : Null<T> {
        return untyped __as__(v, c);
    }
}
B referencing A, plus the inlining, seems mandatory to trigger it.
Actually, with only A written this way, it leaks as well:

class A {
    public inline static function instance<T>( c : Class<T> ) : T {
        return untyped __as__(null, c);
    }
}
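A plausible mechanism for this kind of leak (an assumption on my part, not a confirmed diagnosis from the thread) is that the typed expression copied into the caller during inlining keeps a reference back to its defining module, making that module reachable without being a declared dependency. A toy Python model, with made-up names:

```python
# Sketch: inlining copies a typed expression from the callee's module into
# the caller, and the copy still points at the module that defined it, so
# the caller now retains that module through raw reachability.
class Module:
    def __init__(self, name):
        self.name = name
        self.exprs = []

def inline_call(caller, callee, body_expr):
    # the copied expression still references the defining module
    caller.exprs.append({"expr": body_expr, "defined_in": callee})

a, test = Module("A"), Module("Test")
inline_call(test, a, "untyped __as__(null, c)")
reachable = {e["defined_in"].name for e in test.exprs}
print(sorted(reachable))  # → ['A']  : Test retains A via the inlined body
```

If the compiler's cache walker follows such references, Test would report A (and everything A reaches) as a LEAK even though the declared dependency was dropped after inlining.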
Starting from cb9a06adb2449dcf8c421d722c2903424fc7e84e, there are no longer any reproducible dependency leaks in my test project. Waiting for feedback from @llamahunter to see if this solves the issue.
Thanks for the quick and thorough investigation! This is excellent progress. I will pull down the change into our tree and see what the memory usage looks like. I will attach a dump after several builds so that you can see what leaks, if any, remain. Did you do anything to address the raw size of the cache? Even without a leak, after one run, it is over a gigabyte. Might that be due to some of the intrinsic leakage between the macro and compilation contexts?
Please report the detailed sizes to us. The "Total Memory" is the total GC heap size. You might have a lot of "Free Memory" in case of excessive fragmentation. The "cache size" is the amount of memory taken by the context cache. The modules are then sorted by ascending size. Look for modules whose memory usage seems out of place.
So, I tried the new compiler, but it still seems to be using an inordinately large amount of memory. I ran --display memory after several zero-change compiles and got back ~1800 lines of output, so I will omit most of it here. However, this is the head and tail of that output, including the last several dozen modules, as they seem to be sorted in ascending size order:
Client connected
Waiting for data...
Reading 86 bytes
Processing Arguments [--cwd,/trees/cpd/tcdui/mainline/srcroot/applib/widgets/widgets/test,--display,memory]
Total Allocated Memory 0 B
Free Memory 0 B
Total cache size 2446.1 MB
haxelib 11 KB
parsed ast 111.2 MB (1859 files stored)
typed modules 2366.1 MB (1783 modules stored)
--- CONFIG 36f225ca50e08868cd9058f0cf9fdc24 ----------------------------
.... many things omitted here with cache size < 1.0MB....
com.tivo.core.trio.Collection : 1.0 MB
debugger.DebuggerThread : 1.1 MB
com.tivo.core.trio.MixSearch : 1.1 MB
com.tivo.applib.util.ResCategoryUtil : 1.3 MB
com.tivo.core.trio.Mix : 1.3 MB
com.tivo.core.trio.EndUserMessageType : 1.4 MB
com.tivo.core.trio.Content : 1.4 MB
com.tivo.iris.widgets.List : 1.4 MB
com.tivo.core.trio.RecordingStore : 1.6 MB
com.tivo.applib.util.AppLibUtilImpl : 1.7 MB
com.tivo.core.trio.ContentSearch : 1.8 MB
com.tivo.core.trio.CollectionSearch : 1.9 MB
com.tivo.core.trio.RecordingSearch : 2.2 MB
com.tivo.core.trio.OfferSearch : 2.4 MB
com.tivo.core.trio.Offer : 2.6 MB
com.tivo.core.trio.CancellationReason : 2.8 MB
com.tivo.core.trio.Recording : 3.0 MB
com.tivo.applib.widgets.PgdImageComponentTest : 6.0 MB
com.tivo.applib.util.MdoUtil : 6.3 MB
com.tivo.applib.widgets.PgdImageInfoFactory : 7.8 MB
com.tivo.core.trio.MindObjectType : 2197.7 MB
--- CONFIG 74c11e6a708a88ff15d6d8dd367ab9fe ----------------------------
haxe.io.BytesData : 944 B
EnumValue : 1000 B
Class : 1 KB
Enum : 1 KB
com.tivo.core.util.IDebugEnvServer : 1 KB
haxe.io.Error : 2 KB
nme.events.EventPhase : 2 KB
com.tivo.core.util.AssertError : 2 KB
haxe.PosInfos : 2 KB
nme.events.IEventDispatcher : 3 KB
neko.NativeString : 4 KB
nme.errors.Error : 4 KB
com.tivo.core.util.AssertionFailedEvent : 4 KB
IntIterator : 5 KB
StdTypes : 8 KB
haxe.ds.WeakMap : 8 KB
nme.events.TextEvent : 10 KB
nme.events.ErrorEvent : 10 KB
com.tivo.core.util.AssertionFailedEventDispatcher : 12 KB
com.tivo.core.util.AssertTracker : 14 KB
neko.NativeArray : 15 KB
haxe.EnumTools : 23 KB
StringBuf : 23 KB
haxe.ds.EnumValueMap : 24 KB
haxe.ds.IntMap : 26 KB
haxe.ds.HashMap : 26 KB
haxe.ds.StringMap : 27 KB
Map : 31 KB
haxe.macro.Type : 34 KB
nme.events.Event : 36 KB
haxe.ds.ObjectMap : 36 KB
Std : 38 KB
haxe.io.Bytes : 40 KB
haxe.CallStack : 44 KB
haxe.macro.Expr : 48 KB
Lambda : 50 KB
List : 55 KB
StringTools : 57 KB
haxe.xml.Fast : 60 KB
Math : 63 KB
neko.Lib : 65 KB
Date : 65 KB
Reflect : 68 KB
neko.Boot : 70 KB
String : 73 KB
haxe.macro.Context : 91 KB
haxe.ds.BalancedTree : 95 KB
Type : 112 KB
Array : 136 KB
Xml : 173 KB
haxe.macro.ExprTools : 307 KB
haxe.macro.Printer : 372 KB
com.tivo.core.util.DebugEnv : 503 KB
com.tivo.core.util.Asserts : 662 KB
Cache dump complete
Cached 0 modules
Stats = 0 files, 0 classes, 0 methods, 0 macros
Time spent : 41.900s
The memory usage, as measured by the OS, still seems to increase from run to run, with zero changes to the source code. The process table reports haxe is currently using 3.57 GB after about 6 iterations, when it started at roughly 2.6 GB after the first compile. I'm doing this investigation on my 64-bit Mac, which is why I am able to make any progress. However, this level of memory use is incompatible with our Linux development environment, which is 32-bit.
One thing that does stand out as strange is the one module which claims to have a 2+ GB cache all for itself: com.tivo.core.trio.MindObjectType. That is a funny file. It's got an enormous enum in it, followed by an enormous switch statement for every value in the enum, and a map that maps every value in the enum to a specially formatted string. I question whether we (TiVo) actually need this file. Still, it seems to be exploding the cache.
Thanks, I think this pinpoints the issue quite closely.
@Simn There must be a bug somewhere in our switch rewriting rules that creates incremental/exponential changes to the AST that grow the memory.
@llamahunter Do you think you could send us this particular file so we can easily reproduce? Or at least try to reproduce by writing a similar file that shows incremental module memory usage after numerous compilations.
I don't think it's related to that. I've tested the compiler with hxparse again, which is basically one giant switch, yet the HaxeParser remains steady at 1.4/1.5 MB cache usage.
I can probably send you this file, with the usual caveats about not posting it on the Internet.
Richard
@llamahunter thanks, please send it to me and it will be kept private. We just need a way to reproduce the bug. Could you also confirm that each recompilation makes this module's memory grow according to --display memory? Thanks
Strangely, it does not appear that memory is growing via '--display memory', but the OS process table says that it is. Maybe there is a leak somewhere lower down in OCaml? Or maybe it's a 'heisen-leak' because of some infrequently run GC task? Still, I've seen the process-table-reported memory usage range from 2.8 to 5+ GB, typically hovering somewhere around 3.9 GB.
@llamahunter thanks for the file; we were able to track down the issue, which was related to some cloning of the enum structure when accessing its fields. Of course, with a ~2900-constructor enum cloned 2900 times (once for each case), the memory was growing quadratically. Please update to 181bf6a9c0cfbae27fbe4a449f738b98e28f576e
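The blow-up described above can be illustrated with a small Python sketch (illustrative only: `retained_nodes` and the dict-based enum are made up, not compiler code). Cloning an N-constructor enum once per switch case retains on the order of N² nodes, while sharing a single structure retains only N:

```python
import copy

# Sketch of why per-case cloning of a large enum explodes memory: each of
# the N switch cases gets its own deep copy of the N-constructor enum, so
# retained nodes grow as N * N; sharing one structure stays at N.
def retained_nodes(n_constructors, share):
    enum_type = {"constructors": [f"C{i}" for i in range(n_constructors)]}
    cases = []
    for _ in range(n_constructors):  # one switch case per constructor
        # buggy behavior: clone the whole enum for every case
        cases.append(enum_type if share else copy.deepcopy(enum_type))
    # count constructor entries actually held in memory
    unique = {id(c) for c in cases}
    return len(unique) * n_constructors

print(retained_nodes(100, share=True))   # → 100: one shared enum
print(retained_nodes(100, share=False))  # → 10000: quadratic blow-up
```

At ~2900 constructors, the cloned variant would retain roughly 2900² constructor entries, which is consistent with a single module ballooning to multiple gigabytes.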
Don't update just yet, there's some regression with XML generation.
https://github.com/HaxeFoundation/haxe/commit/4eec384097c7bcf14a4d8a15df0b44673bacbd19 should be good to go.
Ok! An order-of-magnitude reduction in total reported cache memory usage, and a 200x reduction for the funny enum file.
Total cache size is now reported as 262.0 MB; however, the OS process table still reports 1.37 GB of memory usage by the compile server after several zero-change runs. As this is less than 2.0 GB, it should probably work for our 32-bit development environment, but it still seems large.
Hmm... as I watch the OS-reported memory usage, I see it vary between 591 MB and 1.49 GB. Maybe it's just a lazy GC issue? It always seems to end up at 1.37 GB after it completes a compile and --display memory cycle, though.
Currently the GC is hardcoded to do a full compaction every 10 runs. We may want to make that configurable, as it probably makes sense to compact more often when compiling huge projects.
--display memory will do a full major GC cycle followed by a compaction before printing its memory results, so what you see in the display-memory results is more likely the "resident memory" (mainly the total cache size).
While you compile, however, it will require intermediate memory. You're right that 1.37 GB is quite a lot already. We haven't done much memory profiling of the compiler in the past; it's actually a bit hard due to the non-linear nature of the typer and the lack of a native OCaml memory profiler.
However, we're lucky that allocating a lot of memory takes time, so we could maybe build a profilable Haxe compiler for Linux (with the -p ocamlopt argument) and then use gprof to analyse the output from your builds without needing the actual code (only the gmon.out monitoring results).
An example is shown in the "Profiling" section here: http://ocaml.org/tutorials/performance_and_profiling.html
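The compaction cadence described above can be modelled with a short Python sketch (illustrative only; the real compiler drives the OCaml GC, and `Server` here is a made-up stand-in for it):

```python
# Sketch of the cadence described above: a full heap compaction is forced
# every N compilations, and a memory dump always compacts first so that it
# reports resident size rather than transient allocation.
class Server:
    COMPACT_EVERY = 10  # hardcoded per the comment above

    def __init__(self):
        self.runs = 0
        self.compactions = 0

    def compile(self):
        self.runs += 1
        if self.runs % self.COMPACT_EVERY == 0:
            self.compact()

    def compact(self):
        self.compactions += 1  # stands in for a full major GC + compaction

    def display_memory(self):
        self.compact()  # always compact before reporting resident memory

s = Server()
for _ in range(25):
    s.compile()
print(s.compactions)  # → 2 (after runs 10 and 20)
s.display_memory()
print(s.compactions)  # → 3
```

This also explains the observation above that the process-table numbers swing widely between dumps: between compactions, the heap holds transient allocations that --display memory never shows.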
Since we have a fairly large project, compilation times are an issue for us, especially because Haxe does not support precompiled libraries. However, when using the caching compilation server, we typically see it consuming more than 6 GB of memory. On some systems, we only have a 32-bit OS, meaning that no process can easily use more than 2 GB of memory (and definitely no more than 4 GB).
It seems unlikely that the compilation server really needs to hold on to 6 GB of AST, given that our source tree is, at worst, about 9 MB of source code.