dimitriv / Ziria

A domain-specific language and compiler for low-level bitstream processing.

wpl_alloca() inside vectorization loops leads to memory leaks #119

Open: dimitriv opened this issue 8 years ago

dimitriv commented 8 years ago

Compiling struct03.wpl in tests/backend with --vectorize leads to an out-of-memory exception. The struct is indeed large, but inspecting the generated C code shows that the allocation happens inside the loop instead of somewhere global. As a result, we quickly run out of memory. The fix should be easy to make somewhere in CgTypes.hs.
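To make the failure mode concrete, here is a minimal C sketch of the pattern. It assumes wpl_alloca() behaves like a bump allocator over a fixed arena that is only popped when the enclosing function exits; the names sketch_alloca, arena_reset, loop_alloc_inside and ARENA_BYTES are hypothetical and are not the Ziria runtime's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the wpl_alloca() stack: a bump allocator over
 * a fixed-size arena. The size is chosen only for illustration. */
#define ARENA_BYTES (1u << 20)   /* 1 MiB */

static uint8_t arena[ARENA_BYTES];
static size_t arena_top = 0;     /* next free byte */

/* Bump-allocate n bytes; returns NULL once the arena is exhausted.
 * Nothing is freed individually; space comes back only via arena_reset(). */
static void *sketch_alloca(size_t n) {
    if (n > ARENA_BYTES - arena_top) return NULL;
    void *p = &arena[arena_top];
    arena_top += n;
    return p;
}

/* Pop everything allocated since `mark`. */
static void arena_reset(size_t mark) {
    assert(mark <= arena_top);
    arena_top = mark;
}

/* Shape of the problematic code: a large struct is allocated inside the
 * loop body, so every iteration takes fresh arena space, and nothing is
 * popped until the enclosing function exits. Returns how many iterations
 * completed before an allocation failed (or `iters` if none failed). */
static size_t loop_alloc_inside(size_t iters, size_t struct_bytes) {
    size_t mark = arena_top;
    size_t done = 0;
    for (size_t i = 0; i < iters; i++) {
        uint8_t *s = sketch_alloca(struct_bytes);
        if (s == NULL) break;          /* the out-of-memory condition */
        s[0] = (uint8_t)i;             /* stand-in for work on the struct */
        done++;
    }
    arena_reset(mark);                 /* the only pop, and it comes too late */
    return done;
}
```

With a 1 MiB arena and a 4096-byte struct, loop_alloc_inside(1000, 4096) completes only 256 of the 1000 iterations, even though each iteration needs just one struct at a time.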

ghost commented 8 years ago

I'll take a look at this.

dimitriv commented 8 years ago

To help you see what is going on: arrays are currently allocated in one of two ways.

(a) If they are very big, OR their size is unknown at code-generation time (i.e. we are generating the code for the body of a length-polymorphic function), they go through the wpl_alloca() path. This emulates a big stack for every function, which is eventually popped all the way up (this works because Ziria can only return by value and never assigns pointers, with the caveat being that other weird bug you discovered a couple of days ago @siddhanathan :-)). What is happening here, however, is that the wpl_alloca() happens to be /inside/ a very large loop, so we run out of memory before we even get the chance to deallocate it.

(b) If arrays are small and their size is statically known, they are simply declared locally in the current scope (but, again, we must be very careful never to return their addresses once they go out of scope).

All of this logic lives in CgTypes.hs, which provides a lot of functionality for such things. I am actually not entirely sure the fix is straightforward. I can think of several options:

(1) Execute the loop body using "inAllocFrame", which ensures that we push/pop on every iteration. This may be the simplest option but introduces a slight performance penalty; I'd give it a try first.
(2) Somehow hoist big allocations out of loops. This seems harder, as we would have to inspect all blocks of code deeply, find declarations, etc.
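Option (1) can be sketched in C as follows. This assumes inAllocFrame amounts to recording the allocation stack's top before a block and restoring it afterwards; frame_push, frame_pop, sketch_alloca, arena_peak and loop_frame_per_iter are hypothetical names for illustration, not functions in CgTypes.hs or the Ziria runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump-allocated arena standing in for wpl_alloca()'s stack. */
#define ARENA_BYTES (1u << 20)

static uint8_t arena[ARENA_BYTES];
static size_t arena_top = 0;   /* next free byte */
static size_t arena_peak = 0;  /* high-water mark, to observe memory use */

static void *sketch_alloca(size_t n) {
    if (n > ARENA_BYTES - arena_top) return NULL;
    void *p = &arena[arena_top];
    arena_top += n;
    if (arena_top > arena_peak) arena_peak = arena_top;
    return p;
}

/* Hypothetical frame helpers in the spirit of inAllocFrame:
 * push records the current top, pop releases everything allocated since. */
static size_t frame_push(void) { return arena_top; }

static void frame_pop(size_t mark) {
    assert(mark <= arena_top);
    arena_top = mark;
}

/* Option (1): give every loop iteration its own frame. The struct is still
 * allocated per iteration, but it is released before the next one starts,
 * so arena use stays at one struct. Each iteration pays for a push/pop. */
static size_t loop_frame_per_iter(size_t iters, size_t struct_bytes) {
    size_t done = 0;
    for (size_t i = 0; i < iters; i++) {
        size_t mark = frame_push();
        uint8_t *s = sketch_alloca(struct_bytes);
        if (s == NULL) {
            frame_pop(mark);
            break;
        }
        s[0] = (uint8_t)i;             /* stand-in for work on the struct */
        frame_pop(mark);
        done++;
    }
    return done;
}
```

Here all 1000 iterations succeed with a peak usage of a single 4096-byte struct. The per-iteration push/pop is the slight performance penalty mentioned for option (1); option (2) avoids it, but only by locating the allocations to hoist.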

dimitriv commented 8 years ago

(The actual test file is struct3.wpl, not struct03.wpl, I think.)

ghost commented 8 years ago

There is at least one naive fix: abuse DeclPkg by doing the assignment during the declaration itself. That is, instead of DeclPkg ig [stmt], we could have DeclPkg ig [] and do the wpl_alloca inside ig itself.
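A minimal C sketch of what this changes in the emitted code, assuming that the declaration part of a DeclPkg ends up at function scope (outside the loop) while its statement list is emitted where the variable is used. The struct, the counter alloc_calls and the functions sketch_alloca, arena_reset, shape_stmt_in_loop and shape_alloc_in_decl are all hypothetical; this is an illustration of the two code shapes, not the output of CgTypes.hs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_BYTES (1u << 20)

static uint8_t arena[ARENA_BYTES];
static size_t arena_top = 0;
static size_t alloc_calls = 0;  /* counts allocations, to compare the shapes */

/* Hypothetical bump allocator standing in for wpl_alloca(). */
static void *sketch_alloca(size_t n) {
    alloc_calls++;
    if (n > ARENA_BYTES - arena_top) return NULL;
    void *p = &arena[arena_top];
    arena_top += n;
    return p;
}

static void arena_reset(size_t mark) {
    assert(mark <= arena_top);
    arena_top = mark;
}

/* A hypothetical large struct, standing in for the one in the test. */
struct big { uint8_t payload[4096]; };

/* Before (DeclPkg ig [stmt]): the declaration sits outside the loop, but the
 * allocation is a separate statement emitted at the use site, in the loop. */
static size_t shape_stmt_in_loop(size_t iters) {
    size_t mark = arena_top, done = 0;
    struct big *x;                              /* ig: declaration only */
    for (size_t i = 0; i < iters; i++) {
        x = sketch_alloca(sizeof *x);           /* [stmt]: runs every iteration */
        if (x == NULL) break;
        x->payload[0] = (uint8_t)i;
        done++;
    }
    arena_reset(mark);
    return done;
}

/* After (DeclPkg ig []): the allocation is folded into the declaration,
 * so it runs once and the buffer is reused by every iteration. */
static size_t shape_alloc_in_decl(size_t iters) {
    size_t mark = arena_top, done = 0;
    struct big *x = sketch_alloca(sizeof *x);   /* ig carries the allocation */
    if (x != NULL) {
        for (size_t i = 0; i < iters; i++) {
            x->payload[0] = (uint8_t)i;
            done++;
        }
    }
    arena_reset(mark);
    return done;
}
```

In the first shape, 1000 iterations trigger 257 allocation calls and fail after 256; in the second, a single allocation serves all 1000. Because the buffer is shared across iterations, any per-iteration initialization still has to run inside the loop.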

ghost commented 8 years ago

Eh... I didn't mean to link those commits here.

@dimitriv, would https://github.com/siddhanathan/Ziria/commit/cd3a61d57b963fc8af15f0e86f62b27c09e103ec work? It essentially performs the wpl_alloca at declaration scope, ensuring that the wpl_alloca doesn't happen inside a loop. I've confirmed that all tests pass. I tried other variations, including pushing/popping on every iteration, and they either had bad performance or caused tests to fail.