Closed corepointer closed 3 months ago
Don't know if that matmul accuracy test crashing has something to do with these changes. On my laptop there were two failing tests ("cartesian, success" and "where, success") and on my scale up box there were no failing tests :man_shrugging:
test/api/cli/sql/cartesian_success_1.daphne
and test/api/cli/sql/where_success_1.daphne
both fail in src/compiler/inference/InferencePass.cpp:218
where numCols is 3 and numRows is -5
Thx for the detailed review and the suggestions @pdamme
{
"comment": "DAPHNE runtime logs",
"name": "runtime",
"level": "DEBUG",
"filename": "",
"format": "%^[%n %L]:%$ %v"
},
I implemented the requested changes. But it seems that 2) won't work and explains why I went that addStrRef() route the other night: We can only delete dynamically allocated memory of course. What we get when deleting constants that the compiler emits is:
free(): invalid pointer
Signal: SIGABRT (Aborted)
So if we really want to delete what is allocated at compile time, we need a way to distinguish that from the run-time data and destruct correctly. MatrixConstants, besides strings, would also be something where this is needed.
Thanks for continuing with this PR, @corepointer. You are absolutely right, we cannot free the string literals due to the way we allocate them in LowerToLLVMPass
, I didn't think about that. There are multiple potential ways to fix this problem. I extended your implementation with one more commit and I think the memory leak is fixed now.
My idea was to always increase the reference counter of a string literal right after its creation. The effect is that the original object is never freed by DAPHNE's garbage collection. However, the memory consumption clearly indicates that MLIR frees it automatically, but I didn't double-check that in the IR. All strings created by kernels (e.g., concat) are properly freed by DAPHNE's garbage collection. At the same time, string literals defined inside UDFs or loop bodies can be used by many function calls or loop iterations (this would have been a challenge with an alternative solution, see below). To my mind, this is a simple trick and we could even apply it to matrix literals later (currently, we unnecessarily copy those for every use). I've implemented this idea on top of your code and pushed a commit.
So essentially, the memory leak is fixed, as far as I can tell. I manually tested it with the following DaphneDSL script:
def foo(i:si64) {
print("veeery long string" + i); # I used a 10 KB string
}
for(i in 1:100000) {
foo(i);
}
During the execution of daphne
I watched its memory consumption. On main, it goes up to around 1 GB (10 KB * 100k loop iterations), while with the code in this PR, it stays constant at a few MB or so.
Before that, I considered another alternative: We could easily ensure that string literals are backed by strings that were allocated dynamically in C++ at compile-time and can be freed. To this end, we just need to adapt LowerToLLVMPass.cpp
: In ConstantOpLowering::matchAndRewrite()
:
if(auto strAttr = op.getValue().dyn_cast<StringAttr>()) {
StringRef sr = strAttr.getValue();
#if 0
// MLIR does not have direct support for strings. Thus, if this is
// ...
#elif 1
Type i8PtrType = LLVM::LLVMPointerType::get(
IntegerType::get(rewriter.getContext(), 8)
);
char * newString = new char[sr.size() + 1];
memcpy(newString, sr.str().c_str(), sr.size() + 1);
uint64_t ptr = reinterpret_cast<uint64_t>(reinterpret_cast<const void *>(newString));
rewriter.replaceOpWithNewOp<LLVM::IntToPtrOp>(
op.getOperation(),
i8PtrType,
rewriter.create<arith::ConstantOp>(
loc,
rewriter.getI64IntegerAttr(ptr)
)
);
#elif 0
// ...
#endif
}
That works, but then we get the same problem we used to have with matrix literals: When a string literal is in a UDF or a loop's body, it gets freed after the first function call or loop iteration and is not there anymore for the second time, leading to a crash. For matrix literals, we solved this by always copying the (small) matrix and leaving the original untouched. I think a similar approach can be done for strings, too. However, such artificial copying is not nice.
Thx again. Deleting the branch after successful merge :+1:
The code in this PR solves the memory leak documented in #788.
In addition three other minor memory leaks are closed - see the corresponding commits.
Assigning @pdamme as requested =)