dparse: Fix memory corruption

CyberShadow commented 2 years ago

Following up to the DLF meeting yesterday. I don't really know much about libdparse's memory handling philosophy to know if this is an idiomatic fix, but it does definitely fix https://github.com/dlang-community/libdparse/pull/387#commitcomment-44638309 by addressing the root cause (GC pointers in non-GC memory).

An alternative approach would be to add debug checks which walk every type added to one of the stdx.allocator-based constructs, and assert at runtime that all pointers within the value don't point into GC-owned memory; and, of course, actually moving all data to allocator-owned memory (which in this case probably means interning the doc string).

In this case, the GC-owned memory was allocated here:

``` #0 0x00007ffff7dfb9d4 in core.internal.gc.impl.conservative.gc.Gcx.alloc(ulong, ref ulong, uint, const(TypeInfo)) () from /home/vladimir/work/extern/D/phobos/generated/linux/release/64/libphobos2.so.0.97 #1 0x00007ffff7df9723 in core.internal.gc.impl.conservative.gc.ConservativeGC.mallocNoSync(ulong, uint, ref ulong, const(TypeInfo)) () from /home/vladimir/work/extern/D/phobos/generated/linux/release/64/libphobos2.so.0.97 #2 0x00007ffff7e005f2 in core.internal.gc.impl.conservative.gc.ConservativeGC.runLocked!(core.internal.gc.impl.conservative.gc.ConservativeGC.mallocNoSync(ulong, uint, ref ulong, const(TypeInfo)), core.internal.gc.impl.conservative.gc.mallocTime, core.internal.gc.impl.conservative.gc.numMallocs, ulong, uint, ulong, const(TypeInfo)).runLocked(ref ulong, ref uint, ref ulong, ref const(TypeInfo)) () from /home/vladimir/work/extern/D/phobos/generated/linux/release/64/libphobos2.so.0.97 #3 0x00007ffff7df97c7 in core.internal.gc.impl.conservative.gc.ConservativeGC.qalloc(ulong, uint, scope const(TypeInfo)) () from /home/vladimir/work/extern/D/phobos/generated/linux/release/64/libphobos2.so.0.97 #4 0x00007ffff7df8d79 in gc_qalloc () from /home/vladimir/work/extern/D/phobos/generated/linux/release/64/libphobos2.so.0.97 #5 0x00007ffff7be9ff8 in std.array.Appender!(immutable(char)[]).Appender.ensureAddable(ulong).__lambda9() () from /home/vladimir/work/extern/D/phobos/generated/linux/release/64/libphobos2.so.0.97 #6 0x00007ffff7be9f28 in std.array.Appender!(immutable(char)[]).Appender.ensureAddable(ulong) () from /home/vladimir/work/extern/D/phobos/generated/linux/release/64/libphobos2.so.0.97 #7 0x00007ffff7be9d55 in std.array.Appender!(immutable(char)[]).Appender.reserve(ulong) () from /home/vladimir/work/extern/D/phobos/generated/linux/release/64/libphobos2.so.0.97 #8 0x00000000009ed5ba in dparse.trivia.MultiLineCommentHelper!(immutable(char)).MultiLineCommentHelper.process!(std.array.Appender!(immutable(char)[]).Appender).process(ref std.array.Appender!(immutable(char)[]).Appender) (this=..., outbuffer=...) at libdparse/src/dparse/trivia.d:469 #9 0x00000000009eaa1e in dparse.trivia.unDecorateComment!(std.array.Appender!(immutable(char)[]).Appender).unDecorateComment(immutable(char)[], ref std.array.Appender!(immutable(char)[]).Appender) (outputRange=..., comment=...) at libdparse/src/dparse/trivia.d:108 #10 0x00000000009ea744 in dparse.trivia.extractDdocFromTrivia!(immutable(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)[]).extractDdocFromTrivia(immutable(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)[]) (tokens=...) at libdparse/src/dparse/trivia.d:531 #11 0x00000000009db7b2 in dparse.trivia.extractLeadingDdoc(const(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)) (token=...) at libdparse/src/dparse/trivia.d:562 #12 0x00000000008a6a8d in std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure.__mixin9.comment() const (this=...) at libdparse/src/dparse/lexer.d:155 #13 0x00000000009f9d3c in dparse.parser.Parser.parseDeclaration(bool, bool, bool) (this=0x7ffff74630b0, inTemplateDeclaration=false, mustBeDeclaration=true, strict=true) at libdparse/src/dparse/parser.d:2135 #14 0x0000000000a05c4c in dparse.parser.Parser.parseModule() (this=0x7ffff74630b0) at libdparse/src/dparse/parser.d:4847 #15 0x0000000000a51ecc in dparse.parser.parseModule!().parseModule(dparse.parser.ParserConfig) (parserConfig=...) at libdparse/src/dparse/parser.d:71 #16 0x0000000000a51df6 in dparse.parser.parseModule!(void(immutable(char)[], ulong, ulong, immutable(char)[], bool) delegate).parseModule(const(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)[], immutable(char)[], dparse.rollback_allocator.RollbackAllocator*, void(immutable(char)[], ulong, ulong, immutable(char)[], bool) delegate, uint*, uint*) (warningCount=0x7fffffffb8ac, errorCount=0x7fffffffb8a8, messageFuncOrDg=..., allocator=0x7fffffffb8a0, fileName=..., tokens=...) at libdparse/src/dparse/parser.d:103 #17 0x0000000000b88112 in dscanner.analysis.run.parseModule(immutable(char)[], ubyte[], dparse.rollback_allocator.RollbackAllocator*, ref dparse.lexer.StringCache, ref const(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)[], void(immutable(char)[], ulong, ulong, immutable(char)[], bool) delegate, ulong*, uint*, uint*) (warningCount=0x7fffffffb8ac, errorCount=0x7fffffffb8a8, linesOfCode=0x0, dlgMessage=..., tokens=..., cache=..., p=0x7fffffffb8a0, code=..., fileName=...) at src/dscanner/analysis/run.d:273 #18 0x0000000000b8821a in dscanner.analysis.run.parseModule(immutable(char)[], ubyte[], dparse.rollback_allocator.RollbackAllocator*, immutable(char)[], ref dparse.lexer.StringCache, bool, ref const(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)[], ulong*, uint*, uint*) (warningCount=0x7fffffffb8ac, errorCount=0x7fffffffb8a8, linesOfCode=0x0, tokens=..., report=false, cache=..., errorFormat=..., p=0x7fffffffb8a0, code=..., fileName=...) at src/dscanner/analysis/run.d:286 #19 0x0000000000b87e76 in dscanner.analysis.run.analyze(immutable(char)[][], const(dscanner.analysis.config.StaticAnalysisConfig), immutable(char)[], ref dparse.lexer.StringCache, ref dsymbol.modulecache.ModuleCache, bool) (staticAnalyze=true, moduleCache=..., cache=..., errorFormat=..., config=..., fileNames=...) at src/dscanner/analysis/run.d:243 #20 0x0000000000c010ee in D main (args=...) at src/dscanner/main.d:263 ```

And the last point where it was still seen by the GC was here:

``` #1 0x0000000000a443b7 in dparse.parser.Parser.parseDeclaration(bool, bool, bool) (this=0x7ffff74630b0, inTemplateDeclaration=false, mustBeDeclaration=true, strict=true) at libdparse/src/dparse/parser.d:2138 #2 0x0000000000a4f4b8 in dparse.parser.Parser.parseModule() (this=0x7ffff74630b0) at libdparse/src/dparse/parser.d:4847 #3 0x0000000000aad06c in dparse.parser.parseModule!().parseModule(dparse.parser.ParserConfig) (parserConfig=...) at libdparse/src/dparse/parser.d:71 #4 0x0000000000aacf72 in dparse.parser.parseModule!(void(immutable(char)[], ulong, ulong, immutable(char)[], bool) delegate).parseModule(const(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)[], immutable(char)[], dparse.rollback_allocator.RollbackAllocator*, void(immutable(char)[], ulong, ulong, immutable(char)[], bool) delegate, uint*, uint*) (warningCount=0x7fffffffb8ac, errorCount=0x7fffffffb8a8, messageFuncOrDg=..., allocator=0x7fffffffb8a0, fileName=..., tokens=...) at libdparse/src/dparse/parser.d:103 #5 0x0000000000c18b52 in dscanner.analysis.run.parseModule(immutable(char)[], ubyte[], dparse.rollback_allocator.RollbackAllocator*, ref dparse.lexer.StringCache, ref const(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)[], void(immutable(char)[], ulong, ulong, immutable(char)[], bool) delegate, ulong*, uint*, uint*) (warningCount=0x7fffffffb8ac, errorCount=0x7fffffffb8a8, linesOfCode=0x0, dlgMessage=..., tokens=..., cache=..., p=0x7fffffffb8a0, code=..., fileName=...) at src/dscanner/analysis/run.d:273 #6 0x0000000000c18c7e in dscanner.analysis.run.parseModule(immutable(char)[], ubyte[], dparse.rollback_allocator.RollbackAllocator*, immutable(char)[], ref dparse.lexer.StringCache, bool, ref const(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)[], ulong*, uint*, uint*) (warningCount=0x7fffffffb8ac, errorCount=0x7fffffffb8a8, linesOfCode=0x0, tokens=..., report=false, cache=..., errorFormat=..., p=0x7fffffffb8a0, code=..., fileName=...) at src/dscanner/analysis/run.d:286 #7 0x0000000000c18892 in dscanner.analysis.run.analyze(immutable(char)[][], const(dscanner.analysis.config.StaticAnalysisConfig), immutable(char)[], ref dparse.lexer.StringCache, ref dsymbol.modulecache.ModuleCache, bool) (staticAnalyze=true, moduleCache=..., cache=..., errorFormat=..., config=..., fileNames=...) at src/dscanner/analysis/run.d:243 #8 0x0000000000cb407c in D main (args=...) at src/dscanner/main.d:263 ```

For the record, here is the strategy I used to pinpoint the problem:

Disable ASLR (determinism)
Disable GC threads (more determinism)
Fast builds: disable optimization in Phobos/Druntime, build only shared libraries, link dscanner with shared library
Find the address of the lost memory, i.e. the buffer that ends up containing bad UTF8 (add writeln before the code that throws UTFException)
Add GC hook to trap when the address is allocated and when it's freed
- I did this by adding code to the stub LeakDetector. This is still currently a bit more complicated than necessary, will upstream some fixes soon
- For the trap I used asm{int 3;} and ran in GDB when I needed a stack trace.
Examine the lifetime of all values that share the problematic address

In this case, it was:
1. std.array.appender buffer is allocated
2. std.array.appender buffer is wrongly freed
3. An associative array internal structure is allocated
4. (the associative array clobbers the buffer memory - this is where the UTF error is introduced)
5. The associative array internal structure is freed (not that it matters for us)
6. The appender buffer is accessed, causing the UTF error
Bisect the point in the program's execution where the value is last visible in the GC.

The general strategy is that you want to call GC.collect as frequently as possible (though not that frequently that the program now takes forever to run). I did this in three steps:
1. Coarse: every allocation (by adding fullcollect() to Gcx.alloc)
2. Fine: every function call (by stubbing then adding a call counter to the trace hooks in rt.trace, and calling GC.collect() when the call counter is within the bisected certain range)
3. Very fine: by manually adding GC.collect() calls in Dscanner's code in functions identified by the above step.

We definitely would benefit from better tools to debug such problems; though, in this case, an educated guess might have sufficed.

rikkimax commented 2 years ago

We definitely would benefit from better tools to debug such problems; though, in this case, an educated guess might have sufficed.

Based upon your description of the debugging, if we could get the GC to free ASAP and not reuse memory, then ldc's address sanitizer should do the job for this problem.

CyberShadow commented 2 years ago

if we could get the GC to free ASAP

To do that, you would need either something like reference counting or performing a full mark&sweep every statement.

rikkimax commented 2 years ago

That's not what I meant. I meant instead of setting it as free internally when ready to free, just free that memory range. So that ASAN knows that the memory should not be accessible.

CyberShadow commented 2 years ago

In this case, that wouldn't help at all. You would just get the ASAN violation instead of the UTFException, in the exact same circumstances, and with no other clues to help.

CyberShadow commented 2 years ago

I wonder how it would be possible to avoid such an issue in the future? Should all custom allocators forbid allocating data with indirections unless GC addRange/removeRange has been added properly?

Sure, that's one way.

I suggested another way (only allow non-GC pointers, and verify that this is true at runtime in debug builds).

Could we add interfaces or patterns that could enforce that statically?

It would certainly be possible with a custom pointer type, which indicates that the memory is managed manually and not owned by the GC, and having the allocator types only accept values which have no indirections except via this wrapper. But I can't predict if the syntactic overhead would be bearable.

I don't think we can generally achieve memory safety using static checks at this scale, without becoming Rust. :)

WebFreak001 commented 2 years ago

need someone else to merge, can't merge until travis is done, which I think we can wait for forever since travis-ci.org is gone.

Geod24 commented 2 years ago

Did the force push, and made travis-ci optional so it won't block you in the future.

dlang-community / libdparse

dparse: Fix memory corruption #445