Closed bdkjones closed 9 years ago
MultiMarkdown Composer (v2 in App Store and the beta for unreleased v3) uses MMD-4 as the engine for syntax highlighting and previews. In my beta test versions for MMDC v3, I am running multiple parses of the MMD engine for each keystroke (on different threads). It has been a long time since I have found a bug like this in MMD itself (there are still some unidentified bugs in MMDC v3, but that's a different story.)
Are your users doing something "strange" with MMD? (e.g. using an unconventional document structure that uses MMD features that no one else ever uses?)
Never say never, and all that, but I'd be very surprised if this is a bug in MMD. Without knowing what's going on in LPEngineMARKDOWN
, I can't help identify what's going wrong in your code. But I certainly don't see these kind of problems in my code, which is being tested by many users in many languages every day (both in the beta and production versions). If you're seeing multiple users run into this problem with every run, there's really only two likely conclusions:
Thanks for the quick response. A couple points:
Here's the full implementation of that method:
- (void) compile:(LPFileMARKDOWN *)aFile fromTaskGroup:(LPCompilerTaskGroup *)taskGroup withResultInfo:(LPResultStandard *)resultInfo
{
resultInfo.engineType = LPEngineTypeMARKDOWN;
// Copy everything so that it can't be destroyed before the block below runs on a GCD concurrent thread.
NSString *inputPath = [[NSString alloc] initWithString:aFile.inputFullPath];
NSString *outputPath = [[NSString alloc] initWithString:aFile.outputFullPath];
//
// Set Markdown options
//
long optionsFlag = 0;
if ([aFile.outputStyle integerValue] == LPMARKDOWNOutputStyleComplete) {
optionsFlag |= EXT_COMPLETE;
} else {
optionsFlag |= EXT_SNIPPET;
}
LPMARKDOWNCriticStyle criticStyle = [aFile.criticStyle integerValue];
if (criticStyle == LPMARKDOWNCriticStyleAccept) {
optionsFlag |= EXT_CRITIC_ACCEPT;
} else if (criticStyle == LPMARKDOWNCriticStyleReject) {
optionsFlag |= EXT_CRITIC_REJECT;
}
if ([aFile.enableFootnotes boolValue]) optionsFlag |= EXT_NOTES;
if ([aFile.enableSmartQuotes boolValue]) optionsFlag |= EXT_SMART;
if (![aFile.enableLabels boolValue]) optionsFlag |= EXT_NO_LABELS; // Careful; inverted!
if (![aFile.parseMetadata boolValue]) optionsFlag |= EXT_NO_METADATA; // Careful; inverted!
if ([aFile.maskEmailAddresses boolValue]) optionsFlag |= EXT_OBFUSCATE;
if ([aFile.randomFootnoteNumbers boolValue]) optionsFlag |= EXT_RANDOM_FOOT;
if ([aFile.escapeLineBreaks boolValue]) optionsFlag |= EXT_ESCAPED_LINE_BREAKS;
if ([aFile.processHTML boolValue]) optionsFlag |= EXT_PROCESS_HTML;
// MultiMarkdown.c sets this first, then overrides with settings above. That does not seem to make sense to me. So I'm putting it last.
if ([aFile.useCompatibilityMode boolValue]) {
optionsFlag = EXT_COMPATIBILITY | EXT_NO_LABELS | EXT_OBFUSCATE;
}
//
// Set the Output Format
LPMARKDOWNOutputFormat ckOutputFormat = [aFile.outputFormat integerValue];
// Enable HEADINGSECTION for certain formats. MultiMarkdown.c does this, so it must matter for these formats.
if (ckOutputFormat == LPMARKDOWNOutputFormatOPML ||
ckOutputFormat == LPMARKDOWNOutputFormatBeamer ||
ckOutputFormat == LPMARKDOWNOutputFormatLyx)
{
optionsFlag |= EXT_HEADINGSECTION;
}
int outputFormat = 0;
switch (ckOutputFormat)
{
case LPMARKDOWNOutputFormatHTML: {
outputFormat = HTML_FORMAT;
break;
}
// TEXT_FORMAT is currently not supported and not fully implemented in MMD. Do not use.
// case LPMARKDOWNOutputFormatText: {
// outputFormat = TEXT_FORMAT;
// break;
// }
case LPMARKDOWNOutputFormatLatex: {
outputFormat = LATEX_FORMAT;
break;
}
case LPMARKDOWNOutputFormatBeamer: {
outputFormat = BEAMER_FORMAT;
break;
}
case LPMARKDOWNOutputFormatMemoir: {
outputFormat = MEMOIR_FORMAT;
break;
}
case LPMARKDOWNOutputFormatODF: {
outputFormat = ODF_FORMAT;
break;
}
case LPMARKDOWNOutputFormatOPML: {
outputFormat = OPML_FORMAT;
break;
}
case LPMARKDOWNOutputFormatLyx: {
outputFormat = LYX_FORMAT;
break;
}
// RTF_FORMAT is also not currently finished. MMD dev says it's not safe for production environments.
// case LPMARKDOWNOutputFormatRTF: {
// outputFormat = RTF_FORMAT;
// break;
// }
case LPMARKDOWNOutputFormatOriginal: {
outputFormat = ORIGINAL_FORMAT;
break;
}
default: {
outputFormat = HTML_FORMAT;
break;
}
}
// MultiMarkdown is in C, so going to a background thread might be overkill here.
// Still, we'll do it in case someone tries to compile a 7,000,000 line markdown file.
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
^{
// Open the input file
char const *pathToFile = [inputPath cStringUsingEncoding:NSUTF8StringEncoding];
FILE *inputFile = fopen(pathToFile, "r");
if (inputFile == NULL)
{
resultInfo.resultType = LPCompilerResultTypeError;
[resultInfo addResultMessage:@"Markdown: Compiling failed because this file could not be opened or read. It may be corrupt."];
resultInfo.canContinueProcessingSequence = NO;
dispatch_async(dispatch_get_main_queue(),
^{
[_rootCompiler finishedProcessingStepForFile:aFile fromTaskGroup:taskGroup withResultInfo:resultInfo];
});
[inputPath release];
[outputPath release];
return;
}
//
// Read in the entire input file contents.
//
GString *inputBuffer = g_string_new("");
int currentChar = 0;
while ((currentChar = fgetc(inputFile)) != EOF) {
g_string_append_c(inputBuffer, currentChar);
}
fclose(inputFile);
//
// Process includes, unless we're in "compatibility mode"
//
if (!(optionsFlag & EXT_COMPATIBILITY))
{
prepend_mmd_header(inputBuffer);
append_mmd_footer(inputBuffer);
const char *inputFolderPath = [[inputPath stringByDeletingLastPathComponent] cStringUsingEncoding:NSUTF8StringEncoding];
transclude_source(inputBuffer, (char *)inputFolderPath, NULL, outputFormat, NULL);
}
//
// Compile.
//
char *outputBuffer = NULL;
if (outputFormat == ORIGINAL_FORMAT)
{
// User wants original MD; do not compile.
outputBuffer = inputBuffer->str;
}
else
{
outputBuffer = markdown_to_string(inputBuffer->str, optionsFlag, outputFormat);
}
if (outputBuffer == NULL)
{
resultInfo.resultType = LPCompilerResultTypeError;
[resultInfo addResultMessage:@"Markdown: Compiling failed. This is likely due to a syntax error. Unfortunately, the MultiMarkdown compiler does not return error information. Check your file carefully and try again."];
resultInfo.canContinueProcessingSequence = NO;
}
else
{
//Open the output file for writing/overwriting
char const *pathToOutputFile = [outputPath cStringUsingEncoding:NSUTF8StringEncoding];
FILE *outFile = fopen(pathToOutputFile, "w+");
if (outFile == NULL)
{
NSLog(@"LPEngineMARKDOWN error: unable to open file pointer for writing to path: %@", outputPath);
resultInfo.resultType = LPCompilerResultTypeError;
[resultInfo addResultMessage:[NSString stringWithFormat:@"Markdown: Compiling failed because the output could not be written to this path: %@ Check permissions and try again.", outputPath]];
resultInfo.canContinueProcessingSequence = NO;
}
else
{
int success = fprintf(outFile, "%s\n", outputBuffer);
if (success < 0)
{
resultInfo.resultType = LPCompilerResultTypeError;
[resultInfo addResultMessage:@"Markdown: Compiling failed because CodeKit could not write the result to the output path. Check permissions and try again."];
resultInfo.canContinueProcessingSequence = NO;
}
else
{
resultInfo.resultType = LPCompilerResultTypeSuccess;
[resultInfo addResultMessage:@"Markdown: Compiled successfully."];
// Whatever the next engine is going to do, it should do it on our output file
resultInfo.processingTarget = LPProcessingTargetOutputFile;
}
fclose(outFile);
}
}
dispatch_async(dispatch_get_main_queue(),
^{
[_rootCompiler finishedProcessingStepForFile:aFile fromTaskGroup:taskGroup withResultInfo:resultInfo];
});
// Cleanup
[inputPath release];
[outputPath release];
// Even though g_string_free checks for NULL before calling free(), I've added a check here to try to fight this issue:
if (inputBuffer != NULL) {
g_string_free(inputBuffer, true);
inputBuffer = NULL;
}
});
}
I'm certainly not ruling out that the issue is on my end ("When you think it's the framework, check your code again.")
But just knowing that you're not seeing the same random issue helps me. I will attempt to reproduce the problem with the latest user's files (if and when I get them). Hopefully they will finally let me reproduce it.
It appears that the only time LPEngineMARKDOWN
directly calls g_string_free
is at the end, correct? So that's the problematic area.
When you do this (outputBuffer = inputBuffer->str
), where does outputBuffer get freed? That would lead to freeing inputBuffer->str
twice.
I'm away from my laptop so I can't be positive, but if I recall correctly, outputBuffer is never malloc-ed. It's just a char* that markdown_to_string returns. Therefore, I don't free outputBuffer.
Sent from my iPhone
On Nov 2, 2015, at 12:47, Fletcher T. Penney notifications@github.com wrote:
It appears that the only time LPEngineMARKDOWN directly calls g_string_free is at the end, correct. So that's the problematic area.
When you do this (outputBuffer = inputBuffer->str), where does outputBuffer get freed? That would lead to freeing inputBuffer->str twice.
— Reply to this email directly or view it on GitHub.
Hi @bdkjones - I tend to agree with @fletcher's interpretation that the crash doesn't look like it's happening in markdown_to_string, but in your own call to g_string_free. At least the stack crawl seems to imply it's called directly from the block you dispatch from -[LPEngineMARKDOWN compile:fromTaskGroup:]. I'm curious what gave you confidence to think it's crashing inside markdown_to_string?
I also think @fletcher is right you want to be freeing the result from markdown_to_string. It has to be getting malloc'd inside MMD. There's no other code that could be responsible for freeing it - unless I gravely misunderstood MMD's memory management :)
But all that said, I can't see how it WOULD end up getting double-free'd, because on the contrary it seems as though it never gets free'd at all. Maybe some weird code path that you're reaching is causing inputBuffer->str to be prematurely freed or realloc'd, and you're holding on to a stale copy?
Okay, I went back and looked at my commit history. I did indeed free()
the outputBuffer (after checking for NULL). However, I removed that call while attempting to diagnose what was causing this issue. I've also traced the ownership of that memory:
markdown_to_string
creates a char* out
at the top:char * markdown_to_string(const char * source, unsigned long extensions, int format) {
char *out;
[...]
And at the bottom, sets this equal to the return value of export_node_tree
(or, if an error occurs, out
is set to a string using strdup
):
/* Show what we got */
out = export_node_tree(refined, format, extensions);
/* clean up */
free_parser_data((parser_data *)g.data);
yydeinit(&g);
free(formatted);
return out;
export_node_tree
in turn, does this at the top:char * export_node_tree(node *list, int format, unsigned long extensions) {
char *output;
GString *out = g_string_new("");
And then calls g_string_free
on out
, but leaves the underlying character buffer allocated:
output = out->str;
g_string_free(out, false);
free_scratch_pad(scratch);
#ifdef DEBUG_ON
fprintf(stderr, "finish export_node_tree\n");
#endif
return output;
So, the end result is that the outputBuffer
is indeed owned memory that should be freed. But I took that call out, so it can't possibly be double-freeing str
, correct?
The "weird-code-path" theory is what I'm looking at. I've followed the MMD routine, though, and can't find a route that would explain the problem.
Plus, the call to free outputBuffer
is just free()
, since outputBuffer
is only the underlying character store and not a full g_string. g_string_free()
is where the exception is being thrown and the only time I call that function is to free the inputBuffer
, which is created above. Is there any path in which MMD would somehow modify or free that memory?
@bdkjones you said you took out the free() while looking into this crash. So can you confirm that the users who are reporting these crashes are still seeing the crash after updating to your code that DOESN'T do the free? You shipped that change?
The speculation @fletcher was making about double-freeing was because of the case where you assign inputBuffer->str to to outputBuffer. If you used to free outputBuffer, then when it happened to point to inputBuffer->str's allocated value, you were freeing is out from under the inputBuffer g_string. So, inputBuffer was still not NULL, but inputBuffer->str would be deallocated. When you go to free inputBuffer with g_string_free, it crashes trying to free an already-freed inputBuffer->str.
It adds up pretty neatly so I am super interested to know if you actually shipped your change to customers to not use free at all on outputBuffer :)
Well damn, you're exactly right: that does add up perfectly. It also explains why the crash only affects a few people who leave the output format in MD. Sorry I missed it earlier and thanks for the help!
Glad the issue seems to be solved!
MultiMarkdown seems to be crashing when it tries to release a
g_string
incorrectly. I have not been able to reproduce this on demand, but here's the stacktrace I'm seeing from customers in the wild:And the exception:
Additional Info
I've spent a lot of time making sure my own code is solid and not over-freeing anything. As far as I can tell, this is coming from MultiMarkdown, as the crash happens inside the
markdown_to_string
function. I've also stepped through the g_string implementation by @danielpunkass and it looks error-free. I'm copying him on this issue just in case, anyway.I do have customers who have been able to reproduce this on every run of the MultiMarkdown compiler, but I haven't been able to reproduce it on my own Mac.
I've tried to walk through the MultiMarkdown source, but it's incredibly tough to track what's returning something that should be freed, etc because the function names don't follow any convention. For example:
label_from_string
frees ag_string
, but not the character buffer, which has to be freed by whatever calledlabel_from_string
. That name should really benew_label_from_string
orcreate_label_from_string
to indicate memory ownership.I'm hoping someone more familiar with the source might be able to track down the over-free.