Crashes while freeing a g_string

MultiMarkdown seems to be crashing when it tries to release a g_string incorrectly. I have not been able to reproduce this on demand, but here's the stacktrace I'm seeing from customers in the wild:

Thread 9 Crashed:: Dispatch queue: com.apple.root.default-qos
0   libsystem_kernel.dylib          0x00007fff9e9310ae __pthread_kill + 10
1   libsystem_pthread.dylib         0x00007fff8e3ff500 pthread_kill + 90
2   libsystem_c.dylib               0x00007fff9688837b abort + 129
3   libsystem_malloc.dylib          0x00007fff9e4f3051 free + 425
4   com.incident57.CodeKit2         0x000000010c528ed0 g_string_free + 39
5   com.incident57.CodeKit2         0x000000010c5d9d65 __42-[LPEngineMARKDOWN compile:fromTaskGroup:]_block_invoke + 973
6   libdispatch.dylib               0x00007fff982348f5 _dispatch_call_block_and_release + 12
7   libdispatch.dylib               0x00007fff982293c3 _dispatch_client_callout + 8
8   libdispatch.dylib               0x00007fff9822d253 _dispatch_root_queue_drain + 1890
9   libdispatch.dylib               0x00007fff9822cab8 _dispatch_worker_thread3 + 91
10  libsystem_pthread.dylib         0x00007fff8e3fc4f2 _pthread_wqthread + 1129
11  libsystem_pthread.dylib         0x00007fff8e3fa375 start_wqthread + 13

And the exception:

Crashed Thread:        9  Dispatch queue: com.apple.root.default-qos

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Application Specific Information:
abort() called
*** error for object 0x7f821b0e9a00: pointer being freed was not allocated

Additional Info

I've spent a lot of time making sure my own code is solid and not over-freeing anything. As far as I can tell, this is coming from MultiMarkdown, as the crash happens inside the markdown_to_string function. I've also stepped through the g_string implementation by @danielpunkass and it looks error-free. I'm copying him on this issue just in case, anyway.

I do have customers who have been able to reproduce this on every run of the MultiMarkdown compiler, but I haven't been able to reproduce it on my own Mac.

I've tried to walk through the MultiMarkdown source, but it's incredibly tough to track what's returning something that should be freed, etc because the function names don't follow any convention. For example: label_from_string frees a g_string, but not the character buffer, which has to be freed by whatever called label_from_string. That name should really be new_label_from_string or create_label_from_string to indicate memory ownership.

I'm hoping someone more familiar with the source might be able to track down the over-free.

MultiMarkdown Composer (v2 in App Store and the beta for unreleased v3) uses MMD-4 as the engine for syntax highlighting and previews. In my beta test versions for MMDC v3, I am running multiple parses of the MMD engine for each keystroke (on different threads). It has been a long time since I have found a bug like this in MMD itself (there are still some unidentified bugs in MMDC v3, but that's a different story.)

Are your users doing something "strange" with MMD? (e.g. using an unconventional document structure that uses MMD features that no one else ever uses?)

Never say never, and all that, but I'd be very surprised if this is a bug in MMD. Without knowing what's going on in LPEngineMARKDOWN, I can't help identify what's going wrong in your code. But I certainly don't see these kind of problems in my code, which is being tested by many users in many languages every day (both in the beta and production versions). If you're seeing multiple users run into this problem with every run, there's really only two likely conclusions:

There is something strange about your users' documents that consistently trigger an otherwise unidentified bug in MMD. Possible, but unlikely unless your app has a very narrowly defined use for MMD parsing. This should be readily obvious, and if this is the case I'm happy to look at sample documents to see what's going on.
The problem is external to MMD -- particularly if you are using multiple threads. MMD is thread-safe, but that doesn't mean that individual functions can be used on the same memory locations in multiple threads without understanding exactly what is going on. That could certainly lead to over-freeing of data. MMD is designed to be used as a "high-level" library (e.g. convert this text string into HTML and don't bother me with the details). If you're going to use small bits and pieces individually, you're responsible for understanding how they work and how to properly use them.

Thanks for the quick response. A couple points:

I'm happy to post the full implementation of that objective-c method. I'll do so in the next comment.
Thread safety: I'm not calling MMD from multiple threads. I AM calling it from a single background thread using GCD. My implementation follows your command-line tool, as in the next comment.
I've had about 4 or 5 users with these same crash reports now. I've had two of them send me the files that caused the crash for them, but I'm unable to reproduce the crash over here using those same files. I'm still waiting for the files from the latest person to report this.

Here's the full implementation of that method:

- (void) compile:(LPFileMARKDOWN *)aFile fromTaskGroup:(LPCompilerTaskGroup *)taskGroup withResultInfo:(LPResultStandard *)resultInfo
{
    resultInfo.engineType = LPEngineTypeMARKDOWN;

    // Copy everything so that it can't be destroyed before the block below runs on a GCD concurrent thread.
    NSString *inputPath = [[NSString alloc] initWithString:aFile.inputFullPath];
    NSString *outputPath = [[NSString alloc] initWithString:aFile.outputFullPath];

    //
    //  Set Markdown options
    //
    long optionsFlag = 0;

    if ([aFile.outputStyle integerValue] == LPMARKDOWNOutputStyleComplete) {
        optionsFlag |= EXT_COMPLETE;
    } else {
        optionsFlag |= EXT_SNIPPET;
    }

    LPMARKDOWNCriticStyle criticStyle = [aFile.criticStyle integerValue];
    if (criticStyle == LPMARKDOWNCriticStyleAccept) {
        optionsFlag |= EXT_CRITIC_ACCEPT;
    } else if (criticStyle == LPMARKDOWNCriticStyleReject) {
        optionsFlag |= EXT_CRITIC_REJECT;
    }

    if ([aFile.enableFootnotes boolValue]) optionsFlag |= EXT_NOTES;
    if ([aFile.enableSmartQuotes boolValue]) optionsFlag |= EXT_SMART;
    if (![aFile.enableLabels boolValue]) optionsFlag |= EXT_NO_LABELS;          // Careful; inverted!
    if (![aFile.parseMetadata boolValue]) optionsFlag |= EXT_NO_METADATA;       // Careful; inverted!
    if ([aFile.maskEmailAddresses boolValue]) optionsFlag |= EXT_OBFUSCATE;
    if ([aFile.randomFootnoteNumbers boolValue]) optionsFlag |= EXT_RANDOM_FOOT;
    if ([aFile.escapeLineBreaks boolValue]) optionsFlag |= EXT_ESCAPED_LINE_BREAKS;
    if ([aFile.processHTML boolValue]) optionsFlag |= EXT_PROCESS_HTML;

    // MultiMarkdown.c sets this first, then overrides with settings above. That does not seem to make sense to me. So I'm putting it last.
    if ([aFile.useCompatibilityMode boolValue]) {
        optionsFlag = EXT_COMPATIBILITY | EXT_NO_LABELS | EXT_OBFUSCATE;
    }

    //
    //  Set the Output Format
    LPMARKDOWNOutputFormat ckOutputFormat = [aFile.outputFormat integerValue];

    // Enable HEADINGSECTION for certain formats. MultiMarkdown.c does this, so it must matter for these formats.
    if (ckOutputFormat == LPMARKDOWNOutputFormatOPML ||
        ckOutputFormat == LPMARKDOWNOutputFormatBeamer ||
        ckOutputFormat == LPMARKDOWNOutputFormatLyx)
    {
       optionsFlag |= EXT_HEADINGSECTION;
    }

    int outputFormat = 0;
    switch (ckOutputFormat)
    {
        case LPMARKDOWNOutputFormatHTML: {
            outputFormat = HTML_FORMAT;
            break;
        }
        // TEXT_FORMAT is currently not supported and not fully implemented in MMD. Do not use.
//        case LPMARKDOWNOutputFormatText: {
//            outputFormat = TEXT_FORMAT;
//            break;
//        }
        case LPMARKDOWNOutputFormatLatex: {
            outputFormat = LATEX_FORMAT;
            break;
        }
        case LPMARKDOWNOutputFormatBeamer: {
            outputFormat = BEAMER_FORMAT;
            break;
        }
        case LPMARKDOWNOutputFormatMemoir: {
            outputFormat = MEMOIR_FORMAT;
            break;
        }
        case LPMARKDOWNOutputFormatODF: {
            outputFormat = ODF_FORMAT;
            break;
        }
        case LPMARKDOWNOutputFormatOPML: {
            outputFormat = OPML_FORMAT;
            break;
        }
        case LPMARKDOWNOutputFormatLyx: {
            outputFormat = LYX_FORMAT;
            break;
        }
        // RTF_FORMAT is also not currently finished. MMD dev says it's not safe for production environments.
//        case LPMARKDOWNOutputFormatRTF: {
//            outputFormat = RTF_FORMAT;
//            break;
//        }
        case LPMARKDOWNOutputFormatOriginal: {
            outputFormat = ORIGINAL_FORMAT;
            break;
        }
        default: {
            outputFormat = HTML_FORMAT;
            break;
        }
    }

    // MultiMarkdown is in C, so going to a background thread might be overkill here.
    // Still, we'll do it in case someone tries to compile a 7,000,000 line markdown file.
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
                   ^{
                       // Open the input file
                       char const *pathToFile = [inputPath cStringUsingEncoding:NSUTF8StringEncoding];

                       FILE *inputFile = fopen(pathToFile, "r");
                       if (inputFile == NULL)
                       {
                           resultInfo.resultType = LPCompilerResultTypeError;
                           [resultInfo addResultMessage:@"Markdown: Compiling failed because this file could not be opened or read. It may be corrupt."];
                           resultInfo.canContinueProcessingSequence = NO;

                           dispatch_async(dispatch_get_main_queue(),
                                          ^{
                                              [_rootCompiler finishedProcessingStepForFile:aFile fromTaskGroup:taskGroup withResultInfo:resultInfo];
                                          });

                           [inputPath release];
                           [outputPath release];
                           return;
                       }

                       //
                       //  Read in the entire input file contents.
                       //
                       GString *inputBuffer = g_string_new("");

                       int currentChar = 0;
                       while ((currentChar = fgetc(inputFile)) != EOF) {
                           g_string_append_c(inputBuffer, currentChar);
                       }
                       fclose(inputFile);

                       //
                       //  Process includes, unless we're in "compatibility mode"
                       //
                       if (!(optionsFlag & EXT_COMPATIBILITY))
                       {
                           prepend_mmd_header(inputBuffer);
                           append_mmd_footer(inputBuffer);

                           const char *inputFolderPath = [[inputPath stringByDeletingLastPathComponent] cStringUsingEncoding:NSUTF8StringEncoding];
                           transclude_source(inputBuffer, (char *)inputFolderPath, NULL, outputFormat, NULL);
                       }

                       //
                       //  Compile.
                       //
                       char *outputBuffer = NULL;

                       if (outputFormat == ORIGINAL_FORMAT)
                       {
                           // User wants original MD; do not compile.
                           outputBuffer = inputBuffer->str;
                       }
                       else
                       {
                           outputBuffer = markdown_to_string(inputBuffer->str, optionsFlag, outputFormat);
                       }

                       if (outputBuffer == NULL)
                       {
                           resultInfo.resultType = LPCompilerResultTypeError;
                           [resultInfo addResultMessage:@"Markdown: Compiling failed. This is likely due to a syntax error. Unfortunately, the MultiMarkdown compiler does not return error information. Check your file carefully and try again."];
                           resultInfo.canContinueProcessingSequence = NO;
                       }
                       else
                       {
                           //Open the output file for writing/overwriting
                           char const *pathToOutputFile = [outputPath cStringUsingEncoding:NSUTF8StringEncoding];

                           FILE *outFile = fopen(pathToOutputFile, "w+");
                           if (outFile == NULL)
                           {
                               NSLog(@"LPEngineMARKDOWN error: unable to open file pointer for writing to path: %@", outputPath);
                               resultInfo.resultType = LPCompilerResultTypeError;
                               [resultInfo addResultMessage:[NSString stringWithFormat:@"Markdown: Compiling failed because the output could not be written to this path: %@ Check permissions and try again.", outputPath]];
                               resultInfo.canContinueProcessingSequence = NO;
                           }
                           else
                           {
                               int success = fprintf(outFile, "%s\n", outputBuffer);

                               if (success < 0)
                               {
                                   resultInfo.resultType = LPCompilerResultTypeError;
                                   [resultInfo addResultMessage:@"Markdown: Compiling failed because CodeKit could not write the result to the output path. Check permissions and try again."];
                                   resultInfo.canContinueProcessingSequence = NO;
                               }
                               else
                               {
                                   resultInfo.resultType = LPCompilerResultTypeSuccess;
                                   [resultInfo addResultMessage:@"Markdown: Compiled successfully."];

                                   // Whatever the next engine is going to do, it should do it on our output file
                                   resultInfo.processingTarget = LPProcessingTargetOutputFile;
                               }

                               fclose(outFile);
                           }
                       }

                       dispatch_async(dispatch_get_main_queue(),
                                      ^{
                                          [_rootCompiler finishedProcessingStepForFile:aFile fromTaskGroup:taskGroup withResultInfo:resultInfo];
                                      });

                       // Cleanup
                       [inputPath release];
                       [outputPath release];

                       // Even though g_string_free checks for NULL before calling free(), I've added a check here to try to fight this issue:
                       if (inputBuffer != NULL) {
                           g_string_free(inputBuffer, true);
                           inputBuffer = NULL;
                       }
                   });
}

I'm certainly not ruling out that the issue is on my end ("When you think it's the framework, check your code again.")

But just knowing that you're not seeing the same random issue helps me. I will attempt to reproduce the problem with the latest user's files (if and when I get them). Hopefully they will finally let me reproduce it.

It appears that the only time LPEngineMARKDOWN directly calls g_string_free is at the end, correct? So that's the problematic area.

When you do this (outputBuffer = inputBuffer->str), where does outputBuffer get freed? That would lead to freeing inputBuffer->str twice.

I'm away from my laptop so I can't be positive, but if I recall correctly, outputBuffer is never malloc-ed. It's just a char* that markdown_to_string returns. Therefore, I don't free outputBuffer.

Sent from my iPhone

On Nov 2, 2015, at 12:47, Fletcher T. Penney notifications@github.com wrote:

It appears that the only time LPEngineMARKDOWN directly calls g_string_free is at the end, correct. So that's the problematic area.

When you do this (outputBuffer = inputBuffer->str), where does outputBuffer get freed? That would lead to freeing inputBuffer->str twice.

— Reply to this email directly or view it on GitHub.

Hi @bdkjones - I tend to agree with @fletcher's interpretation that the crash doesn't look like it's happening in markdown_to_string, but in your own call to g_string_free. At least the stack crawl seems to imply it's called directly from the block you dispatch from -[LPEngineMARKDOWN compile:fromTaskGroup:]. I'm curious what gave you confidence to think it's crashing inside markdown_to_string?

I also think @fletcher is right you want to be freeing the result from markdown_to_string. It has to be getting malloc'd inside MMD. There's no other code that could be responsible for freeing it - unless I gravely misunderstood MMD's memory management :)

But all that said, I can't see how it WOULD end up getting double-free'd, because on the contrary it seems as though it never gets free'd at all. Maybe some weird code path that you're reaching is causing inputBuffer->str to be prematurely freed or realloc'd, and you're holding on to a stale copy?

Okay, I went back and looked at my commit history. I did indeed free() the outputBuffer (after checking for NULL). However, I removed that call while attempting to diagnose what was causing this issue. I've also traced the ownership of that memory:

markdown_to_string creates a char* out at the top:

char * markdown_to_string(const char * source, unsigned long extensions, int format) {
    char *out;
        [...]

And at the bottom, sets this equal to the return value of export_node_tree (or, if an error occurs, out is set to a string using strdup):

/* Show what we got */
    out = export_node_tree(refined, format, extensions);

    /* clean up */
    free_parser_data((parser_data *)g.data);
    yydeinit(&g);

    free(formatted);
    return out;

export_node_tree in turn, does this at the top:

char * export_node_tree(node *list, int format, unsigned long extensions) {
    char *output;
    GString *out = g_string_new("");

And then calls g_string_free on out, but leaves the underlying character buffer allocated:

    output = out->str;
    g_string_free(out, false);
    free_scratch_pad(scratch);

#ifdef DEBUG_ON
    fprintf(stderr, "finish export_node_tree\n");
#endif
    return output;

So, the end result is that the outputBuffer is indeed owned memory that should be freed. But I took that call out, so it can't possibly be double-freeing str, correct?

The "weird-code-path" theory is what I'm looking at. I've followed the MMD routine, though, and can't find a route that would explain the problem.

Plus, the call to free outputBuffer is just free(), since outputBuffer is only the underlying character store and not a full g_string. g_string_free() is where the exception is being thrown and the only time I call that function is to free the inputBuffer, which is created above. Is there any path in which MMD would somehow modify or free that memory?

@bdkjones you said you took out the free() while looking into this crash. So can you confirm that the users who are reporting these crashes are still seeing the crash after updating to your code that DOESN'T do the free? You shipped that change?

The speculation @fletcher was making about double-freeing was because of the case where you assign inputBuffer->str to to outputBuffer. If you used to free outputBuffer, then when it happened to point to inputBuffer->str's allocated value, you were freeing is out from under the inputBuffer g_string. So, inputBuffer was still not NULL, but inputBuffer->str would be deallocated. When you go to free inputBuffer with g_string_free, it crashes trying to free an already-freed inputBuffer->str.

It adds up pretty neatly so I am super interested to know if you actually shipped your change to customers to not use free at all on outputBuffer :)

Well damn, you're exactly right: that does add up perfectly. It also explains why the crash only affects a few people who leave the output format in MD. Sorry I missed it earlier and thanks for the help!

Glad the issue seems to be solved!

fletcher / MultiMarkdown-4

Crashes while freeing a g_string #142

Additional Info