fletcher / MultiMarkdown-4

This project is now deprecated. Please use MultiMarkdown-6 instead!
https://github.com/fletcher/MultiMarkdown-5
Other
306 stars 59 forks source link

Incorrect Use of Exit() #107

Open bdkjones opened 9 years ago

bdkjones commented 9 years ago

I'm incorporating MMD into a GUI on OS X. To do this, I'm using libMultiMarkdown and calling the various functions directly (as opposed to calling the compiled mmd binary on a separate process).

The Problem

Whenever the parser or one of the writers (odf.c, opml.c, etc.) encounters an error, you call exit(EXIT_FAILURE), which immediately terminates my entire app. This is obviously not what we want.

The Solution

The only thing that should be calling exit() is multimarkdown.c, which is the file that compiles into the command line utility. The rest of the project should never call exit() directly. Instead, the various functions in the rest of the project should include an error parameter. That way, when an implementor attempts to use libMultiMarkdown in their project, the framework is agnostic—it won't assume that it's simply running as a separate process where it's fine to kill everything.

Proposals

The main function currently looks like this:

char * markdown_to_string(const char * source, unsigned long extensions, int format);

One option would be to add an error parameter, like this:

char * markdown_to_string(const char * source, unsigned long extensions, int format, char **error);

If error is not NULL, you know something went wrong and can handle it appropriately. The parameter would be populated with the messages you currently spit to stderr just before calling exit(). This is the option with the least amount of major refactoring.

A better option would be to create a set of "context" structs, like this:

struct mmd_result_context
{
    bool success;
    char *output;
    char *error;
};

struct mmd_context 
{
    char *source;
    int outputFormat;
    unsigned long extensions;
    mmd_result_context *result
};

Now, you create an mmd_context and set the first three parameters. Then, instead of calling markdown_to_string(), you'd call something like:

bool  mmd_compile_context(mmd_context *ctx);

This would create the mmd_result_context as a child of the original mmd_context. You could inspect the contents of the result context to act appropriately. This is the much more robust solution.

Need a Go-Ahead

I'm willing to refactor MMD to use either of these approaches, but I'd like to know that if I do so and submit a pull request, you'll merge it. Otherwise, I'd have to re-do all this work every time you update the project in the future. I don't want to invest all the time without knowing up-front if this is something you'll approve. Thanks!

fletcher commented 9 years ago

These exit() calls are placed in parts of the code that should not be reached (e.g. being asked to handle a node key that doesn't exist.)

If you're encountering an error like this, there is a fundamental problem that needs to be fixed. These aren't just "the user typed a letter instead of a number" type errors that require resolution. These are "there's something broken in your program" type errors.

I've been using this as a library in MultiMarkdown Composer and have never had a the application "killed" by libMultiMarkdown in the years I have been doing this.

Under what situation are you encountering these problems?

bdkjones commented 9 years ago

I'm attempting to compile a very simple file to Text output. The message logged to stderr just before the app terminates is:

print_text_node encountered unknown node key = 5

Here's the contents of the Markdown file I'm compiling:

This is a markdown document.

### I have no idea what I'm doing

1. One
2. Two
3. Three

{{secondary.md}}

And here is the contents of the transcluded file, secondary.md:

## This is secondary

Well isn't this just fantastic?

I am passing the following flags to the compiler:

EXT_COMPLETE
EXT_NOTES
EXT_SMART
EXT_OBFUSCATE

The output format is TEXT_FORMAT

The compiling sequence goes like this:

       //
       //  Read in the entire input file contents.
       //
       GString *inputBuffer = g_string_new("");

       int currentChar = 0;
       while ((currentChar = fgetc(inputFile)) != EOF) {
           g_string_append_c(inputBuffer, currentChar);
       }
       fclose(inputFile);

       //
       //  Process includes, unless we're in "compatibility mode"
       //
       if (!(optionsFlag & EXT_COMPATIBILITY))
       {
           prepend_mmd_header(inputBuffer);
           append_mmd_footer(inputBuffer);

           const char *inputFolderPath = [[aPath stringByDeletingLastPathComponent] cStringUsingEncoding:NSUTF8StringEncoding];
           transclude_source(inputBuffer, (char *)inputFolderPath, NULL, outputFormat, NULL);
       }

       //
       //  Compile.
       //
       char *outputBuffer = NULL;

       if (outputFormat == ORIGINAL_FORMAT)
       {
           // User wants original MD; do not compile.
           outputBuffer = inputBuffer->str;
       }
       else
       {
           outputBuffer = markdown_to_string(inputBuffer->str, optionsFlag, outputFormat);
       }

Comments

I followed your implementation in multimarkdown.c. I am perfectly willing to acknowledge that the problem may be in my code.

However, I still think the parser/writers should not be able to terminate the host process. The host process should inspect the result of the compiler and decide how to react. In my case, that means showing an error message in the UI. In the case of multimarkdown.c, that means terminating with an error exit code.

Check out the Libsass project for a great example of how this is done. (That's where I got the "contexts" idea from.)

bdkjones commented 9 years ago

I flipped DEBUG_ON on and here's what got logged:

reverse_list: '.'
reverse_list: 'm'
reverse_list: 'doing'
reverse_list: 'One
'
reverse_list: 'One
'
reverse_list: 'Two
'
reverse_list: 'Two
'
reverse_list: 'Three
'
reverse_list: 'Three
'
reverse_list: 'secondary'
reverse_list: 't'
reverse_list: '?'
reverse_list: 'One'
reverse_list: 'Two'
reverse_list: 'Three'
export_node_tree
extract_references

start label from node_tree
start print_raw_node_tree: '2'
print raw node 2: 'I'
finish print raw node 2: 'I'
'I'
print raw node 8: ' '
finish print raw node 8: ' '
'I '
print raw node 2: 'have'
finish print raw node 2: 'have'
'I have'
print raw node 8: ' '
finish print raw node 8: ' '
'I have '
print raw node 2: 'no'
finish print raw node 2: 'no'
'I have no'
print raw node 8: ' '
finish print raw node 8: ' '
'I have no '
print raw node 2: 'idea'
finish print raw node 2: 'idea'
'I have no idea'
print raw node 8: ' '
finish print raw node 8: ' '
'I have no idea '
print raw node 2: 'what'
finish print raw node 2: 'what'
'I have no idea what'
print raw node 8: ' '
finish print raw node 8: ' '
'I have no idea what '
print raw node children from 1
start print_raw_node_tree: '2'
print raw node 2: 'I'
finish print raw node 2: 'I'
'I have no idea what I'
print raw node children from 3
finish print raw node 3: '(null)'
'I have no idea what I'
print raw node 2: 'm'
finish print raw node 2: 'm'
'I have no idea what Im'
finish print raw node 1: '(null)'
'I have no idea what Im'
print raw node 8: ' '
finish print raw node 8: ' '
'I have no idea what Im '
print raw node 2: 'doing'
finish print raw node 2: 'doing'
'I have no idea what Im doing'
halfway('I have no idea what Im doing')
finish label from node_tree: 'ihavenoideawhatimdoing'

start label from node_tree
start print_raw_node_tree: '2'
print raw node 2: 'This'
finish print raw node 2: 'This'
'This'
print raw node 8: ' '
finish print raw node 8: ' '
'This '
print raw node 2: 'is'
finish print raw node 2: 'is'
'This is'
print raw node 8: ' '
finish print raw node 8: ' '
'This is '
print raw node 2: 'secondary'
finish print raw node 2: 'secondary'
'This is secondary'
halfway('This is secondary')
finish label from node_tree: 'thisissecondary'
print_text_node encountered unknown node key = 5
bdkjones commented 9 years ago

Also: every other output format compiles correctly. This problem occurs only when the output format is TEXT_FORMAT.

fletcher commented 9 years ago

Why are using TEXT_FORMAT? Support for that is incomplete, and really was mainly there for some other stuff I need to work on. This is a perfect example of why I like the exit() strategy. You are using code that probably shouldn't be used and isn't properly functional, and the exit() appropriately grabs your attention.

If you want to use the TEXT_FORMAT for something, it just needs to be completed to support all appropriate node types, but more importantly -- what are you trying to accomplish? There is probably a better way... ("I do not think it means what you think it means", to quote a fantastic movie...)

bdkjones commented 9 years ago

Ha, fair enough. I was using the text format because it was listed in the enum as an option and there was no comment that says, "don't use this; it's incomplete". I also read the "known issues" page, but that does not make it clear that text_format is unusable-to-the-point-of-crashes; it just makes it sound like the output may not be 100% correct for that format. (plus, I think it only mentions RTF support as not yet done, it doesn't specifically call out the text format.)

Sent from my iPhone

On Apr 9, 2015, at 13:09, Fletcher T. Penney notifications@github.com wrote:

Why are using TEXT_FORMAT? Support for that is incomplete, and really was mainly there for some other stuff I need to work on. This is a perfect example of why I like the exit() strategy. You are using code that probably shouldn't be used and isn't properly functional, and the exit() appropriately grabs your attention.

If you want to use the TEXT_FORMAT for something, it just needs to be completed to support all appropriate node types, but more importantly -- what are you trying to accomplish? There is probably a better way... ("I do not think it means what you think it means", to quote a fantastic movie...)

— Reply to this email directly or view it on GitHub.

bdkjones commented 9 years ago

I still maintain that a refactor of the API would be beneficial. Crashing assertions like this should not make it into shipping code. For example, suppose you ship an update someday with a bug in it that causes the parser to fail on a certain piece of syntax. Now, all GUI-based apps that run the framework in their own process will crash with no warning. Refactoring as I proposed gives them a chance to handle errors (rare as they may be) gracefully. I am still happy to do this and submit a pull request, if you'll merge. I'll comment everything very well so it's clear how the API works. I think you'll really like the result.

Sent from my iPhone

On Apr 9, 2015, at 13:09, Fletcher T. Penney notifications@github.com wrote:

Why are using TEXT_FORMAT? Support for that is incomplete, and really was mainly there for some other stuff I need to work on. This is a perfect example of why I like the exit() strategy. You are using code that probably shouldn't be used and isn't properly functional, and the exit() appropriately grabs your attention.

If you want to use the TEXT_FORMAT for something, it just needs to be completed to support all appropriate node types, but more importantly -- what are you trying to accomplish? There is probably a better way... ("I do not think it means what you think it means", to quote a fantastic movie...)

— Reply to this email directly or view it on GitHub.

bdkjones commented 9 years ago

Put another way: YES, the places where you're calling exit() are irrecoverable errors for the multimarkdown compiler. But they may NOT be irrecoverable errors for the process that's calling the compiler. Thus, it would be far better for the compiler to present an error and allow the calling process to decide what to do. That's the bulletproof solution that makes the framework safe for use in all types of apps---command line utilities to GUIs.

Sent from my iPhone

On Apr 9, 2015, at 13:09, Fletcher T. Penney notifications@github.com wrote:

Why are using TEXT_FORMAT? Support for that is incomplete, and really was mainly there for some other stuff I need to work on. This is a perfect example of why I like the exit() strategy. You are using code that probably shouldn't be used and isn't properly functional, and the exit() appropriately grabs your attention.

If you want to use the TEXT_FORMAT for something, it just needs to be completed to support all appropriate node types, but more importantly -- what are you trying to accomplish? There is probably a better way... ("I do not think it means what you think it means", to quote a fantastic movie...)

— Reply to this email directly or view it on GitHub.

fletcher commented 9 years ago

I understand what you're saying -- my point is that these errors should not occur in proper use of the library. If they're occurring, something is fundamentally wrong (or incomplete) and your code should not be used in a shipping product. It's not that the errors are irrecoverable for MMD, it's that they are a sign that something is "plugged in" wrong. (You'll notice that there is no mention of the TEXT_FORMAT in the --help page or user's guide -- it's basically useless code at this point that may potentially be useful in the future.)

Other errors that occur (e.g. edge cases that result in memory errors, etc.) do not trigger exit()'s. The code is designed so that it works regardless of input. You may not get the output you were expecting, but you get a valid string. There's no need for error passing, because there are no error messages to pass. If a given input string fails to parse, then there's an edge case that needs to be fixed by me, but there are no situations that require sending back an error message to the external code, just the result string/node tree/etc.

I continue to monitor for memory leaks, and have a growing suite of test files to look at known edge cases. The code designed for use is very stable, with the following exceptions:

  1. rtf format support is as is, and not guaranteed to work. I welcome submissions to expand it, but will do limited work on it myself (addressed at length elsewhere).
  2. lyx format support is by someone else, and is included "as is." It stays in my test suite, but I don't manage that code.
  3. "orphan" routines (such as the TEXT_FORMAT code you used) that are not documented to be user accessible -- these are typically utility functions that are no longer used but not pruned, or something that is experimental and may be useful in the future. In this case, use at your own risk, caveat emptor, or whatever.

I don't plan on making any changes to the exit() statements at this time. Who knows what the future holds.... .

bdkjones commented 9 years ago

Ok, cool. Just 4 questions then:

  1. When you say "RTF is not guaranteed to work", do you mean it's not guaranteed to generate the correct output or that it could produce a crash?
  2. I see that EXT_FILTER_HTML and EXT_FILTER_STYLES are not exposed in the CLI. Are these features currently experimental or can they be used safely?
  3. I see the EXT_CRITIC flag, but it's not used anywhere. What is the purpose of this flag? Should it be passed in certain situations, such as when I pass EXT_CRITIC_ACCEPT or EXT_CRITIC_REJECT?
  4. in export_formats there is CRITIC_ACCEPT_FORMAT along with the reject and highlight formats. These aren't exposed in the CLI. Are these used or planned for use in the future?
fletcher commented 9 years ago
  1. Using RTF could potentially turn your computer into a turtle. Said turtle may or may not be able to do long division. Anything else is also possible. Don't use in production environments.
  2. EXT_FILTER_HTML is left over from peg-markdown. It can probably be used from the library, but is not formally supported. Same goes for EXT_FILTER_STYLES.
  3. EXT_CRITIC can be used from other software to enable identifying CriticMarkup within a document. You probably don't need it, because it won't really do anything. I use it for other purposes in MultiMarkdown Composer.
  4. You'll have to follow the source code to understand how the various CriticMarkup flags are used. Or just stick with multimarkdown.c to see how to call them at a high level, which is how I would recommend starting. If you get to a point where you need more complexity and control, then you'll find that what you need is probably already implemented.
danielpunkass commented 9 years ago

Fascinating discussion. I tend to agree with @bdkjones that for the benefit of library-hosting apps being able to recover or treat failure with less severity than outright crashing, some other mechanism should be used. Perhaps exiting could be the default behavior, but the handling of "critical" errors like this could be somehow configured by the host app to facilitate e.g. throwing Objective C exceptions, instead.

danielpunkass commented 9 years ago

I added a pull request #134 to start documenting in the comments which of the listed export formats is suitable for mainstream use.