Open bdkjones opened 9 years ago
These exit()
calls are placed in parts of the code that should not be reached (e.g. being asked to handle a node key
that doesn't exist.)
If you're encountering an error like this, there is a fundamental problem that needs to be fixed. These aren't just "the user typed a letter instead of a number" type errors that require resolution. These are "there's something broken in your program" type errors.
I've been using this as a library in MultiMarkdown Composer and have never had a the application "killed" by libMultiMarkdown in the years I have been doing this.
Under what situation are you encountering these problems?
I'm attempting to compile a very simple file to Text output. The message logged to stderr just before the app terminates is:
print_text_node encountered unknown node key = 5
Here's the contents of the Markdown file I'm compiling:
This is a markdown document.
### I have no idea what I'm doing
1. One
2. Two
3. Three
{{secondary.md}}
And here is the contents of the transcluded file, secondary.md:
## This is secondary
Well isn't this just fantastic?
I am passing the following flags to the compiler:
EXT_COMPLETE
EXT_NOTES
EXT_SMART
EXT_OBFUSCATE
The output format is TEXT_FORMAT
The compiling sequence goes like this:
//
// Read in the entire input file contents.
//
GString *inputBuffer = g_string_new("");
int currentChar = 0;
while ((currentChar = fgetc(inputFile)) != EOF) {
g_string_append_c(inputBuffer, currentChar);
}
fclose(inputFile);
//
// Process includes, unless we're in "compatibility mode"
//
if (!(optionsFlag & EXT_COMPATIBILITY))
{
prepend_mmd_header(inputBuffer);
append_mmd_footer(inputBuffer);
const char *inputFolderPath = [[aPath stringByDeletingLastPathComponent] cStringUsingEncoding:NSUTF8StringEncoding];
transclude_source(inputBuffer, (char *)inputFolderPath, NULL, outputFormat, NULL);
}
//
// Compile.
//
char *outputBuffer = NULL;
if (outputFormat == ORIGINAL_FORMAT)
{
// User wants original MD; do not compile.
outputBuffer = inputBuffer->str;
}
else
{
outputBuffer = markdown_to_string(inputBuffer->str, optionsFlag, outputFormat);
}
I followed your implementation in multimarkdown.c
. I am perfectly willing to acknowledge that the problem may be in my code.
However, I still think the parser/writers should not be able to terminate the host process. The host process should inspect the result of the compiler and decide how to react. In my case, that means showing an error message in the UI. In the case of multimarkdown.c
, that means terminating with an error exit code.
Check out the Libsass project for a great example of how this is done. (That's where I got the "contexts" idea from.)
I flipped DEBUG_ON
on and here's what got logged:
reverse_list: '.'
reverse_list: 'm'
reverse_list: 'doing'
reverse_list: 'One
'
reverse_list: 'One
'
reverse_list: 'Two
'
reverse_list: 'Two
'
reverse_list: 'Three
'
reverse_list: 'Three
'
reverse_list: 'secondary'
reverse_list: 't'
reverse_list: '?'
reverse_list: 'One'
reverse_list: 'Two'
reverse_list: 'Three'
export_node_tree
extract_references
start label from node_tree
start print_raw_node_tree: '2'
print raw node 2: 'I'
finish print raw node 2: 'I'
'I'
print raw node 8: ' '
finish print raw node 8: ' '
'I '
print raw node 2: 'have'
finish print raw node 2: 'have'
'I have'
print raw node 8: ' '
finish print raw node 8: ' '
'I have '
print raw node 2: 'no'
finish print raw node 2: 'no'
'I have no'
print raw node 8: ' '
finish print raw node 8: ' '
'I have no '
print raw node 2: 'idea'
finish print raw node 2: 'idea'
'I have no idea'
print raw node 8: ' '
finish print raw node 8: ' '
'I have no idea '
print raw node 2: 'what'
finish print raw node 2: 'what'
'I have no idea what'
print raw node 8: ' '
finish print raw node 8: ' '
'I have no idea what '
print raw node children from 1
start print_raw_node_tree: '2'
print raw node 2: 'I'
finish print raw node 2: 'I'
'I have no idea what I'
print raw node children from 3
finish print raw node 3: '(null)'
'I have no idea what I'
print raw node 2: 'm'
finish print raw node 2: 'm'
'I have no idea what Im'
finish print raw node 1: '(null)'
'I have no idea what Im'
print raw node 8: ' '
finish print raw node 8: ' '
'I have no idea what Im '
print raw node 2: 'doing'
finish print raw node 2: 'doing'
'I have no idea what Im doing'
halfway('I have no idea what Im doing')
finish label from node_tree: 'ihavenoideawhatimdoing'
start label from node_tree
start print_raw_node_tree: '2'
print raw node 2: 'This'
finish print raw node 2: 'This'
'This'
print raw node 8: ' '
finish print raw node 8: ' '
'This '
print raw node 2: 'is'
finish print raw node 2: 'is'
'This is'
print raw node 8: ' '
finish print raw node 8: ' '
'This is '
print raw node 2: 'secondary'
finish print raw node 2: 'secondary'
'This is secondary'
halfway('This is secondary')
finish label from node_tree: 'thisissecondary'
print_text_node encountered unknown node key = 5
Also: every other output format compiles correctly. This problem occurs only when the output format is TEXT_FORMAT
.
Why are using TEXT_FORMAT
? Support for that is incomplete, and really was mainly there for some other stuff I need to work on. This is a perfect example of why I like the exit()
strategy. You are using code that probably shouldn't be used and isn't properly functional, and the exit()
appropriately grabs your attention.
If you want to use the TEXT_FORMAT
for something, it just needs to be completed to support all appropriate node types, but more importantly -- what are you trying to accomplish? There is probably a better way... ("I do not think it means what you think it means", to quote a fantastic movie...)
Ha, fair enough. I was using the text format because it was listed in the enum as an option and there was no comment that says, "don't use this; it's incomplete". I also read the "known issues" page, but that does not make it clear that text_format is unusable-to-the-point-of-crashes; it just makes it sound like the output may not be 100% correct for that format. (plus, I think it only mentions RTF support as not yet done, it doesn't specifically call out the text format.)
Sent from my iPhone
On Apr 9, 2015, at 13:09, Fletcher T. Penney notifications@github.com wrote:
Why are using TEXT_FORMAT? Support for that is incomplete, and really was mainly there for some other stuff I need to work on. This is a perfect example of why I like the exit() strategy. You are using code that probably shouldn't be used and isn't properly functional, and the exit() appropriately grabs your attention.
If you want to use the TEXT_FORMAT for something, it just needs to be completed to support all appropriate node types, but more importantly -- what are you trying to accomplish? There is probably a better way... ("I do not think it means what you think it means", to quote a fantastic movie...)
— Reply to this email directly or view it on GitHub.
I still maintain that a refactor of the API would be beneficial. Crashing assertions like this should not make it into shipping code. For example, suppose you ship an update someday with a bug in it that causes the parser to fail on a certain piece of syntax. Now, all GUI-based apps that run the framework in their own process will crash with no warning. Refactoring as I proposed gives them a chance to handle errors (rare as they may be) gracefully. I am still happy to do this and submit a pull request, if you'll merge. I'll comment everything very well so it's clear how the API works. I think you'll really like the result.
Sent from my iPhone
On Apr 9, 2015, at 13:09, Fletcher T. Penney notifications@github.com wrote:
Why are using TEXT_FORMAT? Support for that is incomplete, and really was mainly there for some other stuff I need to work on. This is a perfect example of why I like the exit() strategy. You are using code that probably shouldn't be used and isn't properly functional, and the exit() appropriately grabs your attention.
If you want to use the TEXT_FORMAT for something, it just needs to be completed to support all appropriate node types, but more importantly -- what are you trying to accomplish? There is probably a better way... ("I do not think it means what you think it means", to quote a fantastic movie...)
— Reply to this email directly or view it on GitHub.
Put another way: YES, the places where you're calling exit() are irrecoverable errors for the multimarkdown compiler. But they may NOT be irrecoverable errors for the process that's calling the compiler. Thus, it would be far better for the compiler to present an error and allow the calling process to decide what to do. That's the bulletproof solution that makes the framework safe for use in all types of apps---command line utilities to GUIs.
Sent from my iPhone
On Apr 9, 2015, at 13:09, Fletcher T. Penney notifications@github.com wrote:
Why are using TEXT_FORMAT? Support for that is incomplete, and really was mainly there for some other stuff I need to work on. This is a perfect example of why I like the exit() strategy. You are using code that probably shouldn't be used and isn't properly functional, and the exit() appropriately grabs your attention.
If you want to use the TEXT_FORMAT for something, it just needs to be completed to support all appropriate node types, but more importantly -- what are you trying to accomplish? There is probably a better way... ("I do not think it means what you think it means", to quote a fantastic movie...)
— Reply to this email directly or view it on GitHub.
I understand what you're saying -- my point is that these errors should not occur in proper use of the library. If they're occurring, something is fundamentally wrong (or incomplete) and your code should not be used in a shipping product. It's not that the errors are irrecoverable for MMD, it's that they are a sign that something is "plugged in" wrong. (You'll notice that there is no mention of the TEXT_FORMAT
in the --help
page or user's guide -- it's basically useless code at this point that may potentially be useful in the future.)
Other errors that occur (e.g. edge cases that result in memory errors, etc.) do not trigger exit()
's. The code is designed so that it works regardless of input. You may not get the output you were expecting, but you get a valid string. There's no need for error passing, because there are no error messages to pass. If a given input string fails to parse, then there's an edge case that needs to be fixed by me, but there are no situations that require sending back an error message to the external code, just the result string/node tree/etc.
I continue to monitor for memory leaks, and have a growing suite of test files to look at known edge cases. The code designed for use is very stable, with the following exceptions:
rtf
format support is as is, and not guaranteed to work. I welcome submissions to expand it, but will do limited work on it myself (addressed at length elsewhere).lyx
format support is by someone else, and is included "as is." It stays in my test suite, but I don't manage that code.TEXT_FORMAT
code you used) that are not documented to be user accessible -- these are typically utility functions that are no longer used but not pruned, or something that is experimental and may be useful in the future. In this case, use at your own risk, caveat emptor, or whatever.I don't plan on making any changes to the exit()
statements at this time. Who knows what the future holds....
.
Ok, cool. Just 4 questions then:
EXT_FILTER_HTML
and EXT_FILTER_STYLES
are not exposed in the CLI. Are these features currently experimental or can they be used safely?EXT_CRITIC
flag, but it's not used anywhere. What is the purpose of this flag? Should it be passed in certain situations, such as when I pass EXT_CRITIC_ACCEPT
or EXT_CRITIC_REJECT
?export_formats
there is CRITIC_ACCEPT_FORMAT
along with the reject and highlight formats. These aren't exposed in the CLI. Are these used or planned for use in the future?EXT_FILTER_HTML
is left over from peg-markdown. It can probably be used from the library, but is not formally supported. Same goes for EXT_FILTER_STYLES
.EXT_CRITIC
can be used from other software to enable identifying CriticMarkup within a document. You probably don't need it, because it won't really do anything. I use it for other purposes in MultiMarkdown Composer.Fascinating discussion. I tend to agree with @bdkjones that for the benefit of library-hosting apps being able to recover or treat failure with less severity than outright crashing, some other mechanism should be used. Perhaps exiting could be the default behavior, but the handling of "critical" errors like this could be somehow configured by the host app to facilitate e.g. throwing Objective C exceptions, instead.
I added a pull request #134 to start documenting in the comments which of the listed export formats is suitable for mainstream use.
I'm incorporating MMD into a GUI on OS X. To do this, I'm using libMultiMarkdown and calling the various functions directly (as opposed to calling the compiled mmd binary on a separate process).
The Problem
Whenever the parser or one of the writers (odf.c, opml.c, etc.) encounters an error, you call
exit(EXIT_FAILURE)
, which immediately terminates my entire app. This is obviously not what we want.The Solution
The only thing that should be calling
exit()
ismultimarkdown.c
, which is the file that compiles into the command line utility. The rest of the project should never callexit()
directly. Instead, the various functions in the rest of the project should include an error parameter. That way, when an implementor attempts to use libMultiMarkdown in their project, the framework is agnostic—it won't assume that it's simply running as a separate process where it's fine to kill everything.Proposals
The main function currently looks like this:
One option would be to add an error parameter, like this:
If
error
is not NULL, you know something went wrong and can handle it appropriately. The parameter would be populated with the messages you currently spit to stderr just before callingexit()
. This is the option with the least amount of major refactoring.A better option would be to create a set of "context" structs, like this:
Now, you create an
mmd_context
and set the first three parameters. Then, instead of callingmarkdown_to_string()
, you'd call something like:This would create the
mmd_result_context
as a child of the originalmmd_context
. You could inspect the contents of the result context to act appropriately. This is the much more robust solution.Need a Go-Ahead
I'm willing to refactor MMD to use either of these approaches, but I'd like to know that if I do so and submit a pull request, you'll merge it. Otherwise, I'd have to re-do all this work every time you update the project in the future. I don't want to invest all the time without knowing up-front if this is something you'll approve. Thanks!