commonmark / cmark

CommonMark parsing and rendering library and program in C
Other
1.62k stars 538 forks source link

Surfacing the list bullet marker type #225

Open nathanaschbacher opened 7 years ago

nathanaschbacher commented 7 years ago

Howdy,

Currently ordered lists expose which delimiter, . or ), was used when defining a list. I'd like to add functionality which does something similar for unordered lists. Such that the bullet-type is made available to the AST and/or output. So something consuming the AST or output could check for something like plus, hyphen, or asterisk when +, -, or * was used to construct the list.

Is that kind of functionality something that would be accepted upstream? I'm asking this first, because instead of adding this to cmark I could instead create an entirely separate pre-processor tool for our use case, but that would be non-ideal compared to adding support to cmark. Especially given that adding this would add balance to the current discrepancy in available AST detail between ordered and unordered lists.

If it is something that would have a chance at being accepted upstream, then there are two ways of implementing this that make sense to me.

  1. Extending the existing cmark_delim_type in cmark.h to include additional delimiters for bullet-type, but that then makes this type cover more than its original intended function, which was clearly intended only for ordered lists.

  2. Creating an entirely new type such as cmark_bullet_type in cmark.h to represent the different unordered list bullet types, but then this adds a whole new data type to cmark.

Either of these methods will work, but I want to make sure the solution I pursue fits in with how the cmark maintainers would prefer the extra context/metadata be captured and represented.

Any guidance on any of the above would be very helpful.

Thank you.

nathanaschbacher commented 7 years ago

Having now partially implemented both options 1 and 2 above I have some thoughts about their respective impacts.

Option 1: Fits in with the cmark project structure and architecture of the parser and various output format generators as they exist today.

Option 2: Seems more logically intuitive in the abstract, but is a bit more baroque when actually trying to integrate the change in cmark.

As a result I'm currently more inclined to go for Option 1 unless otherwise influenced.

nathanaschbacher commented 7 years ago

I've nearly got a PR ready for Option 1. All tests pass and I added an additional "api" test to the suite. I've also updated the racket-lang wrapper and validated that when doing -t commonmark the bullet character is preserved.

The only oddity is the CMARK_NO_DELIM symbol that was already present in cmark_delim_type. It's now not necessary, but was previously being used as a kind of default value to signal a bulletted list in some scenarios.

Probably the proper way to handle this would be to remove CMARK_NO_DELIM from cmark_delim_type entirely, and locate all the places which check against it or return it and instead check against the list type or return a more specific status, respectively.

I'll see how invasive that looks.

nathanaschbacher commented 7 years ago

Per the usual... upon completing my first go-around on it I found a much simpler way to implement this using the data that's already loaded into the cmark_list struct.

The simpler method ameliorates the need for any changes to cmark_delim_type because I can just expose the bullet_char component of the cmark_list struct.