keehun closed this 3 years ago
A `cmark_node` has fields `start_line`, `start_column`, `end_line`, `end_column`.
Be sure to set CMARK_OPT_SOURCEPOS in options when parsing.
Thank you. I totally missed that!
Am I reading it right that the columns it gives are the number of bytes it has read on that line? For example, for a line containing `# 👨‍👩‍👧‍👧`, instead of giving a start column of 1 and an end column of 3, it gives a start column of 1 and an end column of 27, which is exactly the number of bytes in `# 👨‍👩‍👧‍👧`.
I'm guessing there's no recognition of grapheme clusters, and no configuration option to count the number of "characters", which may span many bytes?
I think you're right that it is counting bytes. I haven't looked at the code for some time. Obviously, this isn't ideal for all purposes, but it uses a number cmark has to keep track of anyway.
The more I think about it, the more it makes sense that `cmark` remains neutral toward the different ways that different languages (and their standard libraries) count the "length" of a string. Bytes are the most fundamental, "unbiased" measurement. It just makes my job a little bit harder.
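For anyone who needs something other than bytes, here is a minimal sketch of converting a byte column into a code point column. It assumes the line is valid UTF-8 and that columns are 1-based. The function name `byte_col_to_codepoint_col` is made up for illustration and is not part of cmark. Note that it counts code points, not grapheme clusters, so the emoji heading above would end at column 9 rather than 3. Real grapheme counting needs a Unicode segmentation library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of cmark.
 * Converts a 1-based byte column into a 1-based code point column.
 * Assumes `line` is valid, NUL-terminated UTF-8. */
size_t byte_col_to_codepoint_col(const char *line, size_t byte_col)
{
    assert(line != NULL && byte_col >= 1);

    size_t codepoints = 0;
    for (size_t i = 0; i < byte_col && line[i] != '\0'; i++) {
        /* UTF-8 continuation bytes have the form 10xxxxxx.
         * Every other byte starts a new code point. */
        if (((unsigned char)line[i] & 0xC0) != 0x80) {
            codepoints++;
        }
    }
    return codepoints;
}
```

The byte columns passed in would come from whatever cmark reports for the node. The line itself has to be taken from the original source buffer.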
I am quite new to the `cmark` community, but after some initial digging, it doesn't seem like `cmark` nodes keep a reference to the "character ranges" of the input text that were responsible for each node.

For example, if I have the input string `Hello **world**`, I'd want "range: 6...13" attached to the `Strong` node.

One use for this information would be to "descriptively" parse markdown without rendering it into another form in a lossy process. For example, maybe I want to just get the `Strong` nodes and apply a particular style to the markdown source.

My first approach was to take the nodes and "reconstruct" the markdown source from them, but this process is not robust and leaves too much room for error. The loss of the markdown formatting characters is too much. If I could get the range for a node, I could keep the original markdown source and decorate it additively.
Is this something that is even feasible based on the architecture of `cmark`? I intend to keep looking myself, but I figured I'd ask in case anyone has thought about this already.

Thank you
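If a node's line and column are available (for example via `cmark_node_get_start_line` and `cmark_node_get_start_column`), one way to get a range into the original source is to turn each position into a byte offset. The following is only a sketch under a few assumptions: lines and columns are 1-based, columns are counted in bytes, and lines end in `\n` (a lone `\r` line ending is not handled). The name `sourcepos_to_offset` is hypothetical and not a cmark function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of cmark.
 * Maps a 1-based (line, column) position, with the column counted in
 * bytes, to a 0-based byte offset into `src`.
 * Returns (size_t)-1 if `src` has fewer than `line` lines.
 * Assumes lines are terminated by '\n'. */
size_t sourcepos_to_offset(const char *src, int line, int column)
{
    assert(src != NULL && line >= 1 && column >= 1);

    const char *p = src;
    /* Skip past (line - 1) newline characters. */
    for (int l = 1; l < line; l++) {
        p = strchr(p, '\n');
        if (p == NULL) {
            return (size_t)-1;
        }
        p++; /* step over the newline to the start of the next line */
    }
    return (size_t)(p - src) + (size_t)(column - 1);
}
```

With the source `"first\nHello **world**\n"` and an assumed position of line 2, columns 7 to 15 for the strong span, the start offset is 12 and the end offset is 20. Those nine bytes cover `**world**`, which could then be styled in place without re-rendering the document.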