jimvine closed this 5 years ago
Thanks for documenting that so thoroughly! Looks like I need to make the YAML-excluding function a bit more specific. Currently I just have gsub("---.*--- ", "", text), which is obviously a problem if we have three dashes anywhere else in the document.
I will think a bit about how to change this to be more specific to YAML so it won't match dashes in the body text. If you have any suggestions, please let me know!
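To make the problem concrete, here is a small sketch of how the current pattern behaves. The input strings are made up for illustration, and they assume the document text has already been collapsed onto a single line with spaces, which is a guess based on the trailing space in the pattern rather than something confirmed in this thread.

```r
# Hypothetical inputs: "---" is used both for the YAML fences and for em dashes.
# Assumption: the text has been collapsed to one line, separated by spaces.
body_only <- "Lorem ipsum --- dolor sit amet --- consectetur libero"
with_yaml <- paste("--- title: Example ---", body_only)

# The current pattern: .* is greedy, so the match runs from the FIRST "---"
# to the LAST "--- " in the string, swallowing everything in between.
res_body <- gsub("---.*--- ", "", body_only)
res_yaml <- gsub("---.*--- ", "", with_yaml)

print(res_body)  # "Lorem ipsum consectetur libero"
print(res_yaml)  # "consectetur libero"
```

In both cases the words between the first and last run of dashes disappear, which would explain a word count that is too low.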
Whenever I've seen YAML blocks they always seem to have the three dashes on lines on their own, so that might be the trick. Perhaps this regex I found in a Gist might provide some hints:
(?s)^(---)$.+?^(---)$.+?(?=^---$)
I'm not an expert on regex, but reading the explanation of it, I think you might just need to have the first few bits of it:
(?s)^(---)$.+?^(---)$
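A rough sketch of how that shortened regex might be used from R follows. The function name strip_yaml_block is hypothetical, not something in the add-in. It assumes the text still contains its newlines when the pattern is applied, and it adds the (?m) flag, because in PCRE ^ and $ only match at line boundaries in multiline mode.

```r
# Hypothetical helper; a sketch, not the add-in's actual code.
strip_yaml_block <- function(text) {
  # (?s): . also matches newlines. (?m): ^ and $ match at line boundaries.
  # .+? is lazy, so the match stops at the first closing line of dashes.
  # sub() (not gsub()) removes only the first such block.
  sub("(?sm)^---$.+?^---$", "", text, perl = TRUE)
}

# Made-up document: a YAML header followed by text containing em dashes.
doc <- paste(
  "---",
  "title: Example",
  "---",
  "",
  "Lorem ipsum---dolor sit amet---consectetur libero.",
  sep = "\n"
)

body <- strip_yaml_block(doc)
print(body)
```

Because the dashes must sit on a line of their own, the em dashes inside the sentence are left alone. If the text has already been flattened to one line before this step, the line anchors cannot help and the stripping would need to happen earlier.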
The Pandoc manual says:
A YAML metadata block is a valid YAML object, delimited by a line of three hyphens (---) at the top and a line of three hyphens (---) or three dots (...) at the bottom. A YAML metadata block may occur anywhere in the document, but if it is not at the beginning, it must be preceded by a blank line.
https://pandoc.org/MANUAL.html#extension-yaml_metadata_block
So perhaps technically your regex ought to match three dots closing a YAML block as well as three dashes, though I suspect that's pretty uncommon in R Markdown documents in the wild.
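Going by the Pandoc wording, one possible variant accepts either closing delimiter. This is only a sketch: strip_yaml_block_pandoc is a made-up name, the sample documents are invented, and it again assumes the text still has its newlines and that (?m) is set so ^ and $ match per line.

```r
# Hypothetical helper; accepts "---" or "..." as the closing line.
strip_yaml_block_pandoc <- function(text) {
  # Opening line must be exactly "---"; closing line is "---" or "...".
  # The dots are escaped so they match literal dots, not any character.
  sub("(?sm)^---$.+?^(?:---|\\.\\.\\.)$", "", text, perl = TRUE)
}

# Made-up documents, one closed with dots and one closed with dashes.
doc_dots   <- "---\ntitle: Example\n...\n\nSome text---with dashes---here."
doc_dashes <- "---\ntitle: Example\n---\n\nSome text---with dashes---here."

out_dots   <- strip_yaml_block_pandoc(doc_dots)
out_dashes <- strip_yaml_block_pandoc(doc_dashes)
print(out_dots)
print(out_dashes)
```

Both calls should leave the same body text, with the em dashes in the sentence untouched.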
I think I might have dealt with this in #28
I was getting unexpectedly low word counts on my document. After some investigation, it seems that if there are two or more em dashes (which I enter using three short dashes "---"), any words between the first and last occurrences in the document are not counted. I suspect that this might be because anything between them is excluded as if it were YAML.
Example:
Gives this:
It gets worse if I select the whole document before running the addin (i.e., including the YAML header):
This gives:
So it looks like it is just picking up the last "libero" after the final em dash.