conig / revise

R package for writing revise and resubmits

Limitations on []{#} text extraction #1

Closed conig closed 3 years ago

conig commented 3 years ago

[]{#} tagging currently works for most circumstances, including nested references.

nested <- "[This [text]{#inner} has nested references[@ref]]{#outer}"

revise:::extract_md_sections2(nested)
#>     tag                               section
#> 1 inner                                  text
#> 2 outer This text has nested references[@ref]

But it will fail if square brackets are used for anything other than references:

sic <- "This sentence [includes square brackets which aren't [sic] references]{#example}"

revise:::extract_md_sections2(sic)
#>       tag         section
#> 1 example sic] references

For these situations, the workaround is to use span tagging until a system with proper bracket matching is implemented.
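To make "proper bracket matching" concrete, here is a minimal sketch of a stack-based extractor. It is only an illustration under stated assumptions: `extract_tagged()` is a hypothetical helper, not part of revise, and unlike `extract_md_sections2()` it leaves inner tags and references in the extracted text untouched.

```r
# Hypothetical helper (not part of revise): find [text]{#tag} spans by
# matching brackets with a stack instead of a regex.
extract_tagged <- function(x) {
  chars <- strsplit(x, "")[[1]]
  open_pos <- integer(0)   # stack of positions of unmatched "["
  tags <- character(0)
  sections <- character(0)
  for (i in seq_along(chars)) {
    if (chars[i] == "[") {
      open_pos <- c(open_pos, i)
    } else if (chars[i] == "]" && length(open_pos) > 0) {
      # Pop the most recent unmatched "[" for this "]"
      start <- open_pos[length(open_pos)]
      open_pos <- open_pos[-length(open_pos)]
      # Only a "]" followed directly by {#tag} closes a tagged span
      rest <- substr(x, i + 1, nchar(x))
      m <- regmatches(rest, regexpr("^\\{#[^}]+\\}", rest))
      if (length(m) == 1) {
        tags <- c(tags, substr(m, 3, nchar(m) - 1))
        sections <- c(sections, substr(x, start + 1, i - 1))
      }
    }
  }
  data.frame(tag = tags, section = sections, stringsAsFactors = FALSE)
}

sic <- "This sentence [includes square brackets which aren't [sic] references]{#example}"
res <- extract_tagged(sic)
```

Because each closing bracket is paired with its own opening bracket, the `[sic]` pair is consumed on its own and the `example` span keeps its full text. Stripping nested tags from the outer section would still need a separate pass.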

pdparker commented 3 years ago

Seems to fail for me even with references where there is a prefix or a suffix.

conig commented 3 years ago

@pdparker If you give me a reproducible example I'll check it out.

pdparker commented 3 years ago

Modifying your example above:

nested <- "[This [text]{#inner} has nested references[-@ref]]{#outer}"

revise:::extract_md_sections2(nested)

or

nested <- "[This [text]{#inner} has nested references[e.g., @ref]]{#outer}"

revise:::extract_md_sections2(nested)

conig commented 3 years ago

I think this is solved in the latest version:

nested <- "[This [text]{#inner} has nested references[-@ref]]{#outer}"
revise:::extract_md_sections2(nested)
#>     tag                                section
#> 1 inner                                   text
#> 2 outer This text has nested references[-@ref]

nested <- "[This [text]{#inner} has nested references[e.g., @ref]]{#outer}"

revise:::extract_md_sections2(nested)
#>     tag                                     section
#> 1 inner                                        text
#> 2 outer This text has nested references[e.g., @ref]

I have expanded the regex for detecting whether something is a reference (rather than a nested anchor tag): \[.*\[(?!.*\@). A reference is now just any [] with an @ somewhere within it. Not sure how this is going to come back to bite us (it is pretty unusual to include @ in writing, but who knows), so let me know if there are any issues.
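A rough way to see where the @ check could bite is to run the pattern on its own. This sketch only tests the regex against whole strings. How revise applies it internally is not shown here, and the wrapper name and example strings are made up for illustration.

```r
# Pattern copied from the comment above; perl = TRUE is required for the
# negative lookahead (?!...).
pattern <- "\\[.*\\[(?!.*\\@)"

# Hypothetical wrapper: TRUE when an inner "[" has no "@" anywhere after it.
inner_bracket_without_at <- function(x) grepl(pattern, x, perl = TRUE)

sic   <- "[includes square brackets which aren't [sic] references]{#example}"
cited <- "[has nested references[e.g., @ref]]{#outer}"
email <- "[write to [sic] someone@example.org]{#contact}"

hits <- c(
  sic   = inner_bracket_without_at(sic),
  cited = inner_bracket_without_at(cited),
  email = inner_bracket_without_at(email)
)
```

The `sic` string matches, while `cited` and `email` do not. The `email` case is the worrying one: its inner brackets are not a reference, but the @ in the address makes the pattern treat them the same way as the citation.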

Might need to rethink the current setup.