doxygen / doxygen

Markdown: Links in ATX headings processed incorrectly? #6998

Open jschleus opened 5 years ago

jschleus commented 5 years ago

When README.md is set as the main page with USE_MDFILE_AS_MAINPAGE = README.md, links in ATX-style headings in that Markdown file seem to be processed incorrectly. I have only checked this configuration.

For example, the following README.md file

# A first ATX heading level 1 ("mis"-used as "headertitle")
# A ATX heading (level 1) with a link [Doxygen](http://www.doxygen.nl/) 
A standard text with a link [Doxygen](http://www.doxygen.nl/)

leads to the following HTML output

(screenshot of the HTML output)

The link in the standard text, however, is processed correctly.

I assume all Markdown files are affected, but I have not tested this.

doxygen commented 5 years ago

Links in section headers are not supported at the moment, only plain text is supported. This is because of the simple way section headers are currently parsed (the title is a single text attribute) and also because markup in section headers does not translate well to other output formats (LaTeX, RTF).

wataash commented 4 years ago

One workaround is to use HTML tags instead of ATX headings such as:

## [example.com](http://example.com)
## _foo_ **bar** `baz`

Using HTML tags:

<h2>[example.com](http://example.com)</h2>
<h2> _foo_ **bar** `baz`</h2>
    ^ the space here is required (see #7781)

(screenshot of the rendered result)

Alternatively, apply this patch (it includes part of #7782):

diff --git a/src/markdown.cpp b/src/markdown.cpp
index 937e5e0d..6ffaa923 100644
--- a/src/markdown.cpp
+++ b/src/markdown.cpp
@@ -70,8 +70,8 @@
 // is character at position i in data allowed before an emphasis section
 #define isOpenEmphChar(i) \
   (data[i]=='\n' || data[i]==' ' || data[i]=='\'' || data[i]=='<' || \
-   data[i]=='{'  || data[i]=='(' || data[i]=='['  || data[i]==',' || \
-   data[i]==':'  || data[i]==';')
+   data[i]=='>'  || data[i]=='{' || data[i]=='('  || data[i]=='[' || \
+   data[i]==','  || data[i]==':' || data[i]==';')

 // is character at position i in data an escape that prevents ending an emphasis section
 // so for example *bla (*.txt) is cool*
@@ -1967,7 +1967,8 @@ void writeOneLineHeaderOrRuler(GrowBuf &out,const char *data,int size)
   else if ((level=isAtxHeader(data,size,header,id,TRUE)))
   {
     QCString hTag;
-    if (level<5 && !id.isEmpty())
+    // if (level<5 && !id.isEmpty())
+    if (false)
     {
       switch(level)
       {

(screenshot of the rendered result with the patch applied)

Note that this if (false) makes \tableofcontents empty, and there are probably other problems, so use it with caution. I recommend comparing the results with diff -r html.original/ html.applied-this-patch/.
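
To illustrate why the first hunk of the patch removes the need for a space after `<h2>`, here is a minimal sketch of the "allowed before emphasis" check. The two character sets are copied from the old and new `isOpenEmphChar` macro in the diff; the helper `canOpenEmphasis` is hypothetical (it is not a doxygen function), and treating the start of the input as always valid is an assumption.

```cpp
#include <cstddef>
#include <string>

// Characters accepted directly before an emphasis marker, copied from the
// old and the new version of isOpenEmphChar in the diff above.
static const std::string kBeforePatch = "\n '<{([,:;";
static const std::string kAfterPatch  = "\n '<>{([,:;";

// Hypothetical helper (not part of doxygen): may an emphasis section start
// at data[pos], given the set of characters allowed in front of it?
bool canOpenEmphasis(const std::string &data, std::size_t pos,
                     const std::string &allowedBefore)
{
  // Only '_' and '*' are considered emphasis markers in this sketch.
  if (pos >= data.size() || (data[pos] != '_' && data[pos] != '*')) return false;
  // Assumption: a marker at the very start of the input always qualifies.
  if (pos == 0) return true;
  return allowedBefore.find(data[pos - 1]) != std::string::npos;
}
```

With the old set, the `_` in `<h2>_foo_</h2>` follows `>` and is rejected, while the `_` in `<h2> _foo_</h2>` follows a space and is accepted. With the new set both are accepted.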

albert-github commented 4 years ago

The bottom part of the patch in the comment is not in the pull request, but it looks like a "debug" test. Is the latter assumption correct?

wataash commented 4 years ago

> The bottom part of the comment is not in the pull request

Exactly.

> but it looks like a "debug" test. Is the latter assumption correct?

I'm sorry, I don't understand what this means. Maybe we are thinking of different things.

Let's take an example:

# Header 1 _foo_ [example.com](http://example.com)
## Header 2 _foo_ [example.com](http://example.com)
### Header 3 _foo_ [example.com](http://example.com)
#### Header 4 _foo_ [example.com](http://example.com)
##### Header 5 _foo_ [example.com](http://example.com)
###### Header 6 _foo_ [example.com](http://example.com)
####### Header 7 _foo_ [example.com](http://example.com)
######## Header 8 _foo_ [example.com](http://example.com)
######### Header 9 _foo_ [example.com](http://example.com)

processMarkdown() converts it to:

@page md_055_markdown Header 1 <em>foo</em> <a href="http://example.com">example.com</a>
@section autotoc_md1 Header 2 <em>foo</em> <a href="http://example.com">example.com</a>
@subsection autotoc_md2 Header 3 <em>foo</em> <a href="http://example.com">example.com</a>
@subsubsection autotoc_md3 Header 4 <em>foo</em> <a href="http://example.com">example.com</a>
@paragraph autotoc_md4 Header 5 <em>foo</em> <a href="http://example.com">example.com</a>
<h5>Header 6 <em>foo</em> <a href="http://example.com">example.com</a></h5>
<h5># Header 7 <em>foo</em> <a href="http://example.com">example.com</a></h5>
<h5>## Header 8 <em>foo</em> <a href="http://example.com">example.com</a></h5>
<h5>### Header 9 <em>foo</em> <a href="http://example.com">example.com</a></h5>

and the resulting HTML is:

<div class="PageDoc"><div class="header">
  <div class="headertitle">
<div class="title">Header 1 <em>foo</em> <a href="http://example.com">example.com</a> </div>  </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><h1><a class="anchor" id="autotoc_md1"></a>
Header 2 &lt;em&gt;foo&lt;/em&gt; &lt;a href="http://example.com"&gt;example.com&lt;/a&gt;</h1>
<h2><a class="anchor" id="autotoc_md2"></a>
Header 3 &lt;em&gt;foo&lt;/em&gt; &lt;a href="http://example.com"&gt;example.com&lt;/a&gt;</h2>
<h3><a class="anchor" id="autotoc_md3"></a>
Header 4 &lt;em&gt;foo&lt;/em&gt; &lt;a href="http://example.com"&gt;example.com&lt;/a&gt;</h3>
<h4><a class="anchor" id="autotoc_md4"></a>
Header 5 &lt;em&gt;foo&lt;/em&gt; &lt;a href="http://example.com"&gt;example.com&lt;/a&gt;</h4>
<h5>Header 6 <em>foo</em> <a href="http://example.com">example.com</a></h5>
<h5># Header 7 <em>foo</em> <a href="http://example.com">example.com</a></h5>
<h5>## Header 8 <em>foo</em> <a href="http://example.com">example.com</a></h5>
<h5>### Header 9 <em>foo</em> <a href="http://example.com">example.com</a></h5>
</div></div><!-- contents -->
</div><!-- PageDoc -->

Headers 2-5 are not rendered well: the markup in their titles is HTML-escaped. (I just noticed that Header 1 is the exception and renders correctly, even though @jschleus's example uses level-1 headings. I don't know why.)
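
The HTML for headers 2-5 looks as if the section title were treated as plain text and escaped on output. The sketch below reproduces that effect; `escapeHtmlText` is a hypothetical helper written for this comment, not doxygen's actual code, and escaping `&` as well is an assumption (the example contains no `&`).

```cpp
#include <string>

// Hypothetical helper, not doxygen's implementation: escape the characters
// that are special in HTML text, so any markup in the input is shown
// literally instead of being interpreted by the browser.
std::string escapeHtmlText(const std::string &text)
{
  std::string out;
  out.reserve(text.size());
  for (char c : text)
  {
    switch (c)
    {
      case '<': out += "&lt;";  break;
      case '>': out += "&gt;";  break;
      case '&': out += "&amp;"; break; // assumption: not exercised by the example
      default:  out += c;       break; // quotes are left alone, as in the output above
    }
  }
  return out;
}
```

Applied to the title that follows `@section autotoc_md1`, this yields the same text that appears inside the `<h1>` element for Header 2.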

With this diff:

@@ -1967,7 +1967,8 @@ void writeOneLineHeaderOrRuler(GrowBuf &out,const char *data,int size)
   else if ((level=isAtxHeader(data,size,header,id,TRUE)))
   {
     QCString hTag;
-    if (level<5 && !id.isEmpty())
+    // if (level<5 && !id.isEmpty())
+    if (false)
     {
       switch(level)
       {

it prevents headers 2-5 from being turned into Doxygen commands (@section @subsection @subsubsection @paragraph), so they render correctly:

@page md_055_markdown Header 1 <em>foo</em> <a href="http://example.com">example.com</a>
\anchor autotoc_md1
<h1>Header 2 <em>foo</em> <a href="http://example.com">example.com</a></h1>
\anchor autotoc_md2
<h2>Header 3 <em>foo</em> <a href="http://example.com">example.com</a></h2>
\anchor autotoc_md3
<h3>Header 4 <em>foo</em> <a href="http://example.com">example.com</a></h3>
\anchor autotoc_md4
<h4>Header 5 <em>foo</em> <a href="http://example.com">example.com</a></h4>
<h5>Header 6 <em>foo</em> <a href="http://example.com">example.com</a></h5>
<h5># Header 7 <em>foo</em> <a href="http://example.com">example.com</a></h5>
<h5>## Header 8 <em>foo</em> <a href="http://example.com">example.com</a></h5>
<h5>### Header 9 <em>foo</em> <a href="http://example.com">example.com</a></h5>

and the resulting HTML is:

</div><!-- top -->
<div class="PageDoc"><div class="header">
  <div class="headertitle">
<div class="title">Header 1 <em>foo</em> <a href="http://example.com">example.com</a> </div>  </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p><a class="anchor" id="autotoc_md1"></a></p><h1>Header 2 <em>foo</em> <a href="http://example.com">example.com</a></h1>
<p><a class="anchor" id="autotoc_md2"></a></p><h2>Header 3 <em>foo</em> <a href="http://example.com">example.com</a></h2>
<p><a class="anchor" id="autotoc_md3"></a></p><h3>Header 4 <em>foo</em> <a href="http://example.com">example.com</a></h3>
<p><a class="anchor" id="autotoc_md4"></a></p><h4>Header 5 <em>foo</em> <a href="http://example.com">example.com</a></h4>
<h5>Header 6 <em>foo</em> <a href="http://example.com">example.com</a></h5>
<h5># Header 7 <em>foo</em> <a href="http://example.com">example.com</a></h5>
<h5>## Header 8 <em>foo</em> <a href="http://example.com">example.com</a></h5>
<h5>### Header 9 <em>foo</em> <a href="http://example.com">example.com</a></h5>
</div></div><!-- contents -->
</div><!-- PageDoc -->
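
To summarize the intermediate outputs in this comment, the mapping from the number of `#` characters to the emitted construct can be sketched as follows. It is derived only from the example above, where the file's first heading becomes the page title, and not from doxygen's source. The function name and the `patched` flag are made up for illustration.

```cpp
#include <string>

// Hypothetical summary of the example above (not doxygen code): which
// construct a heading with `hashes` leading '#' characters was turned into.
// `patched` selects the behaviour with the `if (false)` change applied.
std::string emittedConstruct(int hashes, bool patched)
{
  // In the example, the single level-1 heading became the @page title.
  if (hashes <= 1) return "@page";
  if (hashes <= 5)
  {
    // With the patch, headers 2-5 became an \anchor followed by <h1>..<h4>.
    if (patched) return "\\anchor + <h" + std::to_string(hashes - 1) + ">";
    // Without it, they became sectioning commands.
    static const char *commands[] = {"@section", "@subsection",
                                     "@subsubsection", "@paragraph"};
    return commands[hashes - 2];
  }
  // Six or more '#' ended up as <h5>; any extra '#' stayed in the title text.
  return "<h5>";
}
```

This also shows why \tableofcontents ends up empty with the patch: none of the headings is emitted as a sectioning command anymore.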

albert-github commented 4 years ago

@wataash Did you look at what happens when you add \tableofcontents (and, e.g., in the LaTeX / PDF output)?

wataash commented 4 years ago

Yes, the ToC (and possibly several other things) stops working, so this is just a workaround. Thank you for pointing it out. I'll add a caution to my comment above.

lemoncmd commented 5 months ago

So the core problem is that @section cannot parse HTML tags the way @page does?