getnikola / plugins

Extra plugins for Nikola
https://plugins.getnikola.com/
MIT License

rest_html5: code blocks indented incorrectly #152

Closed: Netspider closed this issue 8 years ago.

Netspider commented 8 years ago

After switching to the rest_html5 plugin, I noticed that code blocks are indented incorrectly:

.. code:: bash

        # this is at the beginning of the line
        # this is indented by 4 spaces
        # and this line is also indented by 4 spaces

With the HTML4 reST parser, this snippet looks like this:

# this is at the beginning of the line
# this is indented by 4 spaces
# and this line is also indented by 4 spaces

    <div>
<p>this is a test.</p>
 <pre class="code bash"><a name="rest_code_d094cf7088ce4e0a8b54a564ecf28f16-1"></a><span class="c"># this is at the beginning of the line</span>
<a name="rest_code_d094cf7088ce4e0a8b54a564ecf28f16-2"></a><span class="c"># this is indented by 4 spaces</span>
<a name="rest_code_d094cf7088ce4e0a8b54a564ecf28f16-3"></a><span class="c"># and this line is also indented</span>
</pre>
</div>

With the rest_html5 parser, it looks like this:

# this is at the beginning of the line
    # this is indented by 4 spaces
    # and this line is also indented by 4 spaces

    <div>
<p>this is a test</p>
     <pre class="code bash"><a name="rest_code_da54fb11d97c4525807bff948fe2f50e-1"></a><span class="c"># this is at the beginning of the line</span>
    <a name="rest_code_da54fb11d97c4525807bff948fe2f50e-2"></a><span class="c"># this is indented by 4 spaces</span>
    <a name="rest_code_da54fb11d97c4525807bff948fe2f50e-3"></a><span class="c"># and this line is also indented</span>
    </pre>
</div>

Kwpolska commented 8 years ago

Not our bug. Please make sure you’re running an up-to-date version of rst2html5 (pip install -U rst2html5) and delete this post’s file in cache/. If the issue persists, report a bug upstream.

Kwpolska commented 8 years ago

Turns out we can (and should) actually fix this, which will happen in a minute or two.

andredias commented 8 years ago

Hi, I'm the author and maintainer of rst2html5 (https://pypi.io/project/rst2html5/). The project is hosted at https://bitbucket.org/andre_felipe_dias/rst2html5, and I'd like to help fix this issue.

andredias commented 8 years ago

I couldn't replicate the issue. It looks like it is resolved already.

Kwpolska commented 8 years ago

@andredias I worked around it, but only partially:

We’re using a custom code block. It works fine with docutils, but with rst2html5 (via the plugin), there are indentation issues: getnikola/plugins#152

Unfortunately, my patch does not cut it, because it breaks tabs (U+0009).

Here’s a gist with a sample post and differences: https://gist.github.com/Kwpolska/a2d4268a08b60df895180f5aa6fd5513

I have tracked the issue down to a difference in self.content for our CodeBlock directive:

.  render_posts:cache/posts/index4.html
N. ['sudo vim /etc/lightdm/lightdm-gtk-greeter.conf']
N. ['[base]', 'session=/usr/bin/startlxde', '...', '[userlist]', 'disable=1']
N. ['foo', 'bar', '             baz (this indented with 3 spaces and 2 tabs)', 'foobar', '    foobaz']
.  render_posts:cache/posts/index.html
N. ['sudo vim /etc/lightdm/lightdm-gtk-greeter.conf']
N. ['[base]', 'session=/usr/bin/startlxde', '...', '[userlist]', 'disable=1']
N. ['foo', 'bar', 'baz (this indented with 3 spaces and 2 tabs)', 'foobar', '    foobaz']

As you can see, we’re missing the indentation of baz. Is this a bug on our side or is this an issue with how rst2html5 parses things?

andredias commented 8 years ago

Neither rst2html.py nor rst2html5 uses tabs in <pre> (code) output. Both replace tabs with the number of spaces defined by the tab-width setting. Actually, the parser does that, not the writers, so there is nothing the writers can do about it. That said, using 0 for tab-width breaks tabs, as you noticed. I suggest setting it back to 4 and looking for a solution elsewhere.
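The two self.content dumps above can be reproduced with a small model. It assumes (hypothetically) that the code block's body is indented by 3 spaces and that the "baz" line adds 2 tabs on top of that. Python's str.expandtabs() and textwrap.dedent() stand in for what the reST parser does: expand tabs first, then strip the indentation shared by every line. This is a sketch, not docutils' actual code.

```python
import textwrap

# Hypothetical body of the code block: the whole block is indented by 3 spaces
# (a common reST convention), and the "baz" line carries 2 extra tabs.
SOURCE = "   foo\n   bar\n   \t\tbaz\n   foobar\n       foobaz\n"


def directive_content(source: str, tab_width: int) -> list[str]:
    """Approximate the lines a directive receives in self.content.

    Tabs are expanded first (as the parser does with its tab-width setting),
    and then the indentation common to every line is removed.
    """
    expanded = "\n".join(line.expandtabs(tab_width) for line in source.splitlines())
    return textwrap.dedent(expanded).splitlines()


# Tab width 8: the tabs become spaces, so "baz" keeps its extra indentation.
print(directive_content(SOURCE, 8))
# Tab width 0: the tabs vanish entirely, so "baz" loses its extra indentation.
print(directive_content(SOURCE, 0))
```

With a tab width of 8, the two tabs take "baz" from column 3 to column 16, leaving 13 spaces after the 3-space block indentation is stripped. With a tab width of 0, nothing is left but the shared 3 spaces.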

Since the snippet given earlier by @Netspider works fine with both rst2html.py and rst2html5, it looks like the bug could be in Nikola's code-block directive. Does it mix tabs and spaces somehow? Mixing tabs and spaces is as bad in reStructuredText as it is in Python.

I'd like to point out that rst2html5 has had its own code-block directive since v1.7. Maybe it would be better to use that one.

Kwpolska commented 8 years ago

> Mixing tabs and spaces is as bad in reStructuredText as it is in Python.

Well, that’s a great reason to say our current thing is good enough.

> That said, using 0 for tab-width breaks tabs, as you noticed. I suggest turning it back to 4 and looking for a solution elsewhere.

Okay, what other solution should we use? The root of the problem is that rst2html5 outputs everything indented by 4 spaces by default.

andredias commented 8 years ago

rst2html5 indents its output to produce HTML5 code that is easier to read. You could use the --no-indent option if you like, but it has nothing to do with code blocks.

Let me rephrase some of my previous findings: when you use the snippet directly with rst2html.py (docutils) or rst2html5, the output is correct:

$ echo '.. code:: bash

    # this is at the beginning of the line
    # this is indented by 4 spaces
    # and this line is also indented by 4 spaces' > code.rst

$ rst2html.py code.rst
$ rst2html5 code.rst

The rst2html.py output is:

...
<pre class="code bash literal-block">
<span class="comment single"># this is at the beginning of the line
# this is indented by 4 spaces
# and this line is also indented by 4 spaces</span>
</pre>

The rst2html5 output is:

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
</head>
<body>
    <pre class="code bash"><span class="c"># this is at the beginning of the line
# this is indented by 4 spaces
# and this line is also indented by 4 spaces</span></pre>
</body>
</html>

Actually, it should have been <pre data-language="bash">..., but the indentation is correct. I'll take a look at this right now.

Note that since both outputs are correct, the problem must be elsewhere.

I'm not blaming anyone. I also want to make it clear that I use Nikola for my own blog (https://blog.pronus.io), and I have a strong interest in making rst2html5 work well with Nikola. That said, my next lines of investigation will be:

  1. Does this indentation problem happen only in code blocks?
  2. What exactly is the source code before and after Nikola's code-block directive?
  3. What happens if I use rst2html5's default code-block?
  4. Pygments' output changed a little after version 1.6; could that be it?

Kwpolska commented 8 years ago

  1. Everywhere, but it’s not important outside of code blocks (HTML doesn’t care)
  2. See below.
  3. See below.
  4. No. rst2html5 really wants to indent things, and it doesn’t know our code blocks are code blocks.

Without Nikola code blocks (rest plugin unreachable): standard rst2html5 behavior


    <p>Write your post here.</p>
    <section id="header">
        <h1>Header</h1>
        <p>Foobar.</p>
        <pre class="code bash"><span class="c"># this is at the beginning of the line
# this is indented by 4 spaces
# and this line is also indented by 4 spaces</span></pre>
        <p>Another.</p>
        <pre class="code bash"><span class="c"># this is at the beginning of the line
# this is indented by 4 spaces
# and this line is also indented by 4 spaces</span></pre>
    </section>

With Nikola code blocks: all but the first line have extra indent


    <p>Write your post here.</p>
    <section id="header">
        <h1>Header</h1>
        <p>Foobar.</p>
        <pre class="code bash"><a name="rest_code_fa1f227a0edc4c64a17b7aeb16be17c8-1"></a><span class="c"># this is at the beginning of the line</span>
        <a name="rest_code_fa1f227a0edc4c64a17b7aeb16be17c8-2"></a><span class="c"># this is indented by 4 spaces</span>
        <a name="rest_code_fa1f227a0edc4c64a17b7aeb16be17c8-3"></a><span class="c"># and this line is also indented by 4 spaces</span>
        </pre>
        <p>Another.</p>
        <pre class="code bash"><a name="rest_code_4ada58ebed99409b946b06d8f2a91ec4-1"></a><span class="c"># this is at the beginning of the line</span>
        <a name="rest_code_4ada58ebed99409b946b06d8f2a91ec4-2"></a><span class="c"># this is indented by 4 spaces</span>
        <a name="rest_code_4ada58ebed99409b946b06d8f2a91ec4-3"></a><span class="c"># and this line is also indented by 4 spaces</span>
        </pre>
    </section>
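A minimal model of why only the first line escapes the extra indentation, assuming the writer treats a raw node as an opaque string and prefixes each of its lines with the current indentation. This is a hypothetical simplification, not rst2html5's real code: the first prefix lands before <pre>, where whitespace is harmless, while every later prefix lands inside the preformatted text.

```python
# Raw HTML chunk as a highlighter might emit it (abbreviated, hypothetical):
# one <pre> whose code lines are separated by real newlines.
RAW = (
    '<pre class="code bash"><span class="c"># first line</span>\n'
    '<span class="c"># second line</span>\n'
    "</pre>"
)


def indent_chunk(html: str, depth: int, unit: str = "    ") -> str:
    """Hypothetical pretty-printer step: prefix every line of an opaque chunk."""
    prefix = unit * depth
    return "\n".join(prefix + line for line in html.split("\n"))


# Depth 2 matches the nesting inside <section> in the output above.
print(indent_chunk(RAW, 2))
```

Every line gains 8 spaces, but only the first line's spaces sit outside the <pre>, which matches the "all but the first line" symptom.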

With the --no-indent equivalent: all code appears on one line

<p>Write your post here.</p><section id="header"><h1>Header</h1><p>Foobar.</p><pre class="code bash"><a name="rest_code_1fd4ac2446d34833b676a164575b852e-1"></a><span class="c"># this is at the beginning of the line</span><a name="rest_code_1fd4ac2446d34833b676a164575b852e-2"></a><span class="c"># this is indented by 4 spaces</span><a name="rest_code_1fd4ac2446d34833b676a164575b852e-3"></a><span class="c"># and this line is also indented by 4 spaces</span></pre><p>Another.</p><pre class="code bash"><a name="rest_code_e5c59655af9a4570b654f04ce7c5b2ae-1"></a><span class="c"># this is at the beginning of the line</span><a name="rest_code_e5c59655af9a4570b654f04ce7c5b2ae-2"></a><span class="c"># this is indented by 4 spaces</span><a name="rest_code_e5c59655af9a4570b654f04ce7c5b2ae-3"></a><span class="c"># and this line is also indented by 4 spaces</span></pre></section>
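The one-line output fits the same opaque-string problem. Assuming (for illustration only) a no-indent serializer that strips each line of a raw chunk and joins the pieces without separators, the newlines that separate code lines inside <pre> disappear too:

```python
# Raw HTML chunk as a highlighter might emit it (abbreviated, hypothetical).
RAW = (
    '<pre class="code bash"><span class="c"># first line</span>\n'
    '<span class="c"># second line</span>\n'
    "</pre>"
)


def flatten_chunk(html: str) -> str:
    """Hypothetical no-indent step: strip every line and join without newlines."""
    return "".join(line.strip() for line in html.split("\n"))


print(flatten_chunk(RAW))
```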

With tab width set to 0: works (we currently use that)


<p>Write your post here.</p>
<section id="header">
<h1>Header</h1>
<p>Foobar.</p>
<pre class="code bash"><a name="rest_code_94a2c8897e5146ba8465c86862a81671-1"></a><span class="c"># this is at the beginning of the line</span>
<a name="rest_code_94a2c8897e5146ba8465c86862a81671-2"></a><span class="c"># this is indented by 4 spaces</span>
<a name="rest_code_94a2c8897e5146ba8465c86862a81671-3"></a><span class="c"># and this line is also indented by 4 spaces</span>
</pre>
<p>Another.</p>
<pre class="code bash"><a name="rest_code_da3a08f82f2b45028b3468c450e043ff-1"></a><span class="c"># this is at the beginning of the line</span>
<a name="rest_code_da3a08f82f2b45028b3468c450e043ff-2"></a><span class="c"># this is indented by 4 spaces</span>
<a name="rest_code_da3a08f82f2b45028b3468c450e043ff-3"></a><span class="c"># and this line is also indented by 4 spaces</span>
</pre>
</section>

andredias commented 8 years ago

OK, now I get it. The problem is that Nikola's CodeBlock directive returns a raw node with HTML text content (https://github.com/getnikola/nikola/blob/master/nikola/plugins/compile/rest/listing.py#L113). Sorry, but this is not correct. It should return a literal_block node instead, as Sphinx does (https://github.com/sphinx-doc/sphinx/blob/master/sphinx/directives/code.py#L119). I've copied their code-block directive into rst2html5. I suggest you do the same, or just disable Nikola's code-block directive for rst2html5.

Kwpolska commented 8 years ago

Okay, would you mind patching our listing directive to work with literal_block, then? If I just do it the naïve way, it breaks:

<p>Write your post here.</p>
<section id="header">
<h1>Header</h1>
<p>Foobar.</p>
<pre>&lt;pre class="code bash"&gt;&lt;a name="rest_code_c55578bfc35343ed9f2d5d33218ae9fa-1"&gt;&lt;/a&gt;&lt;span class="c"&gt;# this is at the beginning of the line&lt;/span&gt;
&lt;a name="rest_code_c55578bfc35343ed9f2d5d33218ae9fa-2"&gt;&lt;/a&gt;&lt;span class="c"&gt;# this is indented by 4 spaces&lt;/span&gt;
&lt;a name="rest_code_c55578bfc35343ed9f2d5d33218ae9fa-3"&gt;&lt;/a&gt;&lt;span class="c"&gt;# and this line is also indented by 4 spaces&lt;/span&gt;
&lt;/pre&gt;</pre>
<p>Another.</p>
<pre>&lt;pre class="code bash"&gt;&lt;a name="rest_code_a483f8a784754812910742f67fe9ab36-1"&gt;&lt;/a&gt;&lt;span class="c"&gt;# this is at the beginning of the line&lt;/span&gt;
&lt;a name="rest_code_a483f8a784754812910742f67fe9ab36-2"&gt;&lt;/a&gt;&lt;span class="c"&gt;# this is indented by 4 spaces&lt;/span&gt;
&lt;a name="rest_code_a483f8a784754812910742f67fe9ab36-3"&gt;&lt;/a&gt;&lt;span class="c"&gt;# and this line is also indented by 4 spaces&lt;/span&gt;
&lt;/pre&gt;</pre>
</section>
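The escaped output above is what a writer can be expected to produce when already-highlighted HTML is stored in a literal_block: the node's text is treated as plain characters, so <, > and & are escaped and the result is wrapped in another <pre>. The sketch below is a hypothetical writer step that uses Python's html.escape() with quote=False to match the unescaped quotes shown above. Presumably a literal_block would need to carry the unhighlighted source instead, leaving highlighting to the writer.

```python
import html

# Abbreviated, hypothetical version of the highlighted HTML from Nikola's directive.
HIGHLIGHTED = (
    '<pre class="code bash"><span class="c"># first line</span>\n'
    "</pre>"
)


def render_literal_block(text: str) -> str:
    """Hypothetical writer step for a literal_block node.

    The node's text is plain characters, so <, > and & are escaped; quotes are
    left as-is to match the output shown above.
    """
    return "<pre>" + html.escape(text, quote=False) + "</pre>"


print(render_literal_block(HIGHLIGHTED))
```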

andredias commented 8 years ago

I'll try.

andredias commented 8 years ago

Nikola's original code-block directive has proven to work just fine with rst2html. Why not keep it that way for rst2html and let rst2html5 use its own code-block directive? Instead of changing the directive, we would change when the directives are registered, so that one doesn't override the other.
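The registration-order idea can be sketched with a plain dictionary, assuming that registering a directive under an existing name replaces the previous entry. (docutils' directives.register_directive() stores directives in a name-keyed mapping, so the last registration is assumed to win there as well.) The writer names and the strings standing in for directive classes are hypothetical.

```python
from typing import Dict


def build_registry(writer: str) -> Dict[str, str]:
    """Register code-block implementations in a writer-dependent order.

    Strings stand in for directive classes; the names are hypothetical.
    """
    registry: Dict[str, str] = {}

    def register(name: str, implementation: str) -> None:
        # A later registration under the same name replaces the earlier one.
        registry[name] = implementation

    # Nikola's raw-HTML directive is registered first in every case.
    register("code-block", "nikola (raw HTML)")
    if writer == "rst2html5":
        # Registered afterwards, so rst2html5's own directive wins here.
        register("code-block", "rst2html5 (literal_block)")
    return registry


print(build_registry("rst2html")["code-block"])
print(build_registry("rst2html5")["code-block"])
```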

andredias commented 8 years ago

Studying the matter further, I realize that no single code-block directive will fit both rst2html and rst2html5, because they process documents differently.

Kwpolska commented 8 years ago

We’d like to keep support for listings and for linking to code lines in rst2html5 too, especially since disabling the (unnecessary) indentation fixes this bug.

andredias commented 8 years ago

Good point. Please give me a couple of days to work something out.

andredias commented 8 years ago

I've tried some alternatives, but the simplest solution was for rst2html5 to keep the indentation of raw HTML unchanged. Please revert your last commit 356b40d18e29 ("Use literal_block for Nikola code blocks") and update rst2html5 to version 1.8.1. It should work then.
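One way to picture this fix, as a hypothetical sketch rather than rst2html5's actual code: the writer indents a raw HTML chunk only where it starts and emits the rest verbatim, so whatever sits between <pre> and </pre> stays exactly as Nikola produced it.

```python
import re

# Raw HTML chunk as a highlighter might emit it (abbreviated, hypothetical).
RAW = (
    '<pre class="code bash"><span class="c"># first line</span>\n'
    '<span class="c"># second line</span>\n'
    "</pre>"
)


def keep_raw_indentation(html: str, depth: int, unit: str = "    ") -> str:
    """Hypothetical fixed writer step: indent only where the raw chunk starts."""
    return unit * depth + html


def pre_text(rendered: str) -> str:
    """Return the text between the first <pre ...> and its </pre>."""
    match = re.search(r"<pre[^>]*>(.*?)</pre>", rendered, re.DOTALL)
    if match is None:
        raise ValueError("no <pre> element found")
    return match.group(1)


rendered = keep_raw_indentation(RAW, 2)
print(rendered)
# The preformatted text is unaffected by the surrounding indentation.
print(pre_text(rendered) == pre_text(RAW))
```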

Kwpolska commented 8 years ago

The commit is already gone, and things are fixed once and for all in 78430c5.