LukeEmmet / duckling-proxy

Duckling proxy 🦆 is a Gemini proxy to access the Small Web
MIT License

Extra newlines messing up lists in Gemtext #12

Open acidus99 opened 2 years ago

acidus99 commented 2 years ago

I noticed a problem in how Duckling renders lists: the contents of an <li> end up on a separate line from the * in the gemtext output.

This is a page that Duckling rendered well (once you get past the initial nav stuff): https://perldoc.perl.org/perlpacktut

Look at the "Integers" section. Here is the HTML:

<h2 id="Integers"><a class="permalink" href="#Integers">#</a>Integers</h2>

<p>Packing and unpacking numbers implies conversion to and from some <i>specific</i> binary representation. Leaving floating point numbers aside for the moment, the salient properties of any such representation are:</p>

<ul>

<li><p>the number of bytes used for storing the integer,</p>

</li>
<li><p>whether the contents are interpreted as a signed or unsigned number,</p>

</li>
<li><p>the byte ordering: whether the first byte is the least or most significant byte (or: little-endian or big-endian, respectively).</p>

</li>
</ul>

<p>So, for instance, to pack 20302 to a signed 16 bit integer in your computer&#39;s representation you write</p>

Here is how Duckling renders that:

[image]

But look at the Gemtext:

## # Integers

Packing and unpacking numbers implies conversion to and from some specific binary representation. Leaving floating point numbers aside for the moment, the salient properties of any such representation are:

* 

the number of bytes used for storing the integer,

* 

whether the contents are interpreted as a signed or unsigned number,

* 

the byte ordering: whether the first byte is the least or most significant byte (or: little-endian or big-endian, respectively).

So, for instance, to pack 20302 to a signed 16 bit integer in your computer's representation you write

It looks like you are rendering \n's for the start and end of block elements, even if you haven't yet written any content to the current line. So the leading * gets placed on a different line from the item text.

This drastically impacts the readability of Duckling's output.
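To make the suspected cause concrete, here is a minimal Python sketch of a converter that emits line breaks around block elements unconditionally. It is a hypothetical illustration only: `NaiveRenderer` and `render_naive` are made-up names, and this is not the code Duckling or its underlying library actually uses.

```python
from html.parser import HTMLParser


class NaiveRenderer(HTMLParser):
    """Hypothetical converter that always breaks lines around block elements."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.out.append("* ")      # list prefix starts the line
        elif tag == "p":
            self.out.append("\n\n")    # unconditional break, even right after "* "

    def handle_endtag(self, tag):
        if tag in ("p", "li"):
            self.out.append("\n")

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.out.append(text)


def render_naive(html):
    renderer = NaiveRenderer()
    renderer.feed(html)
    renderer.close()
    return "".join(renderer.out)


snippet = "<ul><li><p>the number of bytes used for storing the integer,</p>\n\n</li></ul>"
print(repr(render_naive(snippet)))
```

With the `<li><p>` pattern from the page above, the `*` prefix is followed immediately by the paragraph's line break, so the prefix and the item text land on different lines.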

LukeEmmet commented 2 years ago

It's the extra <p> within each <li> that I think is causing this for this particular page.

Probably the parser/renderer could be improved for this pattern.

On 29 Jun 2022, at 23:40, Acidus wrote: [quoted copy of the original report]

acidus99 commented 2 years ago

I encountered a similar issue when building the HTML-to-gemtext logic of Gemipedia. Instead of just outputting the raw gemtext, or using a simple buffer, I built a special buffer class that keeps track of whether I am already at the start of a new line, and adds a newline only when needed. This logic is aware of line prefixes like * for a list item, so it doesn't append a \n and break the list item line.

Even if you don't know C#, this code may help:
https://github.com/acidus99/Gemipedia/blob/main/Gemipedia/Converter/HtmlParser.cs#L378
https://github.com/acidus99/Gemipedia/blob/main/Gemipedia/Converter/HtmlParser.cs#L200
https://github.com/acidus99/Gemipedia/blob/main/Gemipedia/Converter/Buffer.cs

     case "p":
        buffer.EnsureAtLineStart();
        int size = buffer.Content.Length;
        ParseChildern(element);
        //make sure that after the paragraph ends, we are starting on new line
        buffer.EnsureAtLineStart();
        if (buffer.Content.Length > size)
        {
            //add another blank line if this paragraph had content
            buffer.AppendLine();
        }
        break;
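
For readers who prefer Python, the buffer idea described above might be sketched roughly as follows. This is an assumption-laden illustration, not a port of Gemipedia's Buffer.cs: `LineBuffer`, `set_line_prefix`, `ensure_at_line_start` and `render_list` are hypothetical names, and the sketch leaves out the extra blank line that the C# paragraph case adds.

```python
class LineBuffer:
    """Hypothetical output buffer that knows whether the current line is still empty."""

    def __init__(self):
        self._parts = []
        self._at_line_start = True

    @property
    def content(self):
        return "".join(self._parts)

    def append(self, text):
        if not text:
            return
        self._parts.append(text)
        self._at_line_start = text.endswith("\n")

    def set_line_prefix(self, prefix):
        # Begin a prefixed line such as "* ". The line still counts as empty,
        # so a block element opening right after it will not force a newline.
        self.ensure_at_line_start()
        self._parts.append(prefix)
        self._at_line_start = True

    def ensure_at_line_start(self):
        # Idempotent: only emits "\n" if real content is on the current line.
        if not self._at_line_start:
            self._parts.append("\n")
            self._at_line_start = True


def render_list(items):
    """Render items as if each one were <li><p>text</p></li>."""
    buf = LineBuffer()
    for text in items:
        buf.set_line_prefix("* ")     # <li>
        buf.ensure_at_line_start()    # <p> opens: no-op, only the prefix was written
        buf.append(text)
        buf.ensure_at_line_start()    # </p>: ends the line
        buf.ensure_at_line_start()    # </li>: no-op, already at line start
    return buf.content


print(render_list(["first item,", "second item,"]))
```

The point of the design is that block boundaries request a line start instead of writing a newline, so repeated requests collapse into at most one break.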

LukeEmmet commented 2 years ago

Thanks for that - I was thinking along similar lines. Simplifying HTML is always a challenge. For example, nested divs don't need a new line, but a bullet inside a div should probably result in a new Gemini line.
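
The nested-div case can be sketched in Python as well. The following is a hypothetical converter (the names `Converter`, `BLOCK_TAGS` and `to_gemtext` are illustrative, and this is not html2gemini's implementation) in which every block boundary only requests a line break, so nested divs add nothing while a list item inside a div still starts its own line. Inline whitespace handling is deliberately simplified.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"div", "p", "ul", "li"}


class Converter(HTMLParser):
    """Hypothetical converter: block tags request a break, they never force one."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.line_has_text = False  # True once real text is on the current line

    def _break_line(self):
        # A prefix alone ("* ") does not count as text, so it is never orphaned.
        if self.line_has_text:
            self.out.append("\n")
            self.line_has_text = False

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._break_line()
        if tag == "li":
            self.out.append("* ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._break_line()

    def handle_data(self, data):
        text = " ".join(data.split())  # simplified whitespace normalisation
        if text:
            self.out.append(text)
            self.line_has_text = True


def to_gemtext(html):
    converter = Converter()
    converter.feed(html)
    converter.close()
    return "".join(converter.out)


html = ("<div><div><div>intro</div></div>"
        "<ul><li><p>one</p></li><li><div>two</div></li></ul></div>")
print(to_gemtext(html))
```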

LukeEmmet commented 2 years ago

Also, I should remark that the actual parsing and rendering behaviour is determined by the underlying library, html2gemini, not by Duckling specifically.