JohannesKaufmann / html-to-markdown

⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules.
https://html-to-markdown.com
MIT License

Spacing & numbering issues with nested lists #38

Closed dmorrison-olive closed 2 months ago

dmorrison-olive commented 3 years ago

Describe the bug

I see a couple of issues with nested lists.

One issue is that extra blank lines appear between list items in nested lists. When my application renders the Markdown, a blank line between items causes the item text to be wrapped in a <p>, which affects margin and padding.

Another (small) issue is that the numbering of ordered lists gets out of sequence. I realize this doesn't matter in Markdown, but I thought I'd note it.

HTML Input

<p>
  The Corinthos Center for Cancer will be partially closed for remodeling
  starting <strong>4/15/21</strong>. Patients should be redirected as space
  permits in the following order:
</p>
<ol>
  <li>Metro Court West.</li>
  <li>Richie General.</li>
  <ol>
    <li>This place is ok.</li>
    <li>Watch out for the doctors.</li>
    <ol>
      <li>They bite.</li>
      <li>But not hard.</li>
    </ol>
  </ol>
  <li>Port Charles Main.</li>
</ol>
<p>For further information about appointment changes, contact:</p>
<ul>
  <li>Dorothy Hardy</li>
  <ul>
    <li><em>Head of Operations</em></li>
    <ul>
      <li><em>Interim</em></li>
    </ul>
  </ul>
  <li>dorothy.hardy@generalhospital.org</li>
  <li>555-555-5555</li>
</ul>
<p>
  <em>The remodel is </em
  ><a href="http://www.google.com/" target="_self"><em>expected</em></a
  ><em> to complete in June 2021.</em>
  <strong><em>Timeframe subject to change</em></strong
  ><em>.</em>
</p>

Generated Markdown

The Corinthos Center for Cancer will be partially closed for remodeling
starting **4/15/21**. Patients should be redirected as space
permits in the following order:

1. Metro Court West.
2. Richie General.

   1. This place is ok.
   2. Watch out for the doctors.
      1. They bite.
      2. But not hard.

4. Port Charles Main.

For further information about appointment changes, contact:

- Dorothy Hardy

  - _Head of Operations_
    - _Interim_

- dorothy.hardy@generalhospital.org
- 555-555-5555

_The remodel is_ [_expected_](http://www.google.com/) _to complete in June 2021._ **_Timeframe subject to change_** _._

Note how there are extra line breaks after "2. Richie General.", " 2. But not hard.", "- Dorothy Hardy", and " - Interim".

Also note how "4. Port Charles Main." should be "3. Port Charles Main.".

Expected Markdown

The Corinthos Center for Cancer will be partially closed for remodeling
starting **4/15/21**. Patients should be redirected as space
permits in the following order:

1. Metro Court West.
2. Richie General.
   1. This place is ok.
   2. Watch out for the doctors.
      1. They bite.
      2. But not hard.
3. Port Charles Main.

For further information about appointment changes, contact:

- Dorothy Hardy
  - _Head of Operations_
    - _Interim_
- dorothy.hardy@generalhospital.org
- 555-555-5555

_The remodel is_ [_expected_](http://www.google.com/) _to complete in June 2021._ **_Timeframe subject to change_** _._
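
The difference between the generated and the expected output above comes down to the blank lines between adjacent list items. As a possible stopgap, here is a minimal sketch of a post-processing step that removes them. It is not part of html-to-markdown; `tightenLists` and `listItem` are hypothetical names, and it relies only on the Go standard library. It assumes every list item fits on a single line, it would also collapse lists that are intentionally loose, and it does not renumber anything.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// listItem matches a line that starts a bullet or ordered list item
// at any indentation level (hypothetical helper, not a library API).
var listItem = regexp.MustCompile(`^\s*([-*+]|\d+[.)])\s+\S`)

// tightenLists drops runs of blank lines that sit directly between
// two list-item lines. All other blank lines are kept.
func tightenLists(markdown string) string {
	lines := strings.Split(markdown, "\n")
	out := make([]string, 0, len(lines))
	for i := 0; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == "" {
			// Find the first non-blank line after this blank run.
			j := i
			for j < len(lines) && strings.TrimSpace(lines[j]) == "" {
				j++
			}
			prevIsItem := len(out) > 0 && listItem.MatchString(out[len(out)-1])
			nextIsItem := j < len(lines) && listItem.MatchString(lines[j])
			if prevIsItem && nextIsItem {
				i = j - 1 // skip the whole blank run
				continue
			}
		}
		out = append(out, lines[i])
	}
	return strings.Join(out, "\n")
}

func main() {
	generated := "- Dorothy Hardy\n\n  - _Head of Operations_\n    - _Interim_\n\n- dorothy.hardy@generalhospital.org\n\nNext paragraph."
	fmt.Println(tightenLists(generated))
}
```

The blank line before "Next paragraph." is kept because the line after it is not a list item, so the list is still terminated correctly.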

Additional context

I see this with the latest version (1.3.0). I'm using no plugins.

Thanks for the utility!

JohannesKaufmann commented 3 years ago

@dmorrison-olive thanks for reporting this 🙏

extra line breaks

Sorry that it's taking so long. I'm working on it but haven't found a good solution yet. It's really tricky to fix 🤯

numbering gets off

Unfortunately, that's not easy to fix, because the numbers are already allocated before the converter detects that an item is empty and can be skipped. Since it's purely cosmetic, I won't fix it right now.
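
To illustrate the numbering behaviour described above, here is a small sketch. It is not the library's actual code: the `node` type and both functions are hypothetical, and it assumes the skipped child is the nested `<ol>` that sits directly inside the outer `<ol>` in the reported HTML. The first function assigns numbers by position before skipping, which reproduces the "4."; the second only counts items that are actually emitted.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a direct child of an <ol>.
type node struct {
	tag  string // "li" for a list item, anything else is skipped
	text string
}

// numberByPosition allocates a number from the child's position and
// only then skips non-items, so skipped children still consume a number.
func numberByPosition(children []node) []string {
	var out []string
	for i, c := range children {
		if c.tag != "li" {
			continue
		}
		out = append(out, fmt.Sprintf("%d. %s", i+1, c.text))
	}
	return out
}

// numberRendered increments the counter only when an item is emitted.
func numberRendered(children []node) []string {
	var out []string
	n := 0
	for _, c := range children {
		if c.tag != "li" {
			continue
		}
		n++
		out = append(out, fmt.Sprintf("%d. %s", n, c.text))
	}
	return out
}

func main() {
	children := []node{
		{"li", "Metro Court West."},
		{"li", "Richie General."},
		{"ol", ""}, // nested list as a direct child of the outer <ol>
		{"li", "Port Charles Main."},
	}
	fmt.Println(numberByPosition(children)) // last item is numbered 4
	fmt.Println(numberRendered(children))   // last item is numbered 3
}
```

Either output is valid Markdown, since renderers number the items themselves; the difference only shows when reading the raw text.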

dmorrison-olive commented 3 years ago

Thanks! And that makes sense about the numbering, since it doesn't matter and is technically still valid Markdown. 👍

JohannesKaufmann commented 2 months ago

The "v2" branch has a lot of improvements, including a fix for this bug.

It is still experimental but feel free to give it a try. Happy to hear about your experience 😊

I am going to close this issue. If you find anything with the new version, please open a new issue!