adobe / helix-html2md

Service to convert Helix Generic HTML Content to Markdown.
Apache License 2.0

Bullet point in list with multiple images not round trippable #555

Open janaki-r-bhagwath opened 1 week ago

janaki-r-bhagwath commented 1 week ago

Description: If a single bullet point in a list contains two images, it is converted to two bullet points.

To Reproduce: Source HTML

<ul>
    <li>
        <picture>
            ...
        </picture>
        <picture>
            ...
        </picture>
    </li>
    <li>
        <picture>
            ...
        </picture>
        <picture>
            ...
        </picture>
    </li>
</ul>

Expected MD

- ![][image0]![][image1]
- ![][image2]![][image3]

Generated MD

- ![][image0]

  ![][image1]
- ![][image2]

  ![][image3]

The blank lines in this md are converted to separate bullet points when md2docx is used.

Expected behavior: The generated md should be

- ![][image0]![][image1]
- ![][image2]![][image3]


buuhuu commented 1 week ago

~This seems to be the same as #529~ It is not related. The list/listItem is not spread in this case.

The additional <p> is added by https://github.com/syntax-tree/hast-util-to-mdast/blob/main/lib/util/wrap.js#L51-L67 independently of the number of <picture> tags.

I added a test case here: https://github.com/adobe/helix-html2md/commit/25b19acb2a90ae818c72b1ef846cf2278dc15343

tripodsan commented 1 week ago

> The additional <p> is added

But that's ok. Maybe we should just wrap the <img><img> in a <p> before hast-to-mdast:


<li>
  <p><img><img></p>
</li>
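
The wrapping idea above could be sketched roughly as follows. This is only an illustration on plain hast-shaped objects, not the code used in helix-html2md: the function name `wrapListItemMedia`, the helper `h`, and the choice to treat both `picture` and `img` as media are assumptions made for the example. Whether hast-util-to-mdast then emits a single paragraph per list item is not verified here.

```javascript
// Hypothetical pre-processing step on a hast-shaped tree (plain objects, no dependencies).
// Nodes look like { type: 'element', tagName, properties, children } or { type: 'text', value }.

const MEDIA_TAGS = new Set(['picture', 'img']); // assumption: both count as images

function isWhitespaceText(node) {
  return node.type === 'text' && node.value.trim() === '';
}

function isMedia(node) {
  return node.type === 'element' && MEDIA_TAGS.has(node.tagName);
}

// Wraps the children of every <li> that contains only media elements
// (ignoring whitespace text) in a single <p>. Mutates the tree in place.
function wrapListItemMedia(node) {
  if (!Array.isArray(node.children)) {
    return node;
  }
  // Recurse first so nested lists are handled as well.
  node.children.forEach((child) => wrapListItemMedia(child));
  if (node.type === 'element' && node.tagName === 'li') {
    const content = node.children.filter((child) => !isWhitespaceText(child));
    if (content.length > 0 && content.every(isMedia)) {
      node.children = [{
        type: 'element',
        tagName: 'p',
        properties: {},
        children: content,
      }];
    }
  }
  return node;
}

// Usage: a <ul> with one <li> holding two <picture> elements separated by whitespace.
const h = (tagName, ...children) => ({
  type: 'element', tagName, properties: {}, children,
});
const ws = { type: 'text', value: '\n        ' };
const list = h('ul', h('li', ws, h('picture'), ws, h('picture'), ws));
wrapListItemMedia(list);
```

List items that contain anything other than images (for example text next to an image) are left untouched in this sketch, so existing paragraph handling would still apply to them.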

buuhuu commented 1 week ago

It is ok, and yes we could do that. However, the behaviour is different from doc-based. There the <picture>s are not wrapped in <p>s.

Maybe we should not try to generate exactly the same output from html2md as from docx2md & co.? That is the same discussion as in #529.