Python-Markdown / markdown

A Python implementation of John Gruber’s Markdown with Extension support.
https://python-markdown.github.io/
BSD 3-Clause "New" or "Revised" License

Not handling nested lists #1378

Closed Knamdev closed 1 year ago

Knamdev commented 1 year ago

input_str = """ We would like to study about states of matter\n

Matter\n

out_put = markdown.markdown(input_str)
print(out_put)

The input string contains nested lists, but when I convert it to HTML using markdown, the nested lists are rendered as a single flat list.

We would like to study about states of matter

Matter

facelessuser commented 1 year ago

You are not indenting the nested lists by 4 spaces, which the documentation states is required:

import markdown

input_str = """
We would like to study about states of matter\n
## Matter

- Solid
    - Ice cream
    - mobile
    - board etc.
- Liquid
    - Water
- Gas
    - Air

"""

out_put = markdown.markdown(input_str)
print(out_put)

"""
<p>We would like to study about states of matter</p>
<h2>Matter</h2>
<ul>
<li>Solid<ul>
<li>Ice cream</li>
<li>mobile</li>
<li>board etc.</li>
</ul>
</li>
<li>Liquid<ul>
<li>Water</li>
</ul>
</li>
<li>Gas<ul>
<li>Air</li>
</ul>
</li>
</ul>
"""

waylan commented 1 year ago

As @facelessuser states, this behavior is documented. You can find that here. As this is the intended behavior, I am closing this.

chrispy-snps commented 4 days ago

Unfortunately, some Markdown sources do use three spaces for indented lists. For example, LLM-generated replies often use three spaces (likely because their training material included such content).

While I understand the four-space expectation is documented, it would be nice to have an option to relax the requirement for less well-behaved data sources. If anyone is aware of such a plugin, please share!

facelessuser commented 4 days ago

Note that Python Markdown is not meant to parse every Markdown convention found in the wild. It is also an old-school Markdown parser, not a CommonMark parser. That said, extensions can be created to override the default list behavior, and IIRC there are probably some already out there that are more forgiving about indentation. I don't have time to hunt for examples right now, and I don't know whether they have other side effects.

mitya57 commented 4 days ago

> Unfortunately, some Markdown sources do use three spaces for indented lists. For example, LLM-generated replies often use three spaces

You can configure how many spaces count as one level of indentation using the tab_length setting:

>>> import markdown
>>> text = """
... * foo
...    - bar
... """
>>> print(markdown.markdown(text))
<ul>
<li>foo</li>
<li>bar</li>
</ul>
>>> print(markdown.markdown(text, tab_length=3))
<ul>
<li>foo<ul>
<li>bar</li>
</ul>
</li>
</ul>

chrispy-snps commented 2 days ago

@mitya57 - the tab_length setting works perfectly - thank you!

In our application, the LLM (OpenAI GPT-4o) generates responses that align sublist items with the start of the parent item's content:

9. This is item 9.
   - This is a sublist item.
10. This is item 10.
    - This is a sublist item.

Setting tab_length=3 handles both cases perfectly.
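
For anyone who wants to check this against their own input, here is a minimal sketch that counts list nesting in the generated HTML. It uses only markdown.markdown() with the tab_length keyword shown above; the ListDepthCounter class, the list_stats function, and the two sample strings are hypothetical names introduced for illustration, not part of Python-Markdown.

```python
from html.parser import HTMLParser

import markdown


class ListDepthCounter(HTMLParser):
    """Hypothetical helper: tracks how deeply <ul>/<ol> elements are nested."""

    def __init__(self):
        super().__init__()
        self.depth = 0         # current nesting depth while parsing
        self.max_depth = 0     # deepest nesting seen so far
        self.nested_lists = 0  # number of lists opened inside another list

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)
            if self.depth > 1:
                self.nested_lists += 1

    def handle_endtag(self, tag):
        if tag in ("ul", "ol"):
            self.depth -= 1


def list_stats(text, **kwargs):
    """Convert text with markdown.markdown(**kwargs); return (max_depth, nested_lists)."""
    counter = ListDepthCounter()
    counter.feed(markdown.markdown(text, **kwargs))
    return counter.max_depth, counter.nested_lists


# Sample inputs taken from the comments above.
THREE_SPACE = "* foo\n   - bar\n"
LLM_STYLE = (
    "9. This is item 9.\n"
    "   - This is a sublist item.\n"
    "10. This is item 10.\n"
    "    - This is a sublist item.\n"
)

if __name__ == "__main__":
    print("3-space sublist, default tab_length:", list_stats(THREE_SPACE))
    print("3-space sublist, tab_length=3:", list_stats(THREE_SPACE, tab_length=3))
    print("LLM-style list, tab_length=3:", list_stats(LLM_STYLE, tab_length=3))
```

A max_depth of 1 means the sublist was flattened into its parent list; 2 means it was nested. Since tab_length is a parser-wide setting, it is worth running a check like this on other indentation-sensitive content in your documents before changing it globally.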