Knamdev closed 1 year ago
You are not indenting your nested lists by 4 spaces, which the documentation states is required:
```python
import markdown

input_str = """
We would like to study about states of matter\n
## Matter
- Solid
    - Ice cream
    - mobile
    - board etc.
- Liquid
    - Water
- Gas
    - Air
"""
out_put = markdown.markdown(input_str)
print(out_put)
"""
<p>We would like to study about states of matter</p>
<h2>Matter</h2>
<ul>
<li>Solid<ul>
<li>Ice cream</li>
<li>mobile</li>
<li>board etc.</li>
</ul>
</li>
<li>Liquid<ul>
<li>Water</li>
</ul>
</li>
<li>Gas<ul>
<li>Air</li>
</ul>
</li>
</ul>
"""
```
As @facelessuser states, this behavior is documented. You can find that here. As this is the intended behavior, I am closing this.
Unfortunately, some Markdown sources do use three spaces for indented lists. For example, LLM-generated replies often use three spaces (likely because their training material included such content).
While I understand the four-space expectation is documented, it would be nice to have an option to relax the requirement for less well-behaved data sources. If anyone is aware of such a plugin, please share!
It should be noted that Python Markdown is not meant to parse every Markdown convention out in the wild. Python Markdown is also an old-school Markdown parser, not a CommonMark parser. With that said, extensions can be created to override the default list behavior, and IIRC there are probably some already out there that are more forgiving with indentation. I don't have time to hunt down examples right now, though, and I also don't know whether they have other side effects.
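A different, extension-free workaround is to normalize indentation before the text reaches the parser. The sketch below is a hypothetical preprocessing step, not an existing extension or Python-Markdown API. It assumes the source indents with spaces (not tabs) in multiples of `from_width`, and rescales every indented line to `to_width` spaces per level.

```python
import re

LEADING_SPACES = re.compile(r"^ +")


def reindent(text, from_width=3, to_width=4):
    """Rescale leading-space indentation from from_width to to_width spaces per level.

    Hypothetical preprocessing helper, not a Python-Markdown extension.
    Tab characters are not converted.
    """
    out = []
    for line in text.splitlines(keepends=True):
        match = LEADING_SPACES.match(line)
        if match:
            # Split the indent into whole levels plus any leftover spaces.
            levels, extra = divmod(match.end(), from_width)
            line = " " * (levels * to_width + extra) + line[match.end():]
        out.append(line)
    return "".join(out)


print(repr(reindent("* foo\n   - bar\n")))  # '* foo\n    - bar\n'
```

The result can then be passed to `markdown.markdown()` with the default `tab_length`. Because every indented line is rescaled, the contents of code blocks inside the text are re-indented too, which may not be wanted.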
> Unfortunately, some Markdown sources do use three spaces for indented lists. For example, LLM-generated replies often use three spaces
You can configure the number of spaces that counts as one level of indentation using the tab_length setting:

```pycon
>>> import markdown
>>> text = """
... * foo
...    - bar
... """
>>> print(markdown.markdown(text))
<ul>
<li>foo</li>
<li>bar</li>
</ul>
>>> print(markdown.markdown(text, tab_length=3))
<ul>
<li>foo<ul>
<li>bar</li>
</ul>
</li>
</ul>
```
@mitya57 - the tab_length setting works perfectly, thank you!
In our application, the LLM (OpenAI GPT-4o) generates responses that align each sublist item with the start of its parent item's content:

```markdown
9. This is item 9.
   - This is a sublist item.
10. This is item 10.
    - This is a sublist item.
```
Setting tab_length=3 handles both cases perfectly (the 3-space sublist under item 9 and the 4-space sublist under item 10).
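When the indentation width varies between sources, a candidate tab_length can be inferred from the text itself. The helper below is hypothetical and heuristic: it assumes list items are marked with `-`, `*`, `+` or `1.`, and picks the smallest non-zero indentation in front of any list marker. For the GPT-4o style example above that gives 3, which matches the tab_length=3 setting reported to handle both sublists.

```python
import re

# Matches a "-", "*", "+" or "1." list marker and captures the spaces before it.
LIST_ITEM = re.compile(r"^( *)(?:[-*+]|\d+\.)\s")


def smallest_nested_indent(text, default=4):
    """Return the smallest non-zero indentation in front of a list marker.

    Hypothetical helper; falls back to `default` when no list item is indented.
    """
    indents = [
        len(m.group(1))
        for m in map(LIST_ITEM.match, text.splitlines())
        if m and m.group(1)
    ]
    return min(indents, default=default)


gpt_style = (
    "9. This is item 9.\n"
    "   - This is a sublist item.\n"
    "10. This is item 10.\n"
    "    - This is a sublist item.\n"
)
print(smallest_nested_indent(gpt_style))  # 3
```

The value could then be passed as `markdown.markdown(text, tab_length=...)`. A single stray, unusually shallow item would pull the result down, so treat it as a starting point rather than a guarantee.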
input_str = """ We would like to study about states of matter\n
Matter\n
Gas
"""
out_put = markdown.markdown(input_str) print(out_put)
The input string has nested lists, but when I convert it to HTML using markdown, the nested lists are rendered as a single flat list.