elliotgao2 / tomd

Convert HTML to Markdown.
GNU General Public License v3.0
531 stars, 71 forks

<b> bold </b> only works inside <p> </p> #12

Open yucongo opened 6 years ago

yucongo commented 6 years ago
import tomd

tomd.convert('<p><b> bold </b></p>')  # '\n** bold **\n', works
tomd.convert('<b> bold </b>')  # '', does not work

Maybe pyquery could be useful here, something like this:

from pyquery import PyQuery as pq
from tomd import MARKDOWN

html = "<b> bold </b>"
doc = pq(html)
for elm, val in MARKDOWN.items():
    # idea: for each item in doc(elm), replace item.html()
    # with val[0] + pq(item).text() + val[1]
    pass

elliotgao2 commented 6 years ago

@yucongo The <b> tag is an inline tag, which should sit inside a block tag. I wonder when you would use a <b> tag outside a block tag like <p> or <div>.
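
One possible workaround that follows from this, sketched here only as an illustration: wrap a bare inline fragment in <p> before handing it to the converter. The helper name `ensure_block_wrapper` and the `BLOCK_TAGS` list are hypothetical and not part of tomd; the list of block tags is an assumption and may not match what tomd treats as a block.

```python
import re

# Assumed set of block-level tags; this list is illustrative, not taken from tomd.
BLOCK_TAGS = ("p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
              "ul", "ol", "blockquote", "pre", "table")

# Matches a fragment whose first tag (after optional whitespace) is a block tag.
_BLOCK_START = re.compile(r"\s*<(%s)\b" % "|".join(BLOCK_TAGS), re.IGNORECASE)


def ensure_block_wrapper(html):
    """Return html unchanged if it starts with a block tag, else wrap it in <p>."""
    if _BLOCK_START.match(html):
        return html
    return "<p>" + html + "</p>"


print(ensure_block_wrapper("<b> bold </b>"))         # <p><b> bold </b></p>
print(ensure_block_wrapper("<p><b> bold </b></p>"))  # <p><b> bold </b></p>
```

The wrapped string could then be passed to tomd.convert, which, per the report above, handles the <p><b> bold </b></p> form.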

yucongo commented 6 years ago

I worked it out using pyquery, for my needs at least:

from pyquery import PyQuery as pq
from tomd import MARKDOWN

html = '''
<h1>h1</h1>
<h2>h2</h2><h3>h3</h3>
<h4>h4</h4>
<del>del</del>
<b>bold</b>
<i>italic</i>
<b><i>bold italic</i></b>'''

doc = pq(html)

for elm, val in MARKDOWN.items():
    for item in doc(elm).items():
        item.replace_with(val[0] + item.html() + val[1])
print(doc.text())

Output

# h1

## h2

### h3

#### h4

~~del~~
**bold**
*italic*
***bold italic***
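
For readers without pyquery installed, the same table-driven idea can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not tomd's implementation: `INLINE_MARKDOWN` is a hypothetical three-entry stand-in for the MARKDOWN dict, and `InlineToMarkdown` and `inline_to_markdown` are names invented for this example.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a tag-to-delimiter table (opening, closing).
INLINE_MARKDOWN = {
    "b": ("**", "**"),
    "i": ("*", "*"),
    "del": ("~~", "~~"),
}


class InlineToMarkdown(HTMLParser):
    """Emit Markdown delimiters for known inline tags; pass text through."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Known tag: emit its opening delimiter. Unknown tags are dropped.
        if tag in INLINE_MARKDOWN:
            self.parts.append(INLINE_MARKDOWN[tag][0])

    def handle_endtag(self, tag):
        # Known tag: emit its closing delimiter.
        if tag in INLINE_MARKDOWN:
            self.parts.append(INLINE_MARKDOWN[tag][1])

    def handle_data(self, data):
        self.parts.append(data)


def inline_to_markdown(html):
    parser = InlineToMarkdown()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(inline_to_markdown("<b>bold</b>"))                # **bold**
print(inline_to_markdown("<b><i>bold italic</i></b>"))  # ***bold italic***
```

Because the parser walks the tags in document order, nested inline tags such as <b><i>...</i></b> come out with their delimiters stacked, and no enclosing block tag is needed.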
elliotgao2 commented 6 years ago

@yucongo okay.