Alir3z4 / html2text

Convert HTML to Markdown-formatted text.
alir3z4.github.io/html2text/
GNU General Public License v3.0

AttributeError #273

Closed by KrisTomura 5 years ago

KrisTomura commented 5 years ago

def getUrl():
    url = str(input("Enter Url: "))

html = getUrl()
textMaker = html2text.html2text(html)
text = textMaker.handle(html)

print(text)

- Python version `python --version`
    - Python 3.7.3
- Error

Traceback (most recent call last):
  File "main.py", line 7, in <module>
    textMaker = html2text.html2text(html)
  File "/usr/lib/python3.7/site-packages/html2text/__init__.py", line 937, in html2text
    return h.handle(html)
  File "/usr/lib/python3.7/site-packages/html2text/__init__.py", line 149, in handle
    self.feed(data)
  File "/usr/lib/python3.7/site-packages/html2text/__init__.py", line 145, in feed
    data = data.replace("</' + 'script>", "")
AttributeError: 'NoneType' object has no attribute 'replace'

hsmett commented 5 years ago

Hi @KrisTomura

First, your getUrl function doesn't return anything, so html ends up as None, which is what triggers the AttributeError. It should return the provided URL.
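The missing return can be reproduced without html2text at all. This is a minimal sketch: the function names and the hard-coded example URL are illustrative stand-ins for the input() call, not part of the original script.

```python
def get_url_without_return():
    # The value is assigned to a local name but never returned,
    # so the caller receives None.
    url = "https://example.com"


def get_url_with_return():
    url = "https://example.com"
    return url


missing = get_url_without_return()
print(missing is None)  # the function implicitly returned None

try:
    # Same kind of call html2text makes internally on its input.
    missing.replace("a", "b")
except AttributeError as exc:
    print(exc)

print(get_url_with_return())
```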

Then you are trying to convert the input url directly. I'm pretty sure you would prefer to fetch the page content first :)

And finally, html2text.html2text(html) returns the converted text directly as a string, so there is no need to call .handle() on its result. Here is your working script:

import urllib.request
import html2text

def get_url():
    return str(input("Enter Url: "))

def get_content(url):
    return urllib.request.urlopen(url).read().decode("utf-8", "ignore")

url = get_url()
html = get_content(url)
text = html2text.html2text(html)

print(text)
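If the page is not UTF-8, a slightly more defensive get_content might help. The following is only a sketch: it assumes the server may declare a charset in its Content-Type header and falls back to UTF-8 otherwise. The decode_body helper and the 10-second timeout are illustrative choices, not something the original script or html2text requires.

```python
import urllib.request
from typing import Optional


def decode_body(raw: bytes, charset: Optional[str] = None) -> str:
    """Decode response bytes, falling back to UTF-8."""
    try:
        # errors="ignore" drops undecodable bytes, as in the script above.
        return raw.decode(charset or "utf-8", "ignore")
    except LookupError:
        # The declared charset is not a codec Python knows about.
        return raw.decode("utf-8", "ignore")


def get_content(url: str, timeout: float = 10.0) -> str:
    """Fetch a page and return its body as text."""
    # The timeout value is an arbitrary assumption; adjust as needed.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)
```

It drops into the script in place of the original get_content; the call `html = get_content(url)` stays the same.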