KrisTomura closed this issue 5 years ago.
Hi @KrisTomura
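First, your `getUrl` function doesn't return anything: it assigns the input to a local `url` variable and then discards it. It should return the provided URL. Without a `return`, the function returns `None`, and that `None` is what ends up being passed to html2text.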
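To see why the missing `return` matters, here is a minimal stdlib-only sketch. The function names are made up for illustration, and the URL is passed in as an argument instead of being typed at a prompt:

```python
# Hypothetical stand-ins for getUrl(): the URL is passed in as an argument
# instead of being read with input(), so the snippet runs without a prompt.
def get_url_without_return(raw):
    url = str(raw)  # stored in a local variable, then discarded


def get_url_with_return(raw):
    return str(raw)  # the caller actually receives the URL


html = get_url_without_return("https://example.com")
print(html)  # None: a function without a return statement returns None
print(get_url_with_return("https://example.com"))  # https://example.com

# html2text's feed() calls .replace() on its input (see the traceback),
# so passing None fails with the same AttributeError:
try:
    html.replace("a", "b")
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'replace'
```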
Second, you are passing the URL itself to html2text. I'm pretty sure you would rather fetch the page content first and convert that :)
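The `get_content` helper in the script below decodes every page as UTF-8 and silently drops bytes it can't decode. If that causes trouble on some sites, a slightly more careful variant could use the charset the server declares in its `Content-Type` header. This is only a sketch: `fetch_html` is a hypothetical name, and falling back to UTF-8 when no charset is declared is an assumption, not something html2text requires:

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download url and decode it using the charset the server reports.

    Hypothetical helper: falls back to UTF-8 when no charset is declared and
    replaces undecodable bytes with U+FFFD instead of silently dropping them.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        raw = resp.read()
        charset = resp.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# A data: URL exercises the same code path without needing network access.
print(fetch_html("data:text/html;charset=iso-8859-1,caf%E9"))  # café
```

The `data:` URL trick is handy for trying the function offline, since `urllib.request` handles `data:` URLs out of the box.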
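Finally, `html2text.html2text(html)` already returns the converted text as a string, so there is nothing to call `.handle()` on. `handle()` is a method of the `html2text.HTML2Text` class, which `html2text.html2text()` uses internally.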
Here is your working script:
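```python
import urllib.request

import html2text


def get_url():
    # Return the URL so the caller actually receives it
    return str(input("Enter Url: "))


def get_content(url):
    # Download the page and decode it as UTF-8, dropping undecodable bytes
    return urllib.request.urlopen(url).read().decode("utf-8", "ignore")


url = get_url()
html = get_content(url)
text = html2text.html2text(html)  # already a str, no .handle() needed
print(text)
```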
html2text --version
```python
import html2text

def getUrl():
    url = str(input("Enter Url: "))

html = getUrl()
textMaker = html2text.html2text(html)
text = textMaker.handle(html)
print(text)
```
```
Traceback (most recent call last):
  File "main.py", line 7, in <module>
    textMaker = html2text.html2text(html)
  File "/usr/lib/python3.7/site-packages/html2text/__init__.py", line 937, in html2text
    return h.handle(html)
  File "/usr/lib/python3.7/site-packages/html2text/__init__.py", line 149, in handle
    self.feed(data)
  File "/usr/lib/python3.7/site-packages/html2text/__init__.py", line 145, in feed
    data = data.replace("</' + 'script>", "")
AttributeError: 'NoneType' object has no attribute 'replace'
```