RedHotUnicorn closed this 6 months ago
import urllib.request

from inscriptis import get_text
from inscriptis.model.config import ParserConfig

# The second assignment overrides the first, so only the habr.com page is fetched.
url = "https://t.me/zettelkasten_ch/549?embed=1&mode=tme"
url = "https://habr.com/ru/companies/ncloudtech/articles/806771/"
html = urllib.request.urlopen(url).read().decode('utf-8')

# Render the HTML as plain text, keeping link targets in the output.
text = get_text(html, ParserConfig(display_links=True))
print(text)
Switched to markdownify.
Also used morss readability to fetch the readable content.
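For anyone landing here, the markdownify switch mentioned above might look roughly like the sketch below. It is not the exact code used in the project: the function names `html_to_markdown` and `fetch_markdown` are hypothetical, default markdownify options are assumed, and the morss readability step is left out.

```python
import urllib.request


def html_to_markdown(html: str) -> str:
    """Convert an HTML string to Markdown; links become [text](url)."""
    # Imported lazily so this module still loads if markdownify is missing.
    from markdownify import markdownify as md

    return md(html)


def fetch_markdown(url: str) -> str:
    """Hypothetical helper: download a page and convert it to Markdown."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8")
    return html_to_markdown(html)
```

Calling `print(fetch_markdown("https://habr.com/ru/companies/ncloudtech/articles/806771/"))` would then replace the `get_text(...)` call from the original snippet. Unlike the inscriptis plain-text rendering, links are emitted inline in Markdown syntax.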