cedricbonhomme / newspipe

A web news aggregator.
https://www.newspipe.org
GNU Affero General Public License v3.0

Using lxml parser for BeautifulSoup #10

Closed (cedricbonhomme closed 8 years ago)

cedricbonhomme commented 8 years ago

The BeautifulSoup documentation (http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser) recommends the lxml parser. Since lxml is already a dependency of pyaggr3g470r, there is no reason not to use it.

Example in crawler.py: replace

#!python

description = BeautifulSoup(description, "html.parser").decode()

with

#!python

description = BeautifulSoup(description, "lxml").decode()
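
To see what the switch changes in practice, here is a minimal sketch that runs the same fragment through both parsers. The helper name `clean_description` and the fallback to html.parser are assumptions made for illustration and are not taken from crawler.py. As far as I know, lxml wraps a bare fragment in `<html><body>` while html.parser does not, so the stored description may look different after the change.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def clean_description(description, parser="lxml"):
    """Hypothetical helper: repair an HTML fragment with the given parser."""
    try:
        return BeautifulSoup(description, parser).decode()
    except FeatureNotFound:
        # The requested parser (e.g. lxml) is not installed:
        # fall back to the parser shipped with the standard library.
        return BeautifulSoup(description, "html.parser").decode()


if __name__ == "__main__":
    fragment = "<p>Unclosed paragraph"
    # Print the repaired markup from each parser for comparison.
    print(clean_description(fragment, "html.parser"))
    print(clean_description(fragment, "lxml"))
```

Both parsers close the dangling `<p>` tag. If the description is later embedded in a page template, the extra `<html><body>` wrapper produced by lxml is worth checking for.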

cedricbonhomme commented 8 years ago

Using lxml parser instead of html.parser, fixes #4.

→ <<cset 67952f5b3358>>


Original comment by: Cédric Bonhomme