OpenELEC / OpenELEC.tv

OpenELEC - The living room PC for everyone
http://openelec.tv

python-lxml should be installed #2661

Closed mbr closed 8 years ago

mbr commented 10 years ago

At the moment, there are four popular parsers for XML/HTML: lxml, BeautifulSoup, html5lib, and the standard library's ElementTree.

html5lib doesn't seem to be as widespread, and BeautifulSoup has a reputation for being somewhat slow. The stdlib ElementTree occasionally stumbles over badly formatted HTML. Also, lxml is way faster.
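To illustrate the point about the stdlib parser, here is a minimal sketch. The HTML snippet is a made-up example, not taken from any addon; it shows `xml.etree.ElementTree` rejecting markup that browsers accept without complaint.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet: ordinary HTML with an unclosed <br>,
# which is valid for browsers but not well-formed XML.
html = "<html><body><p>Hello<br>world</p></body></html>"


def parses_as_xml(markup):
    """Return True if the stdlib ElementTree parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml(html))                      # unclosed <br>
print(parses_as_xml("<p>Hello<br/>world</p>"))  # self-closed <br/>
```

lxml's `lxml.html` module is built to recover from this kind of markup, which is part of why scrapers tend to reach for it.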

For these reasons lxml seems rather popular (5x that of BeautifulSoup and html5lib combined). Having tried all of these solutions myself, I can also say that I vastly prefer lxml.

HTML parsing is core functionality for most addons that scrape stuff from the web and display the results. Some will have very long run times mainly due to XML parsing. lxml's interface and feature-completeness are also unrivaled by its competitors.

For this reason, it'd be nice to have lxml available by default. So far I've found a patch that I'm trying out right now (no results yet, since compiling takes quite a while):

Some addons are (maybe carelessly?) already written using lxml; maybe I can try to find a list of some later on.

Still, lxml would be very nice to have.

Some anecdotal links:

MilhouseVH commented 10 years ago

Is lxml a drop-in replacement for the other XML libraries? If not, will existing addons be able to make use of any performance advantage? Unless lxml is in core XBMC, it will only ever be unique to OpenELEC, which most likely isn't desirable.

Hopefully it is a drop-in replacement, as any speed improvement would be most welcome, particularly on devices like the Pi where delays are really noticeable in add-ons like SportsDevil when compared with the same add-on running on x86 gear.

mbr commented 10 years ago

lxml can be used as a drop-in for ElementTree, but not for BeautifulSoup, afaik - though there is a module leveraging some parts of BeautifulSoup for really broken markup (?). I'm not sure which one is more widely in use in other addons.
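A common way to use lxml as a drop-in for ElementTree is the import fallback below. This is only a sketch, assuming the addon sticks to the API subset shared by both libraries; the sample document and the `BACKEND` name are made up for illustration.

```python
try:
    # Prefer lxml when the system provides it.
    from lxml import etree
    BACKEND = "lxml"
except ImportError:
    # Fall back to the standard library, which exposes a compatible core API.
    import xml.etree.ElementTree as etree
    BACKEND = "stdlib"

# Hypothetical sample document, for illustration only.
doc = (
    "<channels>"
    "<channel id='1'>News</channel>"
    "<channel id='2'>Sport</channel>"
    "</channels>"
)

root = etree.fromstring(doc)
names = [ch.text for ch in root.findall("channel")]
ids = [ch.get("id") for ch in root.findall("channel")]
print(BACKEND, names, ids)
```

With this pattern an addon keeps working on systems without lxml and picks up the faster parser where it is installed. Code that relies on lxml-only features (full XPath, `lxml.html`) would not fall back cleanly.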

mbr commented 10 years ago

As for getting it into core, well, it would be a start. Which modules are available does vary depending on the system - on my desktop, lxml was installed systemwide and visible from xbmc.

dgolda commented 10 years ago

+1. I'd like to port the EPG grabber tv_grab_pl_epguide to OpenELEC. This grabber is written in Python and uses lxml.

Lunatixz commented 10 years ago

+1. lxml is a drop-in for etree, and would be a welcome addition for performance reasons...

stefansaraev commented 10 years ago

I'll see what I can do in the next few days.

stefansaraev commented 10 years ago

test http://sprunge.us/iKhH

mrdominuzq commented 9 years ago

Not even a thanks to @stefansaraev? That's not nice ;)

mrdominuzq commented 9 years ago

What about this, @stefansaraev?

It's been here a long time :)

stefansaraev commented 9 years ago

Keep open, but I really have no time for this.

mrdominuzq commented 9 years ago

it would be nice if any of those people who put a +1 on this would test and reply back :)

chewitt commented 8 years ago

I'm closing this ticket due to lack of updates. If this is still wanted as an addition to core builds, one of you needs to submit a PR with the changes so it can be reviewed and merged. Thanks.