Closed mbr closed 8 years ago
Is lxml a drop-in replacement for the other XML libraries? If not, will existing addons be able to make use of any performance advantage? Unless lxml is in core XBMC, it will only ever be unique to OpenELEC, which most likely isn't desirable.
Hopefully it is a drop-in replacement, as any speed improvement would be most welcome, particularly on devices like the Pi where delays are really noticeable in add-ons like SportsDevil when compared with the same add-on running on x86 gear.
lxml can be used as a drop-in for ElementTree, but not for BeautifulSoup, afaik - though there is a module leveraging some parts of BeautifulSoup for really broken markup (?). I'm not sure which one is more widely in use in other addons.
As for getting it into core, well, it would be a start. Which modules are available does vary depending on the system - on my desktop, lxml was installed systemwide and visible from xbmc.
+1 I'd like to port EPG grabber tv_grab_pl_epguide to OpenELEC. This grabber is written in python and uses lxml.
+1 lxml is a drop-in for etree, and would be a welcome addition for performance reasons...
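For reference, the drop-in use mentioned above is usually done with a guarded import. This is only a minimal sketch, assuming the addon sticks to the ElementTree-compatible subset of the API; the sample XML and the `titles` helper are made up for illustration and not taken from any existing addon:

```python
# Prefer lxml when it is installed; fall back to the standard library otherwise.
try:
    from lxml import etree
    HAVE_LXML = True
except ImportError:
    import xml.etree.ElementTree as etree
    HAVE_LXML = False

# Hypothetical sample data, not taken from any real addon.
SAMPLE = (
    "<channels>"
    "<channel><title>One</title></channel>"
    "<channel><title>Two</title></channel>"
    "</channels>"
)


def titles(xml_text):
    """Return the text of every <title> element.

    Only calls that exist in both lxml.etree and xml.etree.ElementTree
    are used, so the function behaves the same with either backend.
    """
    root = etree.fromstring(xml_text)
    return [node.text for node in root.iter("title")]
```

With this pattern an addon keeps working on systems without lxml and simply picks up the faster parser where it is available.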
I'll see what I can do in the next few days.
Not even a thanks to @stefansaraev, that's not nice ;)
what about this @stefansaraev
been here a long time :)
keep open. but I have no time for this, really
it would be nice if any of those people who put a +1 on this would test and reply back :)
I'm closing this ticket down due to lack of updates. If this is still wanted as an addition to core builds, one of you needs to submit a PR with the changes so it can be reviewed/merged. Thanks.
At the moment, there are four popular parsers for XML/HTML: lxml, BeautifulSoup, html5lib and the stdlib ElementTree.
html5lib doesn't seem to be as widespread, and BeautifulSoup has a reputation for being somewhat slow. stdlib-ElementTree occasionally stumbles over badly formatted HTML. Also, lxml is way faster.
For this reason lxml seems rather popular (5x that of BeautifulSoup and html5lib combined). Having tried all of these solutions myself, I can also say that I vastly prefer lxml.
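To illustrate the point about stdlib ElementTree and badly formatted HTML, here is a small sketch using only the standard library. The markup string and the `parses_as_xml` helper are hypothetical examples, not code from any addon:

```python
import xml.etree.ElementTree as ET

# Hypothetical scraped fragment: <br> is never closed, which is common in HTML
# but makes the document malformed as XML.
BROKEN_HTML = "<div><p>Score: 2-1<br></p></div>"

# The same fragment with the tag closed, so it is well-formed XML.
FIXED_HTML = "<div><p>Score: 2-1<br/></p></div>"


def parses_as_xml(markup):
    """Return True if the stdlib XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

Here `parses_as_xml(BROKEN_HTML)` is false while `parses_as_xml(FIXED_HTML)` is true. lxml ships a separate `lxml.html` module intended for this kind of real-world markup, which is a large part of its appeal for scrapers.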
HTML parsing is core functionality for most addons that scrape stuff from the web and display the results. Some will have very long run times mainly due to XML parsing. lxml's interface and feature-completeness are also unrivaled by its competitors.
For this reason, it'd be nice to have lxml available by default. So far I've found a patch that I'm trying out right now (without any results yet, compiling takes quite a while):
Some addons are (maybe carelessly?) already written using lxml; maybe I can try to find a list of some later on.
Still, lxml would be very nice to have.
Some anecdotal links: