dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0
46 stars 107 forks source link

ZMQ based alert framework - lxml dependency #1825

Closed zdenekmaxa closed 12 years ago

zdenekmaxa commented 13 years ago

Could you please add lxml library external dependency? Necessary for alerts framework and possibly useful for others too. http://lxml.de

geneguvo commented 13 years ago

diego: Hello,

Could you please confirm there is a real dependency of zmq on lxml? By looking at zmq website I couldn't find lxml as a needed or even optional dependency.

Diego.

zdenekmaxa commented 13 years ago

maxa: To clarify the Alerts framework dependencies overall:

Let me know please should you still find anything unclear.

drsm79 commented 13 years ago

metson: Hi, I'm rather concerned by the number of additional externals the alerts framework has added. Maybe I've over counted, but it seems like a lot. Could you provide on this ticket a list of all the externals you would like to add, ideally with some justification as to why they are needed? Cheers Simon

evansde77 commented 13 years ago

evansde: ZMQ - The underlying message passing architecture that allows us to ding alert messages around and propagate them out. psutil - General python library from google for monitoring processes. Useful to watchdog the agent and will allow us to standardise some of the other areas like runtime monitoring for time & memory watchdogs. lxml - Used to generate the RSS feeds for alerts that ops requested.

First two are no-brainers for me, having written several message passing systems and processing watchdogs over the years, they do a better job that rolling our own. You could probably argue about writing and maintaining our own RSS formatter though, but I have also written a bunch of xml serialisers and processors and we have spent quite a bit of time rooting out stuff like IMProv in favour of more standard python libraries. I can quite understand the desire to use an external library that is supported/maintained well etc when it clearly meets needs that we have. I think if you want to argue against them you (and Stu, since he threw his 2p in) need to provide better arguments against it than "we have lots of externals" including viable alternatives.

drsm79 commented 13 years ago

metson: A quick google found http://www.dalkescientific.com/Python/PyRSS2Gen.html which might be better for making RSS than lxml (which adds dependencies on libxml2 and libxslt, though it looks like these are already being built). Agree on 0MQ and psutil, and (as per chat) glad twitter/oauth is gone :)

evansde77 commented 13 years ago

evansde: Because Simon and I have no lives, we were talking about this out of hours and Metson had a bit of an inspired idea for the RSS thing that may spare us the lxml dependency:

We already have a logger handler that writes to couch, so we could log each article as a couch document and then use a view or show to format the last N items as an RSS feed. There is apparently an RSS feed function in the Couch Book....

I guess there is more going on under that hair than people realise ;)

drsm79 commented 13 years ago

metson: Replying to [comment:6 evansde]:

We already have a logger handler that writes to couch, so we could log each article as a couch document and then use a view or show to format the last N items as an RSS feed. There is apparently an RSS feed function in the Couch Book....

You'd use a list function to make the RSS/ATOM feed:

The nice thing about this is that you can also pull out stats from the alerts with other views in the couchapp. The CouchDB log handler is in https://svnweb.cern.ch/trac/CMSDMWM/browser/WMCore/trunk/src/python/WMCore/WMLogging.py

zdenekmaxa commented 13 years ago

maxa: Simon,

I knew of !PyRSS2Gen generator/formatter. The trouble is that in the !RSSFeedSink it's necessary not only to write out XML (which as such would have been possible via easier means than any special formatter whatsoever - be it simple python string interpolation or via already available Cheetah) but also to read/parse already existing RSS document and append to it (depending on configurable depth of the RSS document). There is a great tool feedparser.org for reading in.

Rather than introducing two extra dependencies for RSS and knowing (from TFC-related discussions) there is no proper XML handling library available, I implemented RSS by means of lxml, which dependency can IMO be generally useful. And Valentin already expressed so.

A brief summary of this aforementioned reasoning was already present on #1712 which I am re-opening for RSS implementation via couch.

stuartw commented 13 years ago

swakef: Replying to [comment:2 maxa]:

To clarify the Alerts framework dependencies overall:

  • ZMQ (if it depends on something further, then only things listed on the website)

fyi, for the record I had to add zeromq python bindings as they don't come as part of the zeromq install.

drsm79 commented 13 years ago

metson: Replying to [comment:8 maxa]:

I knew of !PyRSS2Gen generator/formatter. The trouble is that in the !RSSFeedSink it's necessary not only to write out XML (which as such would have been possible via easier means than any special formatter whatsoever - be it simple python string interpolation or via already available Cheetah)

Yeah, we've used Cheetah in the past, and it seemed to work but be fragile.

but also to read/parse already existing RSS document and append to it (depending on configurable depth of the RSS document). There is a great tool feedparser.org for reading in.

It's the reading in that I find a bit odd (hence the idea of using a couch logger/list). Potentially that's a very long list of alerts, with couch you can reduce it to a known size by passing in a {{{?limit=20}}} or similar.

Rather than introducing two extra dependencies for RSS and knowing (from TFC-related discussions) there is no proper XML handling library available, I implemented RSS by means of lxml, which dependency can IMO be generally useful. And Valentin already expressed so.

I think that you're right in that we need a decent way of handling XML (in a similar manner to the JSONWrapper stuff), and I have nothing against lxml. I'm not sure that it's appropriate for this specific case (the size issue above). I've also found that in a lot of cases a ten line SAX parser is quicker and more lightweight than some DOM library (I had to do something similar parsing huge xml files from PhEDEx, and the easiest and fastest thing was ~10 line of dirty SAX). lxml sounds like it's doing a DOM-like thing (though I've only ready briefly). Anyhow, these sort of issues make a generic XML handler a lot harder to set up than something like JSONWrapper. Idea's on that (and probably another ticket?) would be good...

A brief summary of this aforementioned reasoning was already present on #1712 which I am re-opening for RSS implementation via couch.

Cool, thanks!

Replying to [comment:9 swakef]:

Replying to [comment:2 maxa]:

  • ZMQ (if it depends on something further, then only things listed on the website)

fyi, for the record I had to add zeromq python bindings as they don't come as part of the zeromq install.

OK, so there's another thing. Zdenek are you making the spec files or passing a complete list on to Diego? I know Diego is busy this week, so if you could provide a few specs (ideally as patches against [http://cmssw.cvs.cern.ch/cgi-bin/cmssw.cgi/COMP/CMSDIST/ CMSDIST]) that might speed things up.

zdenekmaxa commented 13 years ago

maxa: Hi Diego,

the complete list of dependencies once again:

Alerts fw will eventually not depend on lxml though I am not sure whether it should be included as dependency or not.

If there is still something to clarify, we can use IM, Im zdenekmaxa on AIM.

Thanks, Zdenek

zdenekmaxa commented 13 years ago

maxa: Replying to [comment:10 metson]:

Replying to [comment:8 maxa]:

I think that you're right in that we need a decent way of handling XML (in a similar manner to the JSONWrapper stuff), and I have nothing against lxml. I'm not sure that it's appropriate for this specific case (the size issue above). I've also found that in a lot of cases a ten line SAX parser is quicker and more lightweight than some DOM library (I had to do something similar parsing huge xml files from PhEDEx, and the easiest and fastest thing was ~10 line of dirty SAX). lxml sounds like it's doing a DOM-like thing (though I've only ready briefly). Anyhow, these sort of issues make a generic XML handler a lot harder to set up than something like JSONWrapper. Idea's on that (and probably another ticket?) would be good...

The guy has a lot about lxml performance on the website: http://lxml.de/performance.html

You may want to give lxml a chance on your huge !PhEDEx files: {{{ from lxml import objectify data = objectify.parse("a.xml") for c in data.getroot().iterchildren(): print c.tag ... }}}

drsm79 commented 13 years ago

metson: Like I said, the way we handle XML currently is ad hoc, and maybe lxml would improve things (though I think there's enough problems with XML as a format that a generic solution is tough...). I don't mind adding lxml as a dependency, I'm just cautious of a proliferation of externals. As I said above ideas on some general XML handling code would be appreciated.

vkuznet commented 13 years ago

valya: I'm not sure what do you mean "general XML handing code". In my mind it requires XML parser, standard, 3d party tool or custom. The problem is that existing XML parsers are not providing the same APIs. But similar to JSON/cjson, the lxml seems better alternative wrt standard python xml module and it has close API functionality, such that we can write custom XMLParser (similar to JSONParser) which can switch between standard xml and lxml (if later is presented).

stuartw commented 13 years ago

swakef: Did this reach any conclusion? I'll wait for Simon and Dave to greenlight these before trying to add them.

lxml will add at least 3 new dependencies: libxml2, libxslt, lxml.

Given we have replaced almost all our xml parsing with json I'm not sure i see much reward in changing our xml handling. The only caveat i can see to that is the framework job report which might be worth speeding up however if we have a sax parser already written for it I'm not sure there is much incentive to change.

evansde77 commented 13 years ago

evansde: I would say hold off for now Stu, soudns like we have a solution for the RSS stuff using couch for now, and we probably need to talk/think about what we want for XML processing based on this ticket

zdenekmaxa commented 13 years ago

maxa: * closing this. lxml dependency is not going to happen, feed format for alerts will be generated via couch.