dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0
46 stars 107 forks source link

Deploy ReqMgr #497

Closed ghost closed 12 years ago

ghost commented 14 years ago

Deploy ReqMgr application, initially just to dev instance. Ensure app developers can use it and work with it. Requires all the standard bits -- contact, accounts, ssh keys, packaging, deploy and manage scripts, monitoring, etc. Also requires update to WMCore python security module, ticket #472.

ghost commented 14 years ago

lat: Task carried over to the next milestone as not completed in HG1010 period.

ghost commented 14 years ago

lat: Initial round of code audit comments attached. In general the code is very readable and comprehensible, so kudos for that. There are a number of concerns that will need to be addressed before we can put this on a public web server. It's ok to continue addressing those issues on the dev instance.

DMWMBot commented 14 years ago

rickw: Hi, Lassi,

I'm not one to abuse the word "incredibly", but that was incredibly useful. Thanks. I've inlined some comments, prefixed with '+' where I agree completely, and '-' where I'm being argumentative.

Lots of work to do now.

DMWMBot commented 13 years ago

rickw: Please Review

sfoulkes commented 13 years ago

sfoulkes: HTML cleanup patch applied.

DMWMBot commented 13 years ago

rickw: Please Review

DMWMBot commented 13 years ago

rickw: I left one REST call in. It was the one that updated the request's status and priority, and it did too many complicated things, like editing the Workload, calling WorkQueue abort, and such. Maybe it can all get factored out later.

The official song for this patch can be found here: http://www.youtube.com/watch?v=HKtsdZs9LJo

ghost commented 13 years ago

lat: Nice improvement, looks good to me. However you should use cherrypy.engine.subscribe('start_thread', func) to run per-thread initialisation, rather than doing it on every http request. See ThreadManager in http://www.cherrypy.org/wiki/BuiltinPlugins for more info, and/or search for "cherrypy start_thread" in your favourite search engine.

ghost commented 13 years ago

lat: Oh and the official song is blocked content over here :-)

ticoann commented 13 years ago

sryu: Hi Lassi, Sorry I haven't follow the ticket before. I am responsible for some bad code contribution. I added some questions below.

In WMCore/HTTPFrontEnd/RequestManager/RequestOverview.py 254 * VERY BAD: This module appears to allow arbitrary unchecked local file access. 255 * VERY BAD: This module appears to allow arbitrary listing of local file system.

Could you give me some examples to above case? I thought it shouldn't do that, but probably due to my lack of knowledge.

256 * Where are the HTML files this refers to? What are they? the config.html file should set to -> installation_path/WMCore/html/ which gets from the config.
as well as config.javascript, config.css.

Initially I thought RequestManager will deployed with WMCORE that is why I put the code there. should I move html, javascript code somewhere else?

In WMCore/HTTPFrontEnd/RequestManager/ExternalMethods/ResourceMonitor.py,

281 In WMCore/HTTPFrontEnd/RequestManager/ExternalMethods/Overview.py 282 * Appears to add dependency on WorkQueue? Entire WMBS?

It is dependent with /WMCore/Services/WMBS, /WMCore/Services/WorkQueue which is wrapper class for REST call - handles caching and encoding and decoding. Again initially I thought it will deployed with WMCore. I can change to pure REST call if /WMCore/Services/ is not deployed together.

Thank you, Seangchan

ghost commented 13 years ago

lat: Concerning unrestricted file access, you use os.path.join without verifying args. If any args element is an absolute path (e.g. %2fetc -> /etc), you grant access to anywhere in the file system. Similarly ".." in *args can be used to climb out of the starting directory. You must restrict all path components to strictly minimum required paths. For example if you know you do not allow sub-directories, then you should only allow file names. You should validate file names with regular expressions so it passes only the exact set of files you want to serve. So for example if you know you serve only static html and png files, you might validate the file name against regexp "^[a-z]+.(html|png)$".

Regarding files themselves, I didn't know where they were, so I wasn't able to review them. Now that I found them, I have some additional comments which I'll post shortly.

Concerning WMBS and WorkQueue dependencies, I'll let Dave Evans say the last word. My understanding is that WMBS will not be installed on web server nodes. Originally we were to deploy ReqMgr before WorkQueue, so it's a bit odd the former would pull in the latter :-)

ghost commented 13 years ago

lat: Now that I found the HTML files (WMCore/trunk/src/html/RequestManager), further comments:

DMWMBot commented 13 years ago

rickw: Replying to [comment:35 lat]:

Errr... I've not seen patches to COMP/SITECONF/T1_CH_CERN/ReqMgr.

Alll these patches are going to get tagged tomorrow, along with the rest of WMCore. Once that's done, I'll test the distribution and submit a patch to ReqMgrConfig. I'll try exercising the CERN Oracle and Couch again this afternoon.

The big thing left on the to-do list is the authentication. FNAL is getting an Apache installed as a testbed for this, mimicking what Valentin had to do to test his stuff.
I'm not sure the timescale for this, or whether it's a showstopper for pre-deployment.

drsm79 commented 13 years ago

metson: Replying to [comment:36 lat]:

Yes if it can be used to create an injection attack. Having an error message is ok. Having any client-supplied data is not ok, because you don't know what it is. Say you have a method "def foo(bar, **kwargs)". The error can say "parameter 'bar' has invalid value, expected an integer". You cannot:

  • Reflect back the value of parameter bar.
  • Reflect back any unknown keyword argument names from kwargs, since you don't know the keys are (or values).

Note that you should ensure the response goes back with appropriate HTTP status code, != 200, so the caller knows the call didn't succeed and will not attempt to reinterpret the error message as JSON for example.

FWIW the patch I'm working on for #717 and #718 changes the core to be a lot more terse when reporting exceptions. You get the exception type and that's about it, unless the application code raises an HTTPError in which case the framework assumes you know what you are doing. It sets the status code appropriately. I'll hopefully put this patch up for review tomorrow (I want to get the log handler done, in line with the DQM GUI style Lassi pointed me to).

Replying to [comment:35 lat]:

It's not about keeping secrets. We have negative experience with servers picking up unmaintained and/or incorrect defaults from RPMs, usually very much unintentionally. So, you can ship a configuration file in the RPM provided the server will not make any attempt what so ever to load it by default. Since the WMCore server by default attempts to load file called "DefaultConfig.py", you may not have any files with that name in the RPM.

It no longer does this, the change went in in 636ac072aabc1244220179c2869d1dda3259a3b2 a few weeks back. However, it looks like that's not been tagged, I'll follow up with Steve on that. There's also #579 which will drop the DefaultConfig.py from the build when I can wrangle distutils to ignore the file.

stuartw commented 13 years ago

swakef: Replying to [comment:36 lat]:

Having any client-supplied data is not ok, because you don't know what it is. Say you >have a method "def foo(bar, **kwargs)". The error can say "parameter 'bar' has i>nvalid value, expected an integer". You cannot:

  • Reflect back the value of parameter bar.
  • Reflect back any unknown keyword argument names from kwargs, since you don't know > the keys are (or values).

Apologies for hijacking this thread but this is relevant for the WorkQueue migration which will happen later. And I think its fair to say that these requirements are not exactly clear to a lot of us.

Why can't you return the invalid value after properly escaping (or is it quoting?) it such that it isn't interpreted by the client as http or javascript? Our web pages are full of user provided content so I don't see how this is different?

Lassi, do you know of any online documents about this kind of thing we should read to avoid each of us making the same mistakes?

ghost commented 13 years ago

lat: Simon, thanks for the updates on WMCore and builds. These sound promising. Since DefaultConfig.py will be truly ignored (by server and by build), I'd suggest to rename any configuration files in SVN to something else, perhaps ExampleConfig.py.

ghost commented 13 years ago

lat: Replying to [comment:40 swakef]:

Why can't you return the invalid value after properly escaping (or is it quoting?) it such that it isn't interpreted by the client as http or javascript? Our web pages are full of user provided content so I don't see how this is different?

Proper escaping for error messages of course helps. There are however many reasons not to do so. I didn't include all of them originally. I list some more below, but even this isn't an exhaustive list, just to give you an idea.

Just to be clear, I didn't make up any of the examples. They are not hypothetical, all of them are based on actual compromise reports I've seen at one point or another. I can muster up references for at least some of them if that's important.

Lassi, do you know of any online documents about this kind of thing we should read to avoid each of us making the same mistakes?

https://twiki.cern.ch/twiki/bin/view/CMS/DMWTServiceLevelAgreement#Code_audit has references you can start with. CERN security links have changed but start from https://security.web.cern.ch/security/recommendations/en/index.shtml and go from there. The SANS Top 25 Most Dangerous Software Errors (which goes to http://cwe.mitre.org/top25/ after you go through the top level documentation (please read that too though!)) has a lot of very concrete help and detail, examples etc. "The Six Dumbest Ideas in Computer Security" is very quick read, memorable, and I highly recommend it.

Also CERN runs courses for security programming. I think there's a separate course for web development.

BTW, in case you were wondering, the issue discussed here is listed on the top 25 most dangerous errors, it's CWE-209, "Information Exposure Through an Error Message".

ghost commented 13 years ago

lat: Let's say if you start with SANS Top25 list and when ever you provide code for deployment, you are able to say you've done real, honest attempt to eliminate all those problems, I'd be very very happy. Apart for a few random bugs, I think all my code audit reports were really about those top 25 issues...

ticoann commented 13 years ago

sryu: Replying to [comment:24 lat]:

Concerning four-level deep javascript namespace, and the response: "It was for organizing the code and prevention for possible name collision in case the code is reused for constructing different pages". Basically this comes down to premature generalisation. I very strongly recommend sticking to minimal design that delivers the current set of agreed functionality (whatever that is in your case).

The reason for this is that many developers (been there myself) are extremely prone to the "world domination" plans. They start building the product, and have ideas about all the places where the code will be used - outside actual planned milestones. The initial design reflects this world domination plan, rather than what was actually asked to do. The devs have this "... and after this version we'll take over the world, this will replace all applications X on the market in next 6 months" mindset when building the program.

But in reality, the world domination is never achieved, and someone else ends up maintaining ridiculously over-engineered product. Decide what you plan to ship, and deliver that - and pretty much exactly that. You'll make life easier for everyone.

Yes I think you are right. I can reduce to 2 level, just organizing within the webtools. Thanks

ticoann commented 13 years ago

sryu: Replying to [comment:25 lat]:

Regarding links, I meant anywhere where you generate "" in HTML. Overview.js is a good example (I didn't re-read all the code to remember where else I saw them, if anywhere).

Concerning "The links are not request manager only, they are from outside source (i.e localqueue). I will add the comments what the links will be. Or do you recommending somewhat different things?"

Yes I would like to see links restricted only to the expected valid targets. Ability to insert free links tends to be open to cross-site and forgery attacks, so the links should be validated against some short list of valid values, possibly using regexps.

More importantly it should be obvious from code generating whether it's generating an internal link, e.g. to some RequestManager API/web page call, or if it's generating an external link.

It is exernal links in a sense it is not from the same host. But all the link are from the WMCore/WorkQueue APIs. The links are not random but is not previously known either. The links gets from WorkQueue APIs and WMBS APIs which propagates it host link (in some cases adding suffix). I wander whether there still needs to be some verification in javascript end.

ticoann commented 13 years ago

sryu: Replying to [comment:31 lat]:

I am very glad to see the new validation of input arguments. But I am slightly confused. There seems to be two different schemes used, one in WMCore/HTTPFrontEnd/RequestManager/ReqMgrBrowser.py (in method validation of some but not all parameteres) and another WMCore/HTTPFrontEnd/RequestManager/ReqMgrRESTModel.py (central validation routines; 'validation' not specified for all methods however).

I'd recommend picking one or the other, but in any case please make sure all arguments do actually have validation.

Validation of CMSSW versions is very weak. I recommend using a regexp anchored to validate the entire CMSSW version string, such as {{{^CMSSW(\d+){3}([a-z\d]+)?$}}}.

Hi Lassi, could you review the code on validation. I am not sure what is the best practice. I didn't use regex since specific directory contains only specific type of files. https://svnweb.cern.ch/trac/CMSDMWM/attachment/ticket/808/patch-series-sryu.patch Thanks

Validating URLs with just checking "http" prefix is very weak. I suggest restricting the URL considerably more.

DMWMBot commented 13 years ago

rickw: Please Review

DMWMBot commented 13 years ago

rickw: I'm using the following regular expressions for validation: self.identifier = re.compile('^([a-zA-Z0-9\s.-_]){1,100}$')

three slashes each followed by an identifier

    self.dataset = re.compile('^(/([a-zA-Z0-9.-_]){1,100}){3}$')
    # CMSSW version
    self.cmsswVersion =  re.compile('^CMSSW(_\d+){3}(_[a-z\d]+)?$')
    # valid couch URL
    self.couchUrl = re.compile('^http://(([a-zA-Z0-9.-_]){1,100})(fnal.gov|cern.ch):5984')

The setup.py changes have been submitted to #579

stuartw commented 13 years ago

swakef: I would suggest the validation regular expressions go into src/python/WMCore/Lexicon.py (or maybe elsewhere if they need to be called from javascript) so we all agree on what the valid forms are.

ps. Does the cmssw expression allow the for X_X_X_patch10? Not sure we have ever gone that high but... Actually, it does but it also allows 'CMSSW_3_10_10_pre111pre111'.

fyi: phedex has a restriction on site name of i think 10 characters which we may or may not want to duplicate.

The couch restriction doesn't allow user@host format - but maybe that's intentional?

drsm79 commented 13 years ago

metson: +1 on using Lexicon, if it's appropriate. I think user@host isn't desirable - localhost should be ~insecure (but heavily firewalled), remote will be certificate authenticated.

sfoulkes commented 13 years ago

sfoulkes: Our development couch instances aren't going to be certificate authenticated.

drsm79 commented 13 years ago

metson: Why not? Once the Deploy/manage stuff is up and running it'll be easier to use that than to not, and that will pull in all the cert junk (I hope). Similarly, if it's development instance it shouldn't be public, so doesn't need user accounts.

sfoulkes commented 13 years ago

sfoulkes: FNAL is requiring us to protect the couch server with a user account. So I have the option of either firewalling it so it can only be accessed from the machine it's running, or having user accounts...