Closed ghost closed 12 years ago
lat: Task carried over to the next milestone as not completed in HG1010 period.
lat: Initial round of code audit comments attached. In general the code is very readable and comprehensible, so kudos for that. There are a number of concerns that will need to be addressed before we can put this on a public web server. It's ok to continue addressing those issues on the dev instance.
rickw: Hi, Lassi,
I'm not one to abuse the word "incredibly", but that was incredibly useful. Thanks. I've inlined some comments, prefixed with '+' where I agree completely, and '-' where I'm being argumentative.
Lots of work to do now.
rickw: Please Review
sfoulkes: HTML cleanup patch applied.
rickw: Please Review
rickw: I left one REST call in. It was the one that updated the request's status and priority, and it did too many complicated things, like editing the Workload, calling WorkQueue abort, and such. Maybe it can all get factored out later.
The official song for this patch can be found here: http://www.youtube.com/watch?v=HKtsdZs9LJo
lat: Nice improvement, looks good to me. However you should use cherrypy.engine.subscribe('start_thread', func) to run per-thread initialisation, rather than doing it on every http request. See ThreadManager in http://www.cherrypy.org/wiki/BuiltinPlugins for more info, and/or search for "cherrypy start_thread" in your favourite search engine.
lat: Oh and the official song is blocked content over here :-)
sryu: Hi Lassi, Sorry I haven't follow the ticket before. I am responsible for some bad code contribution. I added some questions below.
In WMCore/HTTPFrontEnd/RequestManager/RequestOverview.py 254 * VERY BAD: This module appears to allow arbitrary unchecked local file access. 255 * VERY BAD: This module appears to allow arbitrary listing of local file system.
Could you give me some examples to above case? I thought it shouldn't do that, but probably due to my lack of knowledge.
256 * Where are the HTML files this refers to? What are they?
the config.html file should set to -> installation_path/WMCore/html/ which gets from the config.
as well as config.javascript, config.css.
Initially I thought RequestManager will deployed with WMCORE that is why I put the code there. should I move html, javascript code somewhere else?
In WMCore/HTTPFrontEnd/RequestManager/ExternalMethods/ResourceMonitor.py,
281 In WMCore/HTTPFrontEnd/RequestManager/ExternalMethods/Overview.py 282 * Appears to add dependency on WorkQueue? Entire WMBS?
It is dependent with /WMCore/Services/WMBS, /WMCore/Services/WorkQueue which is wrapper class for REST call - handles caching and encoding and decoding. Again initially I thought it will deployed with WMCore. I can change to pure REST call if /WMCore/Services/ is not deployed together.
Thank you, Seangchan
lat: Concerning unrestricted file access, you use os.path.join without verifying args. If any args element is an absolute path (e.g. %2fetc -> /etc), you grant access to anywhere in the file system. Similarly ".." in *args can be used to climb out of the starting directory. You must restrict all path components to strictly minimum required paths. For example if you know you do not allow sub-directories, then you should only allow file names. You should validate file names with regular expressions so it passes only the exact set of files you want to serve. So for example if you know you serve only static html and png files, you might validate the file name against regexp "^[a-z]+.(html|png)$".
Regarding files themselves, I didn't know where they were, so I wasn't able to review them. Now that I found them, I have some additional comments which I'll post shortly.
Concerning WMBS and WorkQueue dependencies, I'll let Dave Evans say the last word. My understanding is that WMBS will not be installed on web server nodes. Originally we were to deploy ReqMgr before WorkQueue, so it's a bit odd the former would pull in the latter :-)
lat: Now that I found the HTML files (WMCore/trunk/src/html/RequestManager), further comments:
Then days and weeks later someone else comes and views the data I wrote up. Say, data ops, with super powers. They get displayed a page with the free text input. Only, they just submitted a data deletion request in their name. They won't even see the script, since the web browser interpreted it as HTML.
All HTML output needs to be sanitised, no matter where the logic is.
As an aside, googling "escaping cheetah" is interesting, but not very helpful.
See #545 for example of, ahem, extensive discussion on the same subject :-)
rickw: Replying to [comment:35 lat]:
Errr... I've not seen patches to COMP/SITECONF/T1_CH_CERN/ReqMgr.
Alll these patches are going to get tagged tomorrow, along with the rest of WMCore. Once that's done, I'll test the distribution and submit a patch to ReqMgrConfig. I'll try exercising the CERN Oracle and Couch again this afternoon.
The big thing left on the to-do list is the authentication.
FNAL is getting an Apache installed as a testbed for this,
mimicking what Valentin had to do to test his stuff.
I'm not sure the timescale
for this, or whether it's a showstopper for pre-deployment.
metson: Replying to [comment:36 lat]:
Yes if it can be used to create an injection attack. Having an error message is ok. Having any client-supplied data is not ok, because you don't know what it is. Say you have a method "def foo(bar, **kwargs)". The error can say "parameter 'bar' has invalid value, expected an integer". You cannot:
- Reflect back the value of parameter bar.
- Reflect back any unknown keyword argument names from kwargs, since you don't know the keys are (or values).
Note that you should ensure the response goes back with appropriate HTTP status code, != 200, so the caller knows the call didn't succeed and will not attempt to reinterpret the error message as JSON for example.
FWIW the patch I'm working on for #717 and #718 changes the core to be a lot more terse when reporting exceptions. You get the exception type and that's about it, unless the application code raises an HTTPError in which case the framework assumes you know what you are doing. It sets the status code appropriately. I'll hopefully put this patch up for review tomorrow (I want to get the log handler done, in line with the DQM GUI style Lassi pointed me to).
Replying to [comment:35 lat]:
It's not about keeping secrets. We have negative experience with servers picking up unmaintained and/or incorrect defaults from RPMs, usually very much unintentionally. So, you can ship a configuration file in the RPM provided the server will not make any attempt what so ever to load it by default. Since the WMCore server by default attempts to load file called "DefaultConfig.py", you may not have any files with that name in the RPM.
It no longer does this, the change went in in 636ac072aabc1244220179c2869d1dda3259a3b2 a few weeks back. However, it looks like that's not been tagged, I'll follow up with Steve on that. There's also #579 which will drop the DefaultConfig.py from the build when I can wrangle distutils to ignore the file.
swakef: Replying to [comment:36 lat]:
Having any client-supplied data is not ok, because you don't know what it is. Say you >have a method "def foo(bar, **kwargs)". The error can say "parameter 'bar' has i>nvalid value, expected an integer". You cannot:
- Reflect back the value of parameter bar.
- Reflect back any unknown keyword argument names from kwargs, since you don't know > the keys are (or values).
Apologies for hijacking this thread but this is relevant for the WorkQueue migration which will happen later. And I think its fair to say that these requirements are not exactly clear to a lot of us.
Why can't you return the invalid value after properly escaping (or is it quoting?) it such that it isn't interpreted by the client as http or javascript? Our web pages are full of user provided content so I don't see how this is different?
Lassi, do you know of any online documents about this kind of thing we should read to avoid each of us making the same mistakes?
lat: Simon, thanks for the updates on WMCore and builds. These sound promising. Since DefaultConfig.py will be truly ignored (by server and by build), I'd suggest to rename any configuration files in SVN to something else, perhaps ExampleConfig.py.
lat: Replying to [comment:40 swakef]:
Why can't you return the invalid value after properly escaping (or is it quoting?) it such that it isn't interpreted by the client as http or javascript? Our web pages are full of user provided content so I don't see how this is different?
Proper escaping for error messages of course helps. There are however many reasons not to do so. I didn't include all of them originally. I list some more below, but even this isn't an exhaustive list, just to give you an idea.
Just to be clear, I didn't make up any of the examples. They are not hypothetical, all of them are based on actual compromise reports I've seen at one point or another. I can muster up references for at least some of them if that's important.
Lassi, do you know of any online documents about this kind of thing we should read to avoid each of us making the same mistakes?
https://twiki.cern.ch/twiki/bin/view/CMS/DMWTServiceLevelAgreement#Code_audit has references you can start with. CERN security links have changed but start from https://security.web.cern.ch/security/recommendations/en/index.shtml and go from there. The SANS Top 25 Most Dangerous Software Errors (which goes to http://cwe.mitre.org/top25/ after you go through the top level documentation (please read that too though!)) has a lot of very concrete help and detail, examples etc. "The Six Dumbest Ideas in Computer Security" is very quick read, memorable, and I highly recommend it.
Also CERN runs courses for security programming. I think there's a separate course for web development.
BTW, in case you were wondering, the issue discussed here is listed on the top 25 most dangerous errors, it's CWE-209, "Information Exposure Through an Error Message".
lat: Let's say if you start with SANS Top25 list and when ever you provide code for deployment, you are able to say you've done real, honest attempt to eliminate all those problems, I'd be very very happy. Apart for a few random bugs, I think all my code audit reports were really about those top 25 issues...
sryu: Replying to [comment:24 lat]:
Concerning four-level deep javascript namespace, and the response: "It was for organizing the code and prevention for possible name collision in case the code is reused for constructing different pages". Basically this comes down to premature generalisation. I very strongly recommend sticking to minimal design that delivers the current set of agreed functionality (whatever that is in your case).
The reason for this is that many developers (been there myself) are extremely prone to the "world domination" plans. They start building the product, and have ideas about all the places where the code will be used - outside actual planned milestones. The initial design reflects this world domination plan, rather than what was actually asked to do. The devs have this "... and after this version we'll take over the world, this will replace all applications X on the market in next 6 months" mindset when building the program.
But in reality, the world domination is never achieved, and someone else ends up maintaining ridiculously over-engineered product. Decide what you plan to ship, and deliver that - and pretty much exactly that. You'll make life easier for everyone.
Yes I think you are right. I can reduce to 2 level, just organizing within the webtools. Thanks
sryu: Replying to [comment:25 lat]:
Regarding links, I meant anywhere where you generate "" in HTML. Overview.js is a good example (I didn't re-read all the code to remember where else I saw them, if anywhere).
Concerning "The links are not request manager only, they are from outside source (i.e localqueue). I will add the comments what the links will be. Or do you recommending somewhat different things?"
Yes I would like to see links restricted only to the expected valid targets. Ability to insert free links tends to be open to cross-site and forgery attacks, so the links should be validated against some short list of valid values, possibly using regexps.
More importantly it should be obvious from code generating whether it's generating an internal link, e.g. to some RequestManager API/web page call, or if it's generating an external link.
It is exernal links in a sense it is not from the same host. But all the link are from the WMCore/WorkQueue APIs. The links are not random but is not previously known either. The links gets from WorkQueue APIs and WMBS APIs which propagates it host link (in some cases adding suffix). I wander whether there still needs to be some verification in javascript end.
sryu: Replying to [comment:31 lat]:
I am very glad to see the new validation of input arguments. But I am slightly confused. There seems to be two different schemes used, one in WMCore/HTTPFrontEnd/RequestManager/ReqMgrBrowser.py (in method validation of some but not all parameteres) and another WMCore/HTTPFrontEnd/RequestManager/ReqMgrRESTModel.py (central validation routines; 'validation' not specified for all methods however).
I'd recommend picking one or the other, but in any case please make sure all arguments do actually have validation.
Validation of CMSSW versions is very weak. I recommend using a regexp anchored to validate the entire CMSSW version string, such as {{{^CMSSW(\d+){3}([a-z\d]+)?$}}}.
Hi Lassi, could you review the code on validation. I am not sure what is the best practice. I didn't use regex since specific directory contains only specific type of files. https://svnweb.cern.ch/trac/CMSDMWM/attachment/ticket/808/patch-series-sryu.patch Thanks
Validating URLs with just checking "http" prefix is very weak. I suggest restricting the URL considerably more.
rickw: Please Review
rickw: I'm using the following regular expressions for validation: self.identifier = re.compile('^([a-zA-Z0-9\s.-_]){1,100}$')
self.dataset = re.compile('^(/([a-zA-Z0-9.-_]){1,100}){3}$')
# CMSSW version
self.cmsswVersion = re.compile('^CMSSW(_\d+){3}(_[a-z\d]+)?$')
# valid couch URL
self.couchUrl = re.compile('^http://(([a-zA-Z0-9.-_]){1,100})(fnal.gov|cern.ch):5984')
The setup.py changes have been submitted to #579
swakef: I would suggest the validation regular expressions go into src/python/WMCore/Lexicon.py (or maybe elsewhere if they need to be called from javascript) so we all agree on what the valid forms are.
ps. Does the cmssw expression allow the for X_X_X_patch10? Not sure we have ever gone that high but... Actually, it does but it also allows 'CMSSW_3_10_10_pre111pre111'.
fyi: phedex has a restriction on site name of i think 10 characters which we may or may not want to duplicate.
The couch restriction doesn't allow user@host format - but maybe that's intentional?
metson: +1 on using Lexicon, if it's appropriate. I think user@host isn't desirable - localhost should be ~insecure (but heavily firewalled), remote will be certificate authenticated.
sfoulkes: Our development couch instances aren't going to be certificate authenticated.
metson: Why not? Once the Deploy/manage stuff is up and running it'll be easier to use that than to not, and that will pull in all the cert junk (I hope). Similarly, if it's development instance it shouldn't be public, so doesn't need user accounts.
sfoulkes: FNAL is requiring us to protect the couch server with a user account. So I have the option of either firewalling it so it can only be accessed from the machine it's running, or having user accounts...
Deploy ReqMgr application, initially just to dev instance. Ensure app developers can use it and work with it. Requires all the standard bits -- contact, accounts, ssh keys, packaging, deploy and manage scripts, monitoring, etc. Also requires update to WMCore python security module, ticket #472.