Closed ghost closed 12 years ago
lat: Initial set of deployment review comments are on [https://wave.google.com/wave/waveref/googlewave.com/w+XsBuMPFwb google wave].
lat: After various rounds of deployments comments on mailing list, currently outstanding deployment issues are:
lat: Review note on DAS packaging: would prefer %post to do even less, in particular would prefer that servers don't ship with a running configurations in the RPM. An example is ok, but don't create the default configuration the server will use and run with. It's too risky, if there's a mistake one might start a highly insecure server.
lat: Review note on DAS bin directory: suggest to rename all programs consistently with or without suffix. I'd prefer without suffix and it seems to be the more common here too, so perhaps the suffixes can be removed.
lat: Review note on DAS {{{bin}}} directory: Use syntax {{{foo ${1+"$@"} }}} not just {{{foo "$@"}}}. The former handles no arguments correctly, the latter has problems if user doesn't supply any command line arguments at all.
lat: Review note on DAS {{{bin}}} directory: Use {{{#!/bin/sh}}} instead of {{{#!/usr/bin/env bash}}}. I don't think the scripts use any features that are not in POSIX standard. And even if they did, they can probably be trivially expressed in POSIX-compatible subset.
lat: Review note on DAS {{{bin}}} directory: start-up scripts should be folded into {{{manage}}}. We very much prefer to see everything inlined directly into the {{{manage}}} script without several layers of indirection, for simplicity, comprehension and transparency.
lat: Review note on DAS {{{bin}}} directory: I don't know if {{{das_test}}} is meant to be used by anyone other than developers, but in the general spirit of not holding configuration in the software area, would suggest to take {{{das.cfg}}} from outside {{{$DAS_ROOT}}}. Would also suggest to use python configuration file, not .cfg style.
lat: Review note on DAS {{{bin}}} directory: {{{dassh}}} appears to touch $HOME, which seems suspicious. How exactly does DAS use ipython? Is this going to appear on server side, and if so, why? I did notice we had /home/cmsweb/.ipython appear at one point.
lat: Review note on DAS {{{bin}}} directory: suggest to remove non-production binaries from default project bin directory when the RPM is built. In general would prefer if test artifacts didn't make it to the running servers at all. I have no objection to having them in SVN mixed up, as long as the build recipe can put the bits in the right place in the RPM (or omit them, as the case may be).
lat: Review note on DAS: nice job with unit tests.
lat: Review note on DAS: xwho is obsolete and probably should be converted to use xldap.cern.ch. See {{{phonebook}}} command on any CERN linux system. It's just a perl script which makes a simple LDAP query to xldap.cern.ch. Note that xldap.cern.ch is only available within CERN, (testing) outside CERN you'll need (SSL + password authenticated) connection to ldap.cern.ch instead.
lat: Review note on DAS: would suggest to generally consider slight flattening of name space. "from DAS.utils.utils import foo" seems a little redundant :-)
lat: Review note on DAS: generally like the use of yaml for describing schema, and how it is linked to service providers - good job.
lat: Review note on DAS: curious about this coding pattern:
{{{
try: jsondict = json.loads(data) except: jsondict = eval(data) }}}
Use of {{{eval()}}}, especially without any dictionary restrictions - eval global and local dictionary options - seems very dangerous. It would seem that if someone manages to compromise some other CMS application, they can also contaminate DAS because it will eval the data produced by those sources. Admittedly I am not sure where the "data" can originate here. But if the data is already sanitised, the construct seems unnecessary, so I am assuming it originates from untrusted outside source.
What's the reason falling back on eval? Is the data python-esque json, but not completely legal json? Can we not fix the upstream data source? The above example was from SiteDB, so for that we should just fix the output...
lat: Review note on DAS: suggest to rename services/runsum/run_summary.py to something/cern_sso_auth.py.
lat: Review note on DAS: services/runsum/runsum_service.py, runsum_keys() should not be referring to configuration files under $DAS_ROOT. Parametre file locations should come exclusively from server configuration, not picked up via environment, or RPMs.
lat: Review note on DAS: general comment, would find expression {{{if not isinstance(x, dict)}}} style more readable than {{{if type(x) is not types.DictType}}} style.
lat: Review note on DAS: services/runsum/runsum_service.py, {{{parser()}}} method is not protected against uninitialised in {{{root.clear()}}} at the end of the function. Also not obvious why the function has to call {{{source.close()}}}, seems like that should be automatic, either in caller, or just handled automatically by it going out of scope.
lat: Review note on DAS: overview plotfairy version, session arguments are unnecessary and can be omitted.
lat: Review note on DAS: {{{encode4admin}}} seems unused. I'd prefer it was removed entirely, just to avoid it accidentally getting used somewhere.
lat: Review note on DAS: I have not yet read all the code, but have a general comment in that it's somewhat unclear where input validation occurs, especially on data loaded from the upstream sources in services. There are some int(x) type conversions which will naturally throw, but I didn't see where types were specified, and how DAS handles situation where query requests say numerical comparison against mixed / non-numerical types.
lat: Review note on DAS: DAS seems pretty liberal on using various ports, especially for MongoDB. We should ensure DAS uses port range allocated to it.
lat: Review note on DAS: {{{das_top.tmpl}}} pulls yui from yahoo's servers - seems like das should serve it itself.
lat: Review note on DAS: {{{das_table.tmpl}}} defines variable {{{YAHOO.example.DynamicData}}} - seems like it probably should CMS/DAS related name instead?
lat: Review note on DAS: {{{das_table.tmpl}}} (among other places) generates URLs which don't use quoting for arguments. It's possible the arguments don't require it, but would recommend that as a general style, anything originating from possible user source goes via encodeURIComponent() or alike.
lat: Review note on DAS: {{{das_services.tmpl}}} - among other places - generates HTML and URI components from values where it's not obvious whether quoting is required.
I assume for {{{das_services}}} the text cannot contain unsafe characters, and in particular, DAS cannot be induced to generate a XSS attack here by injecting malicious content, but nevertheless I would suggest the page contents are automatically quoted.
Not sure if the template engine automatically does that, but given that there aren't any distinctive directives for different output contexts (HTML vs. URI inside A tag), I assume no quoting is happening by default.
lat: Following [comment:30 comment 30], {{{das_query_info}}} seems vulnerable to XSS and other injection attacks. You should be careful that if someone constructs malicious queries, you don't inject them as-is to the admin/expert page, which may get the admin's browser do all kinds of funny things! This has been one classic way several sites got compromised...
lat: Further to [comment:30 comment 30], it seems many templates and bits from code like {{{DAS/web/*.py}}} may pass in parameters from client unvalidated all the way straight to the template and back to the client.
So it seems there may be possibility for various kinds of XSS and injection attacks. I am definitely not positive about this. It's possible there's a central validation somewhere, and quoting in the template engine, but it's not very obvious from reading this code that that will happen.
If that's the case I would definitely appreciate localising the knowledge to the functions (e.g. {{{records()}}} in {{{das_web.py}}}), so reading the code it's essentially obvious what sort of protection is in place. I'd definitely recommend against splitting any protections to another location, away from the code that needs them.
lat: Review note on DAS: {{{das_jsontable.tmpl}}} seems like standard YUI boilerplate template, complete with tracking bug. This seems like a testing thing that belongs somewhere else. As far as I can tell it's unused, so perhaps can be removed, or put waiting in dev patch queue until it's needed?
lat: Review note on DAS: authorisation to 'expert' page appears to be protected by 'admin.dns' mongodb database.
It is good DAS uses X509 certificate access status to make the check, but it's unclear to me how this database gets populated and where, and whether it's possible to subvert it. Is it somehow related to the admins in SiteDB?
However more generally, DAS should use standard CMS authentication and authorisation system to protect access. Even if it uses information from SiteDB, it shouldn't use an alternate path to that information, and alternate code to make the checks. It should use standard method decorators and other agreed conventions.
lat: Review note on DAS: accessing {{{/das/help}}} URL has two issues. First it has a problem generating the {{{das_help}}} page because {{{services}}} is missing. Second is that it spits out a traceback to the client browser, which means the server isn't properly configured to hide the errors.
lat: Further to [comment:30 comment 30], the {{{das_error.tmpl}}} does give an example how to perform quoting. So it does seem the vast majority of other pages, both python code and template, need to be carefully revisited to make sure all data, whether from user or another database source, is carefully sanitised before putting it into output.
lat: Review note on DAS: the various method decorators in {{{DAS/web/tools.py}}} seem to require review. I would suggest to carefully go through all of them, and all their uses in the code. I would suggest to remove all decorators which aren't strictly required.
Some of the uses of the decorators seem even more curious than the decorators themselves. The wrap2das* methods seem to have very little in way of validation, and I would really like to see some comments to whether they can be used to get the server reflect arbitrary data back to the client. You really should avoid any clever redirection (including internal redirection) and wrapping tools, especially against any possibly tainted data that originated from user.
lat: Further to [comment:37 comment 37], I did verify the wrap2das* methods are accessible to the clients, even though they appear to be used internally as well. The internal calls pass real data, so it seems that sufficiently clever call to wrap2das* method with suitably complex GET or POST might be possible to use as a reflector. If true this again opens window for XSS and XSRF attacks.
lat: Review note on DAS: {{{DAS/web/das_webmanager.py}}}'s {{{css()}}} and {{{js()}}} methods have duplicate logic for locating files. The {{{check_scripts()}}} should probably return the decision it made on file path. Also related to this, I am not sure modifying a list while iterating it is safe (in {{{check_scripts()}}}, and I am not sure the code is properly protected against duplicated invalid entries.
I would suggest to simplify the code so it makes the path decisions in exactly one location, by simple iteration and producing a new output, rather than trying to modify the data structure in-place.
lat: Review note on DAS: I suggest to centralise URL scraping. It seems several places in code use {{{urllib2_request}}} which seems to perform an appropriate amount of encoding for protection, but it's not consistently used everywhere. For general understanding of the code it would be helpful if there were as few scraping / internal http request patterns as possible.
lat: Review note on DAS: the {{{status()}}} method on {{{DAS/web/das_web.py}}} seems to pass any request through unmodified to the underlying data store.
lat: Review note on DAS: {{{DAS/web/das_server.py}}} defaults to writing pid files to /tmp. I would suggest to default to not writing a pid file at all if one wasn't requested, rather the defaulting to "somewhere".
lat: Review note on DAS: the default socket listen queue should probably default to 100+, especially as we'd expect this to be a high-traffic server.
lat: Review note on DAS: the server should default to production mode, turning tracebacks to client off, and not reloading code whenever the files change. Not sure if that's handled somewhere other than {{{DAS/web/das_server.py}}}. In fact would like to see this use more of WMCore...
lat: Further to [comment:44 comment 44] indeed it seems there is very little use of WMCore. I suggest to reconsider reusing at least the base server, possibly quite a bit more from WMCore now that it is available - I appreciate it might not have been in place when this code got started.
lat: Review note on DAS: like many other places, {{{DAS/web/das_expert.py}}} seems to reflect caller arguments directly back to the client in URLs and otherwise. Seems like more aggressive input and output sanitisation is in order.
lat: Review note on DAS: {{{DAS/web/das_expert.py}}} also seems to use the input parameters completely unchecked. Granted the code is protected with DN check, and some of the mishaps would likely just produce python KeyError, but would still prefer to see a style that rigorously untaints all client input before using it. (Using WMCore should help here, and would probably also help verify the HTTP request method is appropriate to the method call.)
lat: Review note on DAS: as previously discussed would like to see {{{das_doc.py}}} dropped.
lat: Review note on DAS: suggest {{{DASLogger}}} to default to use {{{rotatelogs}}} now that we've made that available as a stand-alone utility. Alternatively perhaps support pipe output syntax. Preferably in any case shared with WMCore as much as possible. I do see it uses python internal logging facility rotation ability, but I'd suggest to standardise on rotatelogs instead.
lat: Stopping this round of review here. This leaves DAS core and analytics for another round of code audit / review.
lat: Marking this work item completed for HG1009 milestone, and assigning ticket to DAS project, for a milestone and priority to be determined together with other project managers. It would probably be best to split this ticket to individual items. I am certainly available to discuss how to handle review notes and to make sure they don't get lost.
lat: == Summary of my findings ==
A lot of the issues mentioned above are low-level technical points. While I did not review the core and analytics code, my overall impression of DAS code is very favourable. I compliment in particular the choice of YAML for describing the schema, queries and notation, and its mapping to service python code. Overall I find the code simple, elegant and understandable, and very approachable. Well done on the core technical choices.
The greatest weakness I see is almost complete lack of input and output sanitisation, but from DAS clients and validation of upstream data. I am a little concerned this might be quite vulnerable to XSS, XSRF and injection attacks, and very vulnerable to compromised upstream data sources. I did not reason deeply enough about the code to understand if it is possible for clients to inject code to the map reduce algorithms, but I'd consider that possible.
I would also very much encourage more code sharing from current WMCore server. Many trivial issues seem best addressed by hooking up to WMCore server and its validation support - not by recoding the same functionality into DAS.
Depending on what sort of schedule you'd anticipate for fixing some of the handling of tainted data, I might feel a bit more comfortable if we put this behind mandatory X509 certificate authentication.
valya: Replying to [comment:6 lat]:
Review note on DAS packaging: would prefer %post to do even less, in particular would prefer that servers don't ship with a running configurations in the RPM. An example is ok, but don't create the default configuration the server will use and run with. It's too risky, if there's a mistake one might start a highly insecure server.
Done. The %post section has been reviewed and cleaned up.