This will have some overlap with #223 -- I will focus on static analysis-related concerns here and look at operational issues in #223.
Accepted, pending the setup of a functional development instance, per @c-w
This task is blocked on https://github.com/CatalystCode/project-fortis-pipeline/issues/176 and https://github.com/CatalystCode/project-fortis-pipeline/issues/243 -- let's first implement at least basic auth and ssl before we take a wrench to this.
This task is now unblocked. Working on setting up a security review test site.
Test site is now set up:
Authorized users for test site:
Potential SQL injection in featureService found by @stewartadam: https://github.com/CatalystCode/featureService/blob/b632d9cf9966b1ac68dee66a9cc634de877ef5d3/services/features.js#L143-L145
PR to fix it: https://github.com/CatalystCode/featureService/pull/22
Some more pointers requested by @stewartadam about our security layer and the services we call:
In the frontend, authentication is handled by msal.js; specifically, take a look at the interactions with the adApplication object in routes/AppPage.js, which we use to log in and acquire a token from Active Directory v2.
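As a rough sketch of that flow (the client ID, authority, and scopes below are placeholders, and the exact API surface depends on the msal.js version in use):

```js
// Sketch only: illustrates the login/token-acquisition pattern, not the exact AppPage.js code.
import * as Msal from 'msal';

const adApplication = new Msal.UserAgentApplication({
  auth: {
    clientId: '<aad-application-id>',                 // placeholder
    authority: 'https://login.microsoftonline.com/common',
  },
});

const request = { scopes: ['openid', 'profile'] };

async function login() {
  await adApplication.loginPopup(request);            // interactive sign-in
  const { accessToken } = await adApplication.acquireTokenSilent(request);
  return accessToken;                                 // later attached to GraphQL calls
}
```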
Once we have the AAD token, we pass it to every GraphQL request to the backend. This happens in the fetchGqlData function.
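The token is sent as a bearer header; a minimal sketch of what fetchGqlData might look like (the endpoint path and exact shape are assumptions, not the real frontend code):

```js
// Sketch only -- the real fetchGqlData lives in the frontend codebase.
async function fetchGqlData(token, query, variables) {
  const response = await fetch('/api', {               // assumed GraphQL endpoint
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${token}`,                // AAD token acquired via msal.js
    },
    body: JSON.stringify({ query, variables }),
  });
  return response.json();
}
```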
In the backend, we're using passport.js to authenticate the frontend tokens. All the setup for this happens in auth.js. We use the token to look up the identity/email of the requesting user. We then check the email against the users table in Cassandra in the requiresRole decorator.
The requiresRole decorator is wrapped around each resolver that the GraphQL server uses to specify the level of permissions that a call to the endpoint requires. E.g. user-level permissions to view data or admin-level permissions to mutate configuration.
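The pattern is roughly the following (a sketch under assumed names, not the actual auth.js or resolver code):

```js
// Sketch only: wraps a GraphQL resolver so it first checks the caller's role.
// getUserRole stands in for the Cassandra lookup against the users table.
async function getUserRole(email) {
  // ...SELECT role FROM users WHERE email = ? (Cassandra lookup)...
  return 'user';
}

function requiresRole(resolver, requiredRole) {
  return async (root, args, context) => {
    const email = context.user && context.user.email;   // identity set by passport.js
    const role = email ? await getUserRole(email) : null;
    if (role !== requiredRole && role !== 'admin') {
      throw new Error('Access denied');
    }
    return resolver(root, args, context);
  };
}
```

A read-only query resolver would then be wrapped with the user role while a configuration mutation would require the admin role.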
We're also calling the featureService from both the frontend and the backend; it's a centralized Postgres-based service that currently only supports HTTP. The frontend, which may be hosted on HTTPS, also needs to call this service, so we're proxying the calls via the GraphQL server in server.js to avoid same-origin issues. As far as the frontend is concerned, it only knows about the GraphQL server and doesn't call any other hosts.
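Conceptually the proxy route is simple; a sketch of what it could look like in server.js (the host variable and error handling are assumptions):

```js
// Sketch only -- illustrates the proxying idea, not the actual server.js route.
const express = require('express');
const fetch = require('node-fetch');

const app = express();
const FEATURE_SERVICE_HOST = process.env.FEATURE_SERVICE_HOST; // the HTTP-only featureService

app.get('/proxy/featureservice/*', async (req, res) => {
  try {
    const upstream = await fetch(`${FEATURE_SERVICE_HOST}/${req.params[0]}`);
    res.status(upstream.status).json(await upstream.json());
  } catch (err) {
    res.status(502).json({ error: 'featureService unreachable' });
  }
});
```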
Note that the proxy endpoint is not protected by auth since the proxied service is entirely public anyway. Beyond `/proxy/featureservice`, there is one more endpoint that's not protected by auth: `/healthcheck`. Everything else (graphql) is hosted at `/api` and is protected by auth.
Some more pointers for @stewartadam about our production setup:
We're using the Deploy To Azure button to auto-generate a deployment UI for our ARM template azuredeploy.json.
The ARM template sets up a bunch of resources and then creates a VM to run the custom script fortis-deploy.sh which finishes the deployment. The VM is accessible via a SSH key configured at deployment time.
The deployment script sets up all parts of the Fortis infrastructure. The frontend gets built to a static JS/HTML/CSS bundle by install-fortis-interfaces.sh and uploaded to a public blob storage.
The backend gets deployed to the AKS cluster as a simple service by install-fortis-services.sh. The image for this service gets built by TravisCI on GitHub tags and uploaded to DockerHub by publish.sh.
Spark gets set up in the k8s cluster via Helm by install-spark.sh. The chart for this deployment is configured to not expose any ports for the master/worker/cluster UIs.
Cassandra gets set up in the k8s cluster via Helm by install-cassandra.sh. The chart is configured to not expose any IPs outside of the cluster.
HTTPS gets configured via kube-lego by install-fortis-services.sh. Both bring-your-own-certificate and LetsEncrypt are offered as options. There should only be one ingress into the cluster which points to the project-fortis-services backend (GraphQL server).
Some comments from Alexandre Gattiker:
On the security review test site:

- I was able to edit the Stream parameters and view the twitter secret.
- I was able to change the status of the twitter Stream from enabled to disabled (sorry for this, I didn't change it intentionally and immediately reset it to enabled).
As we move forward and close more holes in the API, we should ensure we're handling statement generation in a uniformly "safe" way. For example, we still have the function at https://github.com/CatalystCode/featureService/blob/4bc15ac5320574b04fb6eb82564648f3af149730/services/features.js#L128 which concatenates additional SQL onto the statement to handle filtering. Even though we're sanitizing queries in the execution engine, the procedural generation of the templated SQL statement could become a point of intrusion later on.
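For illustration (a generic example, not the featureService code), the safe pattern keeps user input out of the SQL text entirely and passes it through the pg driver's parameter array:

```js
// Generic illustration of the two approaches, not the actual featureService code.
const { Pool } = require('pg');
const pool = new Pool();

async function findFeature(userInput) {
  // Risky: user input is spliced into the statement text itself.
  // return pool.query(`SELECT * FROM features WHERE name = '${userInput}'`);

  // Safer: the statement is constant and the value travels in the parameter array.
  return pool.query('SELECT * FROM features WHERE name = $1', [userInput]);
}
```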
Additionally, the changes here expose Postgres error messages in the JSON response, which we should probably suppress. Case in point: if one hits the bbox endpoint with non-numeric values, the pg module bubbles up its error trace into the JSON.
e.g. `/proxy/featureservice/features/bbox/a/b/c/d` yields: `{"error":{"name":"error","length":184,"severity":"ERROR","code":"XX000","file":"lwgeom_pg.c","line":"162","routine":"pg_error"}}`
This kind of verbosity on a production server makes future SQL injections simpler because the attacker gets better feedback from the originating server on what exactly failed, when we could just as easily return a 500 error code or similar (any use case that selects from an invalid URL already isn't using the frontend libraries correctly, so it's not something we need to cover in a developer-friendly manner). At the very least, being able to toggle an "isProductionServer" flag to decide whether or not to display these errors would probably be pretty valuable for customers.
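One possible shape for such a toggle (illustrative only; the actual fix is in the PR linked below):

```js
// Illustrative sketch, not the code from the featureService PR.
const express = require('express');
const app = express();

const isProductionServer = process.env.NODE_ENV === 'production';

// Express error handler: hide driver/stack details unless we're in development.
app.use((err, req, res, next) => {
  if (isProductionServer) {
    res.status(500).json({ error: 'Internal server error' });
  } else {
    res.status(500).json({ error: err.message, detail: err });
  }
});
```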
The error messages are hidden as of https://github.com/CatalystCode/featureService/pull/23:
As for the procedural generation of SQL WHERE clauses, I'm not sure there's a cost-effective way of fixing that since it permeates the entire featureService code base. Given that the only types of statements we're adding are essentially constants like `AND lower(split_part(id, '-', 1)) = lower($1)`, where the only variable is the index of the placeholder `$1`, shouldn't we be safe enough here?
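To make the pattern under discussion concrete (a sketch, not the actual featureService code): the appended fragments are constant SQL, only the placeholder index changes, and the user-supplied values still travel through the pg parameter array:

```js
// Sketch of the discussed pattern, not the actual featureService code.
const clauses = ['SELECT * FROM features WHERE 1 = 1'];
const params = [];

function addIdFilter(idPrefix) {
  params.push(idPrefix);   // user-supplied value stays parameterized
  clauses.push(`AND lower(split_part(id, '-', 1)) = lower($${params.length})`); // constant fragment
}

addIdFilter('wikidata');
const statement = clauses.join(' ');   // later passed to pool.query(statement, params)
```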
One attack vector is the featureService. It's a single point of failure since all Fortis deployments are using the same deployed service. The service is also publicly accessible from the web although it could just as well run inside of the k8s cluster.
After confirming with @anthturner, I'm working on dockerizing the featureService and moving the features-database to Azure-Postgres so that we can remove this single point of failure and deploy the featureService inside of the Fortis k8s cluster. The work was merged in https://github.com/CatalystCode/featureService/pull/24. The dockerized featureService is now also being deployed to the Fortis k8s cluster in install-featureservice.sh.
We're now authenticating calls to the featureservice. The architecture regarding the featureservice now looks as follows:
Got the go-ahead from @anthturner and @stewartadam. Resolving. Thanks so much for the security review!
Let's make sure we sanity-check deployment, operations, UI code, etc., since Fortis stores sensitive information like Twitter/Facebook account credentials and lists of monitored accounts that may be interesting to certain people (as may the ability to modify them).