We'll try to work through these and keep updating the issue here. I'm a bit strapped these days prepping for the GAMe conference next month, so this may not get done as quickly as I'd prefer.
As these issues are worked on, I'll keep updating the cloudman-test bucket with the dev code, so if you'd like to try what's been done thus far, you can do so by creating a new cluster and supplying that string in the Default bucket field on https://launch.usegalaxy.org. For now, I've added the ability to automatically turn on HTTPS. You can try it out by supplying use_ssl: True in the Extra User-Data field.
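That is, the Extra User-Data field would contain just this line:

```
use_ssl: True
```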
BTW, there is a newer image available on the launcher from the beginning of November. I just never announced that release and it's gotten superseded now with the Galaxy 16.10 but it's available from the Flavors dropdown as image 16.11. Also, starting with that release, we're getting away from having an indices volume altogether (see https://github.com/galaxyproject/cloudman/issues/60 for a bit more on that) so number 6 in your list won't be an issue any more.
For number 4:
> Where/how can I manipulate the ports opened by the security group? What is on the other end of each of these ports? There seem to be a lot. I'd like to shut down everything other than HTTPS and SSH (at least to the outside world).
You can change the list of opened ports directly from the (AWS) cloud console. Under security groups, choose the CloudMan security group and edit the desired fields. To help you decide what to keep, the reasons for the opened ports are listed here: https://github.com/galaxyproject/bioblend/blob/master/bioblend/cloudman/launch.py#L266 Also, the rule that references the security group itself is required for instances in the cluster to be able to talk to each other.
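If you prefer the command line over the console, you can inspect the same rules with the AWS CLI (a sketch; this assumes the group kept the default CloudMan name):

```
# List the inbound rules of the CloudMan security group,
# to compare against the bioblend list referenced above
aws ec2 describe-security-groups --group-names CloudMan
```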
OK, I've tested the use_ssl: True User Data option using the default bucket set to "cloudman-test" on the 16.11 flavor, and it works very nicely, even surviving a reboot, which seemed unreliable before. Awesome! Thanks!
Thanks for the port reference - I'll check it tomorrow. I have been experimenting with turning off all the ports (for 0.0.0.0) except for 22, 80, 443, though I haven't thoroughly checked how this affects running jobs within the cluster. I understand the use of the "security group origin" to help the workers and master communicate - I'm only concerned about outside access. Do the workers get the same security group? How about a user parameter to set the security group to a pre-existing one? (like with key-pair etc.) On the other hand, maybe I should just use the subnet-id option - I think this can do firewall type filtering. Alternatively, I could modify the code in the bucket - where is this done?
With respect to number 5, today I experimented with changing the /etc/ssh/sshd_config line: PasswordAuthentication yes to PasswordAuthentication no in order to disable password based login (public-key only) and then restarting the sshd daemon. How do I build this procedure into the boot-up/start-up scripts or set this file's contents from the bucket?
Thanks much!
Oh, and one more thing I noticed after reboot is that I had to "approve" self-signed SSL certificates again (new ones?). I'm suspicious that new certificates are being generated with the reboot, which is probably undesirable (although I can live with it...)
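One way to check (a sketch, run from any machine with OpenSSL; MASTER_IP is a placeholder for the master's address) is to compare the certificate fingerprint before and after a reboot:

```
echo | openssl s_client -connect MASTER_IP:443 2>/dev/null \
    | openssl x509 -noout -fingerprint
```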
-- n
Shutting down the cluster and restarting it from launch.usegalaxy.org does not retain the SSL behavior - in general, I'm a bit unclear on which aspects of the Extra User Data are persisted in the S3 bucket. Some User Data options clearly change the underlying filesystem (admin_users, for example), so they don't need to be re-specified, but what about other options?
Thanks,
--n
The SSL cert was being regenerated, but I've fixed that now. I've also made the SSL setting persist across cluster shutdowns. Note that it will now persist even if you toggle it manually from the Admin page vs. using the user data options. The list of configs that get stored as persistent is set here: https://github.com/galaxyproject/cloudman/blob/master/cm/master.py#L2557
As for password-based auth: we'd actually need to build a new image to make that natively configurable because it's baked in there: https://github.com/galaxyproject/ansible-cloudman-image/blob/master/files/cm_autorun.py#L366 In the meantime, you can specify the following user data:
```
master_prestart_commands:
  - sed -i 's/^PasswordAuthentication .*/PasswordAuthentication no/' /etc/ssh/sshd_config
  - /usr/sbin/service ssh reload
```
This also gets at your initial question 2.i: you can put any shell command there and it will run before any application services get started by CloudMan. I also made these persist across shutdowns.
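Putting this together with the SSL option, a complete Extra User-Data block for a hardened launch would look like:

```
use_ssl: True
master_prestart_commands:
  - sed -i 's/^PasswordAuthentication .*/PasswordAuthentication no/' /etc/ssh/sshd_config
  - /usr/sbin/service ssh reload
```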
Finally, for the security groups - this one is tricky. The launcher (via bioblend) is what ensures the ports are open so the started services can operate properly. So it would be necessary to modify those each time an instance is launched. If you add more rules to an existing group, those will remain open but if you close some, the launcher will reopen the minimal set. You could run your own launcher and modify the rules there...
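For reference, a minimal sketch of running your own launcher through bioblend; the CloudManConfig parameters (including key_name and security_groups) follow bioblend's API as I understand it, but double-check against the launch.py link above, and note that all the values below are placeholders:

```python
from bioblend.cloudman import CloudManConfig, CloudManInstance

# Sketch: launch a cluster reusing a pre-existing key pair and a
# pre-existing, locked-down security group rather than the defaults
# the launcher would otherwise create. Keys, AMI ID, and names are
# placeholders.
cfg = CloudManConfig(
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    cluster_name="Secure cluster",
    image_id="ami-xxxxxxxx",  # e.g. the 16.11 flavor's AMI
    instance_type="m4.large",
    password="cluster_password",
    key_name="my_existing_key_pair",
    security_groups=["my-locked-down-group"],
)
cmi = CloudManInstance.launch_instance(cfg)
```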
@edwardsnj can you give it a try now and see how things are looking from your perspective?
The only thing I see that's left is the nginx update, but without building a new package with the required dependencies, I don't see a resolution to that one (one reason the migration to Ubuntu 16.04 hasn't been done yet is the need to upgrade that package, so it keeps getting pushed off).
@afgane I'm wondering whether we can switch to dynamic modules. Nginx supports this from 1.9.11 onwards. https://www.nginx.com/blog/dynamic-modules-nginx-1-9-11/
There's also a PR for the upload module to add dynamic module support: https://github.com/vkholodkov/nginx-upload-module/pull/77
It looks like we should then be able to install the stock nginx, compile the dynamic module separately, and do a load_module in the nginx config: https://www.nginx.com/resources/wiki/extending/converting/
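If that pans out, the build would look roughly like this (a sketch; the --with-compat flag and the module filename are assumptions that depend on the nginx and module versions):

```
# From nginx sources matching the installed version, build the
# upload module as a dynamic module
./configure --with-compat --add-dynamic-module=/path/to/nginx-upload-module
make modules

# Copy objs/ngx_http_upload_module.so into nginx's modules directory,
# then load it at the top of nginx.conf:
#   load_module modules/ngx_http_upload_module.so;
```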
Thanks for all of your work. I'll give it a whirl this afternoon or tonight.
I can confirm that SSL seems to persist through reboots and new launches of saved clusters and the certificate seems to be preserved across reboots. Preserving the certificate across new launches of saved clusters is moot since the new instance gets a new IP address / public name to associate with the preserved certificate. Firefox seems rather unhappy with this, but Chrome doesn't complain (although a new exception is required for the self-signed certificate).
Turning off ProFTPd no longer causes problems with reboot and/or new launches of saved clusters.
The master_prestart_commands above are turning off password login as needed.
I'm still looking into the security group issues - I'm wondering if I can execute a command at boot-up via master_prestart_commands to reset these.
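Something along these lines might work, though it's untested and assumes the AWS CLI and suitable credentials are available on the master instance (the port number is just an illustration; see the bioblend link above for what each port is for):

```
master_prestart_commands:
  - aws ec2 revoke-security-group-ingress --group-name CloudMan --protocol tcp --port 21 --cidr 0.0.0.0/0
```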
When I restart a saved cluster, I have to re-enter the Key-Pair and the Image details (otherwise they end up at the defaults). Is this the expected behavior? None of the other options on the form seem to be needed for a saved cluster.
Thanks...
I'm tasked with ensuring that the cloud-based Galaxy instance launched from launch.usegalaxy.org passes the same security scans a permanent server would. I'm using the Cluster with Galaxy option. I've dug around quite a lot, but haven't quite managed to figure out the special magic that would resolve the following issues. I think I hit most of these with the 15.x AMI/Galaxy version, but the details below were uncovered on the 16.x AMI/Galaxy version.

1. Start up in HTTPS mode from the get-go (rather than logging into the admin user interface over HTTP and then toggling SSL mode).
2. Update the Ubuntu instance with the latest security patches. I run "sudo apt-get update; sudo unattended-upgrades", which seems to be the recommended recipe, and reboot the cluster (via the cloudman/admin link).
   i. It is unfortunate that there is no hook for a script to run earlier than any Galaxy services, which might save the reboot.
   ii. Unfortunately, apache2 is upgraded and configured to start at reboot - its links must be removed from /etc/rc*.d to be consistent with the "before" state of the instance.
   iii. Unfortunately, nginx is upgraded too, and as best I can make out, the upgraded version of Ubuntu's nginx does not provide the nginx upload module used by Galaxy, which makes it fail at startup. The error in /var/log/cloudman/cm_boot.log is: nginx: [emerg] unknown directive "upload_store" in /etc/nginx/...
3. If I stop ProFTPd, this change gets written to the S3 configuration bucket, no problem. However, after the reboot, the Galaxy service is never started by the supervisor, since it is waiting for ProFTPd to be started (and it never is, because it was stopped).
4. Where/how can I manipulate the ports opened by the security group? What is on the other end of each of these ports? There seem to be a lot. I'd like to shut down everything other than HTTPS and SSH (at least to the outside world).
5. SSH access is permitted with either the password or the key pair - can I turn off password-based access and require the key pair?
6. (Not strictly a security issue, but observed when rebooting...) a new galaxyIndices volume gets created with each reboot.
Thanks,
-- n