carpentries / amy

A web-based workshop administration application built using Django.
https://amy.carpentries.org
MIT License

Hosting alternatives #1164

Closed: chrismedrela closed 6 years ago

chrismedrela commented 7 years ago

Right now, we're using a VPS on Rackspace in production. The VPS was configured from scratch by @pbanaszkiewicz. This has a lot of disadvantages:

  1. The deployment (and release) procedure is pretty complicated. This results in a few big milestones instead of frequent small releases. The consequence is that we lose traction and keep postponing the v1.10 release indefinitely.
  2. As far as infrastructure is concerned, our bus factor is zero -- I couldn't configure a new VPS server for AMY as Piotr did (I'm not sure about @narayanaditya95).
  3. amy-dev has a completely different configuration (we just use ./manage.py runserver), so we cannot test, e.g., the consequences of deploying a new version or restoring a backup. The same applies to our local configurations.
  4. We have no procedure for restoring from a backup.
  5. We make backups, but we haven't tested whether they are correct and whether we can actually restore from one.
  6. A lot of configuration was done manually, but we're not DevOps. It'd be better (safer) to use tools provided by a PaaS.
  7. Using PaaS makes it easier to move to PostgreSQL.

I'd like to research alternatives. Important factors are:

  1. Supports Python 3.
  2. Preferably something that requires minimal customization and is already configured for Django + SQLite3 (so PaaS is better than IaaS).
  3. Something that has easy deployment and restore from backup procedures (possibly one shell command).
  4. Something that can automatically make backups as we already do.
  5. Free or cheap or willing to make an exception for us.
  6. Compatible with our data privacy policy (issue raised by @rgaiacs).

@pbanaszkiewicz what is our current approach to making backups? Where do they live? What triggers the regular backups (a cron job?)? How do we actually back up (just copy the .sqlite3 file?)?

chrismedrela commented 7 years ago

Alternatives:

rgaiacs commented 7 years ago

@chrismedrela Thanks for all your contributions and for raising this issue. About the alternatives, we need to be careful to check that they are compatible with our data privacy policy.

chrismedrela commented 7 years ago

About Heroku:

chrismedrela commented 7 years ago

@rgaiacs, where can I find our data privacy policy?

rgaiacs commented 7 years ago

https://software-carpentry.org/privacy/

Disclosure of personal information

We will not disclose any information we collect to any third party unless it is:

  1. Specifically authorized by you; or
  2. Necessary to perform the functions requested by you (and only to the extent absolutely necessary to perform the function); or
  3. Required to check for fraud or other misuse of SCF resources; or
  4. Required by law.

In particular, we may release the information we collect to third parties when we believe it is appropriate to comply with the law, to enforce our legal rights, to protect the rights and safety of others, or to assist with industry efforts to control fraud, spam or other undesirable conduct.

If the SCF contracts with a third party to provide a particular service, we may release the information we collect to that third party, provided that the third party has agreed to use at least the same level of privacy protections described in this Privacy Policy, and is permitted to use the information only for the purpose of providing services to us.

pbanaszkiewicz commented 7 years ago

@chrismedrela backups are made roughly every 30 or 60 minutes by alternating servers (server A every hour, server B every hour but shifted 30 minutes).

Backups are only copies of the DB file.
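For illustration only, here is a minimal sketch of how a file-level SQLite backup could be taken and then checked, which also touches on the "we never tested our backups" concern raised above. It is not AMY's actual backup script: the function names and paths are hypothetical, and it assumes Python 3.7+ (for `sqlite3.Connection.backup`).

```python
import sqlite3
from pathlib import Path


def backup_sqlite(source: Path, target: Path) -> None:
    """Copy a SQLite database using SQLite's online backup API.

    Unlike a plain file copy, this yields a consistent snapshot even if
    the source database is being written to during the backup.
    """
    src = sqlite3.connect(str(source))
    dst = sqlite3.connect(str(target))
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()


def verify_backup(path: Path) -> bool:
    """Return True if the backup file exists and passes the integrity check."""
    if not path.is_file():
        # sqlite3.connect() would silently create an empty database here.
        return False
    conn = sqlite3.connect(str(path))
    try:
        (result,) = conn.execute("PRAGMA integrity_check").fetchone()
        return result == "ok"
    finally:
        conn.close()
```

A cron job could call `backup_sqlite(...)` and refuse to rotate out older copies unless `verify_backup(...)` returns True. An integrity check only shows that the file is a readable database; a real restore drill would still be needed.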

@chrismedrela I agree with your points, but this is @jduckles's decision to make. There was supposed to be a DevOps person (even part-time), but I don't think there is one.

Personally I think the best would be to have PaaS at Rackspace since we're already using them and for free. I don't know if Rackspace provides this kind of service, though.

aa-dit-yuh commented 7 years ago

FWIW, I am more in favour of staying with an IaaS solution (more flexibility). Most of the issues raised by @chrismedrela can be resolved if we make the deployment pipeline automated using tools like Ansible.

jduckles commented 7 years ago

Thank you all for considering alternatives. As @pbanaszkiewicz says, getting hosting for free right now is an important consideration.

Whatever system we choose, @chrismedrela brings up an excellent point: it is probably worth documenting the process for installing a new server, and at the same time we can likely create something like a Dockerfile, which we'd need if we move to Heroku or anywhere really. Most of these services are beginning to use Docker as their underlying infrastructure, so that is a good, stable target for us to move toward.

It seems that to do this most generally we need to:

  1. Make the move to PostgreSQL
  2. Document the dependencies of AMY (pretty normal I think)
  3. Document other things such as SSL key installation and the like that AMY depends on

Once we've done that, we have the option to move pretty much anywhere, and quickly, should we need to.

Did I miss anything?

pbanaszkiewicz commented 7 years ago

:+1: for Dockerfile, because this is a pretty lightweight solution for running PostgreSQL for development.

With the switch to PostgreSQL we have to rethink backups (is it DB only? Or maybe the whole server, including e.g. generated certificates).

maneesha commented 7 years ago

Switching to Postgres also means rethinking off-line access. Right now it's easy to scp a sqlite file -- how will we access the Postgres db?

chrismedrela commented 7 years ago

Thank you all for your contributions. I understand your concerns. I played with Heroku and even ran AMY there with fake data to better understand what we can and cannot expect from a PaaS provider. See PR #1167 for what we would need to change if we wanted to move to Heroku.

@pbanaszkiewicz

> Personally I think the best would be to have PaaS at Rackspace since we're already using them and for free. I don't know if Rackspace provides this kind of service, though.

Unfortunately, Rackspace doesn't provide PaaS.

@narayanaditya95

> FWIW, I am more in favour of staying with an IaaS solution (more flexibility). Most of the issues raised by @chrismedrela can be resolved if we make the deployment pipeline automated using tools like Ansible.

Yes, we could stay with IaaS and develop our own scripts for deployments, backups etc., but it'd take much more time (especially since we're not DevOps) and would be less secure than using PaaS. The benefit of using PaaS over IaaS is that we outsource most of the DevOps work.

I agree that IaaS gives more flexibility. But AMY is a very typical Django app. We don't even use the filesystem for data storage yet.

@jduckles

> and at the same time we can likely create something like a Dockerfile, which we'd need if we move to Heroku or anywhere really.

I don't think a Dockerfile is necessary. All dependencies (including gunicorn and PostgreSQL) can be enumerated in requirements.txt. I didn't have to write any Dockerfile to run AMY on Heroku. I believe that other PaaS providers work the same way.

> 1. Make the move to PostgreSQL

We don't actually have to use PostgreSQL in production, but we need to make sure that we could switch to it easily and quickly. So I proposed in #1168 to run the tests on both SQLite3 and PostgreSQL in Travis CI (as well as to fix the existing bugs that prevent us from switching to PostgreSQL). As long as we use a VPS on Rackspace (or any other IaaS), I'd rather not move to PostgreSQL because of the hassle of configuring automatic backups, changing the deployment procedure, etc.
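As a rough sketch of how a single settings file could serve both backends (so CI can run the test suite once per database), the helper below picks the Django `DATABASES` entry from environment variables. This is an assumption-laden illustration, not what #1168 implements: the `AMY_DB_*` variable names and the default values are hypothetical.

```python
import os


def database_config(environ=os.environ):
    """Build a Django DATABASES dict from environment variables.

    The AMY_DB_* variable names are hypothetical, chosen for this sketch.
    """
    engine = environ.get("AMY_DB_ENGINE", "sqlite3")
    if engine == "sqlite3":
        return {
            "default": {
                "ENGINE": "django.db.backends.sqlite3",
                "NAME": environ.get("AMY_DB_NAME", "db.sqlite3"),
            }
        }
    if engine == "postgresql":
        return {
            "default": {
                # Django versions before 1.9 spell this backend
                # "django.db.backends.postgresql_psycopg2".
                "ENGINE": "django.db.backends.postgresql",
                "NAME": environ.get("AMY_DB_NAME", "amy"),
                "USER": environ.get("AMY_DB_USER", "postgres"),
                "PASSWORD": environ.get("AMY_DB_PASSWORD", ""),
                "HOST": environ.get("AMY_DB_HOST", "localhost"),
                "PORT": environ.get("AMY_DB_PORT", "5432"),
            }
        }
    raise ValueError("Unsupported AMY_DB_ENGINE: {!r}".format(engine))
```

In settings.py this would be used as `DATABASES = database_config()`, and the CI matrix would run the tests once with `AMY_DB_ENGINE=sqlite3` and once with `AMY_DB_ENGINE=postgresql`.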

> 2. Document the dependencies of AMY (pretty normal I think)

It's already done in requirements.txt.

> 3. Document other things such as SSL key installation and the like that AMY depends on

I've started a discussion about what exactly we need to document in #1171.

@pbanaszkiewicz

> :+1: for Dockerfile, because this is a pretty lightweight solution for running PostgreSQL for development.

Installing PostgreSQL in development is as easy as running pip install psycopg2 plus one command line to configure PostgreSQL (sorry, I don't remember exactly what the command is).

> With the switch to PostgreSQL we have to rethink backups (is it DB only? Or maybe the whole server, including e.g. generated certificates).

As long as all data lives in the database, it's enough to back up only the database. And we decided that certificates will be generated on the fly and cached on the filesystem rather than stored there.

@maneesha

> Switching to Postgres also means rethinking off-line access. Right now it's easy to scp a sqlite file -- how will we access the Postgres db?

Django has dumpdata and loaddata commands that are backend agnostic. For example, you can run dumpdata on production with PostgreSQL and run loaddata on your local machine with SQLite3.
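A hedged sketch of that round trip, written as a small Python wrapper around manage.py: the helper names, the fixture file name, and the choice to exclude contenttypes and auth.Permission (a common precaution, since migrate recreates those rows) are assumptions for illustration, not AMY's documented procedure.

```python
import subprocess
import sys


def dumpdata_command(output_path):
    """Command line that exports the whole database as a JSON fixture."""
    return [
        sys.executable, "manage.py", "dumpdata",
        # Natural keys make the fixture less dependent on primary keys.
        "--natural-foreign", "--natural-primary",
        # These tables are recreated by migrate on the target database.
        "--exclude", "contenttypes",
        "--exclude", "auth.Permission",
        "--indent", "2",
        "--output", output_path,
    ]


def loaddata_command(fixture_path):
    """Command line that imports a fixture into the configured database."""
    return [sys.executable, "manage.py", "loaddata", fixture_path]


def run(command, project_dir):
    """Run a manage.py command from the project directory, failing loudly."""
    subprocess.run(command, cwd=project_dir, check=True)
```

On the production machine one would call `run(dumpdata_command("amy-dump.json"), project_dir)`, copy the file over, and then call `run(loaddata_command("amy-dump.json"), project_dir)` locally after running migrate against the local SQLite3 database.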

pbanaszkiewicz commented 6 years ago

The latest update on this issue: we continue to run on Rackspace, on a VPS, but we need to move to a newer Ubuntu release.

pbanaszkiewicz commented 6 years ago

I'm going to start a new issue for migration to a new VPS server.