Closed: chrismedrela closed this issue 6 years ago.
Alternatives:
@chrismedrela Thanks for all your contributions and for raising this issue. Regarding the alternatives, we need to check carefully that they are compatible with our data privacy policy.
About Heroku:
@rgaiacs, where can I find our data privacy policy?
https://software-carpentry.org/privacy/
Disclosure of personal information
We will not disclose any information we collect to any third party unless it is:
- Specifically authorized by you; or
- Necessary to perform the functions requested by you (and only to the extent absolutely necessary to perform the function); or
- Required to check for fraud or other misuse of SCF resources; or
- Required by law.
In particular, we may release the information we collect to third parties when we believe it is appropriate to comply with the law, to enforce our legal rights, to protect the rights and safety of others, or to assist with industry efforts to control fraud, spam or other undesirable conduct.
If the SCF contracts with a third party to provide a particular service, we may release the information we collect to that third party, provided that the third party has agreed to use at least the same level of privacy protections described in this Privacy Policy, and is permitted to use the information only for the purpose of providing services to us.
@chrismedrela backups are made roughly every 30 minutes by two alternating servers (server A hourly, server B hourly but offset by 30 minutes).
Backups are just copies of the DB file.
@chrismedrela I agree with your points, but this is @jduckles's decision to make. There was supposed to be a DevOps engineer (even part-time), but I don't think there is one.
Personally I think the best option would be a PaaS at Rackspace, since we're already using them for free. I don't know if Rackspace provides this kind of service, though.
FWIW, I am more in favour of staying with an IaaS solution (more flexibility). Most of the issues raised by @chrismedrela can be resolved if we automate the deployment pipeline using tools like Ansible.
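An automated pipeline along these lines could start as small as a single playbook. A sketch only; the host group, paths, repository URL, and service name below are made up, not AMY's actual layout:

```yaml
# deploy.yml -- hypothetical Ansible playbook for an AMY VPS
- hosts: amy_production
  become: true
  tasks:
    - name: Pull the latest release
      git:
        repo: https://github.com/swcarpentry/amy.git
        dest: /srv/amy
        version: master
    - name: Install Python dependencies
      pip:
        requirements: /srv/amy/requirements.txt
        virtualenv: /srv/amy/venv
    - name: Run database migrations
      command: /srv/amy/venv/bin/python manage.py migrate
      args:
        chdir: /srv/amy
    - name: Restart the application server
      service:
        name: amy
        state: restarted
```

Even a playbook this small would make "configured from scratch" reproducible on a new VPS.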
Thank you all for considering alternatives; as @pbanaszkiewicz says, getting hosting for free is an important consideration right now.
Whatever system we choose, @chrismedrela brings up an excellent point: it is probably worth documenting the process for installing a new server. At the same time we can likely create something like a Dockerfile,
which we'd need if we move to Heroku or anywhere, really. Most of these services are beginning to use Docker as their underlying infrastructure, so that is a good, stable target for us to move toward.
It seems that to do this most generally we need to:

- Make the move to PostgreSQL
- Document the dependencies of AMY (pretty normal I think)
- Document other things such as SSL key installation and the like that AMY depends on

Once we've done that we have the option to move pretty much anywhere, and quickly, should we need to.
Did I miss anything?
:+1: for Dockerfile, because this is a pretty lightweight solution for running PostgreSQL for development.
With the switch to PostgreSQL we have to rethink backups (is it DB only? Or maybe the whole server, including e.g. generated certificates).
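For the record, a minimal Dockerfile along these lines might look like the following. This is a sketch only: the Python version, paths, and the `amy.wsgi` module name are assumptions, not AMY's actual layout:

```dockerfile
# Hypothetical Dockerfile for a Django app like AMY
FROM python:3.6-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "amy.wsgi"]
```

A local PostgreSQL for development would then be one more container alongside this image rather than a host-wide install.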
Switching to Postgres also means rethinking off-line access. Right now it's easy to scp a sqlite file -- how will we access the Postgres db?
Thank you all for your contributions. I understand your concerns. I played with Heroku and even ran AMY with fake data there to better understand what we can and cannot expect from a PaaS provider. See PR #1167 for what we would need to change if we wanted to move to Heroku.
@pbanaszkiewicz
> Personally I think the best would be to have PaaS at Rackspace since we're already using them and for free. I don't know if Rackspace provides this kind of service, though.
Unfortunately, Rackspace doesn't provide PaaS.
@narayanaditya95
> FWIW, I am more in favour of staying with an IaaS solution (more flexibility). Most of the issues raised by @chrismedrela can be resolved if we make the deployment pipeline automated using tools like Ansible.
Yes, we could stay with IaaS and develop our own scripts for deployment, backups etc., but it would take much more time (especially since we're not DevOps engineers) and would be less secure than using a PaaS. The benefit of PaaS over IaaS is that we outsource most of the DevOps work.
I agree that IaaS gives more flexibility, but AMY is a very typical Django app. We don't even use the filesystem for data storage yet.
@jduckles
> and at the same time we can likely create something like a Dockerfile which we'd need if we move to heroku or anywhere really.
I don't think a Dockerfile is necessary. All dependencies (including gunicorn and the PostgreSQL driver) can be enumerated in requirements.txt. I didn't have to write any Dockerfile to run AMY on Heroku. I believe that other PaaS providers work the same way.
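For reference, on Heroku the role of a Dockerfile is played by a one-line Procfile; a sketch (the `amy.wsgi` module path is an assumption about the project layout):

```
web: gunicorn amy.wsgi --log-file -
```

Heroku detects a Python app from requirements.txt and only needs this process declaration to start the web dyno.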
- Make the move to PostgreSQL
We don't have to actually use PostgreSQL in production, but we need to make sure that we could switch to it easily and quickly. So I proposed in #1168 to run tests on both SQLite3 and PostgreSQL in Travis CI (as well as to fix existing bugs that prevent us from switching). As long as we use a VPS on Rackspace (or any other IaaS), I'd rather not move to PostgreSQL because of the hassle of configuring automatic backups, changes in the deployment procedure, etc.
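A Travis CI matrix for this could look roughly like the following. It is a sketch only: the `DATABASE_URL` variable and database names are made up, and it assumes the settings module reads that variable to pick a backend:

```yaml
# Sketch of a .travis.yml matrix: run the suite once per database backend.
language: python
python:
  - "3.6"
services:
  - postgresql
env:
  - DATABASE_URL=sqlite:///db.sqlite3
  - DATABASE_URL=postgres://postgres@localhost/amy_test
before_script:
  - if [[ "$DATABASE_URL" == postgres* ]]; then psql -c 'CREATE DATABASE amy_test;' -U postgres; fi
script:
  - python manage.py test
```

Each `env` entry becomes a separate build job, so a Postgres-only regression fails CI even while production stays on SQLite3.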
- Document the dependencies of AMY (pretty normal I think)
It's already done in requirements.txt.
- Document other things such as SSL key installation and the like that AMY depends on
I've started a discussion about what exactly we need to document in #1171.
@pbanaszkiewicz
> :+1: for Dockerfile, because this is a pretty lightweight solution for running PostgreSQL for development.
Installing PostgreSQL support in development is as easy as running pip install psycopg2, plus one command line to configure PostgreSQL (sorry, I don't remember now what exactly the command is).
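The settings change is equally small; a sketch of the `DATABASES` entry for local development (the database name and credentials here are placeholders, and the "one command" would likely be something like `createdb amy` to create the database itself):

```python
# Sketch of a Django DATABASES setting for local PostgreSQL development.
# The database name and credentials below are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "amy",
        "USER": "amy",
        "PASSWORD": "amy",
        "HOST": "localhost",
        "PORT": "5432",
    }
}
```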
> With the switch to PostgreSQL we have to rethink backups (is it DB only? Or maybe the whole server, including e.g. generated certificates).
As long as all data lives in the database, it's enough to back up only the DB. And we decided that certificates will be generated on the fly and cached on the filesystem, not stored there.
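With PostgreSQL, "back up only the DB" would mean a dump rather than a file copy; for example, a cron entry along these lines (paths and the database name are placeholders):

```
# Hypothetical crontab entry: dump the whole database every 30 minutes.
# (% must be escaped as \% inside crontab.)
*/30 * * * * pg_dump --format=custom --file=/backups/amy-$(date +\%H\%M).dump amy
```

The custom format is compressed and restorable with `pg_restore`, so the backup cadence we have today could carry over unchanged.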
@maneesha
> Switching to Postgres also means rethinking off-line access. Right now it's easy to scp a sqlite file -- how will we access the Postgres db?
Django has dumpdata and loaddata commands that are backend-agnostic. For example, you can run dumpdata on production with PostgreSQL and run loaddata on your local machine with SQLite3.
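As a command sketch (the `production` host alias and the fixture file name are made up):

```shell
# On production (PostgreSQL): export everything as a backend-agnostic JSON fixture
python manage.py dumpdata --indent 2 > amy.json

# On a local machine (SQLite3): create the schema, then load the fixture
scp production:amy.json .
python manage.py migrate
python manage.py loaddata amy.json
```

Because the fixture is plain JSON, the same round trip works between any two backends Django supports.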
The latest update on this issue: we continue to run on Rackspace, on a VPS, but we need to move to a newer Ubuntu release.
I'm going to open a new issue for the migration to a new VPS server.
Right now, we're using a VPS on Rackspace in production. The VPS was configured from scratch by @pbanaszkiewicz. This has a lot of disadvantages:
I'd like to research alternatives. Important factors are:
@pbanaszkiewicz what is our current approach to making backups? Where do they live? What triggers regular backups (a cron job?)? How do we actually back up (just copy the .sqlite3 file?)?