Closed: steveharoz closed this issue 4 years ago.
@steveharoz It looks like you're not running the `yarn start` command in the curate_science directory. Try reopening the terminal, running `cd curate_science`, and then running `yarn start`.
Thanks. I'm still missing something. I got to the point where I can run the server and run `yarn start`. But I don't see the app when I check the site:
```
python manage.py runserver_plus
yarn start
```
Here's how it looks when I go to the machine's IP address in a browser:
I also tried [IP]:8000 (connection refused) and [IP]/admin (404 error).
Any pointers?
Connection refused sounds to me like you might be trying to access the Django server from a different machine (or a Docker container).
If that's the case, you might want to have Django listen on all interfaces (0.0.0.0:8000) rather than only on localhost (127.0.0.1:8000),
or set up a proxy and forward the traffic.
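With Django's built-in development server, the bind address is just the argument to `runserver`, e.g.:

```bash
# Listen on all interfaces instead of only 127.0.0.1
python manage.py runserver 0.0.0.0:8000
```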
Thank you! Got it working with `python manage.py runserver [::]:8000`
I've updated the above procedure in case anyone else who's unfamiliar with the django stack needs help getting set up.
One last question: my copy of the site only has 3 articles. Is there a simple way to import the data from the main site? I don't really understand what to do with the DB migration instructions.
sorry i just realized you might still be waiting for help with this. @alexkyllo any chance you have a moment to provide guidance to @steveharoz on how to migrate production DB content into his local copy?
also, i've now linked your more detailed setup instructions within our README file in case it can help others, thanks so much!
I think some of the migrations I wrote do seed the database with some initial data, but perhaps if the functionality of the site has changed much, it may no longer be useful. Try applying the django migrations and see if that's sufficient.
If not, the easiest way to copy all production data into your local database is probably using django-admin dumpdata/loaddata. Run dumpdata on production, pipe the output to a file, and run loaddata on your local, being very careful not to do the converse, which would overwrite the production db with your local data.
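A sketch of the dumpdata/loaddata route (the output file name is just a placeholder):

```bash
# On the production server: serialize the database contents to JSON
python manage.py dumpdata --indent 2 > prod_dump.json

# On your local machine: load that JSON into your local database
python manage.py loaddata prod_dump.json
```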
Another possible way is to log into the production database server, take a backup using pg_dump, and then restore that backup into your local database server. This is similarly dangerous and requires care.
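And a sketch of the pg_dump route (the database name, user, and file name here are assumptions):

```bash
# On the production database server
pg_dump -U postgres curate_science > curate_backup.sql

# On your local machine (after creating an empty database with the same name)
psql -U postgres curate_science < curate_backup.sql
```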
thanks @alexkyllo for your suggestions!
and as a side note, i believe the catastrophic actions you describe would not be possible given @steveharoz doesn't have write access to those resources -- he could only initiate a PR, right?
Well, if what he needs is a copy of all data in the production database, then he will need shell access to the production server to get it, in which case he could do anything. Or, he needs someone with that access to do it for him and send him the data in a zip file or something.
oh i see, thanks for the follow-up! sounds like safest option is to just provide him with a production DB data dump and then he can import it into his local instance. @steveharoz just let us know if this works with you.
Thanks for following up. Yeah, as @alexkyllo said, there were a couple entries in the initial DB, but they were too few and too simple to decently sample the capability of the site.
A DB dump would work. FYI: I'm doing some end-of-year catching up for the next week or two, so take your time.
@steveharoz Would you prefer a Django (Python application level) data dump, or a PostgreSQL (database level) data dump? It would be the same data, just a different format and method for loading it.
I suppose with PostgreSQL, I could import it instead of the "Create a database and set up user:" step (see the post in this thread). Then I'd just need to add my own user afterwards.
just thought of an idea that would potentially solve your problem AND benefit our own testing purposes. that is, migrate the entire current production DB into the staging DB so that we have more realistic data to test new changes/features with on our staging website.
of course making sure you don't do the converse (as you mentioned), though i assume we do have a backup of the production DB or some kind of mechanism through GCP to revert back to a previous state?
thoughts @alexkyllo ?
We actually don't have any automated backup process in production yet, but we probably should. It's a little more involved since we're self-hosting the database on a VM (vs. using GCP's managed PostgreSQL as a service, which was expensive), but if you'd like, I can look into setting up a script to take a daily backup of the entire production DB and drop it in a file storage bucket. Then Steve or any other devs could just retrieve the latest backup from that bucket when they want to set up their dev environments. I believe I could also set up a retention policy to auto-delete old backups (at, say, 30 days).
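A minimal sketch of what such a daily backup script could look like (the database name, user, and cron schedule are assumptions; the bucket name matches the one linked later in this thread):

```bash
#!/bin/bash
# pg_backup.sh -- dump the production DB and upload it to a GCS bucket
# install with crontab -e, e.g.:  0 3 * * * /home/deploy/pg_backup.sh
set -euo pipefail

STAMP=$(date +%Y%m%d)
FILE="/tmp/curate_backup_${STAMP}.sql.gz"

# Dump and compress the database (db name and user are assumptions)
pg_dump -U postgres curate_science | gzip > "$FILE"

# Upload to the backup bucket
gsutil cp "$FILE" gs://curate_backups/

rm "$FILE"
```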
> I can look into setting up a script to take a daily backup of the entire production DB and drop it in a file storage bucket. Then Steve or any other devs could just retrieve the latest backup from that bucket when they want to set up their dev environments. I believe I could also set up a retention policy to auto-delete old backups (at, say, 30 days).
yes, this seems pretty important, and sounds like a great plan. thanks so much for offering to do this (again i can pay you if you keep track of your hours). thanks!
Ok, cool! Starting this Saturday I have vacation from my day job for the rest of the year, so I will start working on this and track my time.
btw, i assume there will be a way to import, from the production DB backup into the staging DB, only the non-user-related content (i.e., actual curated content, rather than user accounts, etc.)?
A partial data import from production into staging is possible but will be a bit more complicated. Because other tables have foreign key relations with the user table, and the same user can have different primary keys in prod and staging, I can't just copy the other tables over directly. It probably needs to go through the python application layer to ensure referential consistency. I don't know the easiest way to do this off the top of my head, but can research it. Is this something you want done just once, or on a regular basis?
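One possible route (just a sketch, untested here) is Django's natural-key serialization, which replaces integer primary keys with lookups such as usernames, so the content can be loaded into staging as long as the referenced user accounts already exist there under the same usernames:

```bash
# On production: dump everything except the user/auth tables, using natural keys
# instead of integer primary keys so foreign keys to users survive the move
python manage.py dumpdata \
    --natural-foreign --natural-primary \
    --exclude auth --exclude contenttypes --exclude sessions \
    --indent 2 > content_dump.json

# On staging: load it through the Django application layer
python manage.py loaddata content_dump.json
```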
Also, the production server needs a restart to install some updates--should be less than 5 minutes downtime. Is there any time of the day/week you definitely don't want me to restart it?
I took a backup and placed it in the storage location: https://storage.cloud.google.com/curate_backups/
Let me know who needs to be granted access to this folder but doesn't already have access to the GCP project.
Meantime I'll work on a bash script to do this daily.
great, thanks!
> A partial data import from production into staging is possible but will be a bit more complicated. Because other tables have foreign key relations with the user table, and the same user can have different primary keys in prod and staging, I can't just copy the other tables over directly. It probably needs to go through the python application layer to ensure referential consistency. I don't know the easiest way to do this off the top of my head, but can research it. Is this something you want done just once, or on a regular basis?
>
> Also, the production server needs a restart to install some updates--should be less than 5 minutes downtime. Is there any time of the day/week you definitely don't want me to restart it?
Oh OK. Actually, I guess it might be fine importing everything (from production DB) into staging DB given that I haven't really invited others into it. And even then, i could just re-invite them as users, right?
re: restarting production server, we don't have that much traffic, but it appears from Google Analytics that Sunday is the LEAST busy day, so maybe aim for that.
Yes, if I restore the staging database from a backup of the production database, then staging data will be completely overwritten by production data. This means that everyone who had access to production before the restore will now also have access to staging after the restore, with the same credentials--important to consider if you want to prevent users from (purposefully or accidentally) accessing the staging site. Also, any accounts that only existed on staging would be deleted from staging--if they didn't have a production account, you would need to re-invite them to staging.
The daily backup job is essentially done; I just need to check back tomorrow to make sure the cron job actually ran as scheduled. It took me less than an hour total, so I won't bother billing for it.
ok awesome, thanks so much! btw, looks like the build is failing (https://travis-ci.org/ScienceCommons/curate_science/builds/631121500?utm_source=github_status&utm_medium=notification).
and thanks for the clarification re: what will happen to users if we restore the staging DB from a backup of the production DB. that should work because i only invited 1-2 users into the staging DB for temporary access, and can always re-invite them if need be. and production users don't know the staging website URL, and even then they would have no motivation to do stuff there, and even so, it would do no harm.
so i guess please go ahead and do that restore procedure, AGAIN being sure you're going the correct direction! lol (though we now have a backup of the production DB i guess ;-p)
oh and could you also add

```yaml
automatic_scaling:
  min_instances: 1
```

to the staging branch app.yaml file, as was done for production to address the 2-3 second delays in waking up the server (https://github.com/ScienceCommons/curate_science/commit/5234b71d708a0f28e196720456c9794155ab198b). i'm willing to take the risk that this may be associated with a cost increase on GCP (if so, we can re-evaluate).
thanks again!
The build failure looks unrelated to anything I did. The error message looks like an intermittent network connectivity issue. I'll restart the build and see if it succeeds.
I very carefully restored a backup of the prod DB into the staging DB and it appears to have worked--seeing the same articles on the Browse page for both instances now. I also backed up staging so that we can undo this if necessary.
It looks like the `automatic_scaling` config is already present in app.yaml on the staging branch. You're still experiencing noticeable page load delays on staging only?
ok great, thanks, build succeeded.
re: automatic_scaling, ah ok i didn't realize that was already changed for both. ok i'll do more testing and let you know.
great, yes i can NOW see all of the prod DB stuff on the staging website. everything seems to be working, EXCEPT an issue with key figures. somehow, a subset of the key figure images are yielding 404s, but i haven't been able to figure out what differentiates the images that load fine from those that don't (the URLs/paths seem fine), e.g.:
https://storage.googleapis.com/curate-science-staging-2.appspot.com/key_figures/lvcc-in-press-figure-1.png (loads fine)
https://storage.googleapis.com/curate-science-staging-2.appspot.com/key_figures/lvcc2019-table1.png (doesn't load)
Hmm, my guess is that's because the staging site is still pointing to the staging storage bucket for images (which is good because we don't want images uploaded to staging to pollute production), so we just need to do a one-time copy of all the missing image files from the prod bucket over to the staging bucket.
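A sketch of that one-time copy with gsutil (the production bucket name below is a placeholder; the staging bucket is the one from the URLs above):

```bash
# -n (no-clobber) skips files that already exist in the staging bucket
gsutil -m cp -n \
    "gs://<production-bucket>/key_figures/*" \
    gs://curate-science-staging-2.appspot.com/key_figures/
```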
> so we just need to do a one-time copy of all the missing image files from the prod bucket over to the staging bucket.
OK that makes sense (I guess the images that ARE working were already in the staging bucket...).
So please go ahead and do that one-time copy of the image files from the prod bucket to the staging bucket, thanks so much!! (again being careful of going the correct direction!)
Done!
great, everything looks to be working, thanks a million @alexkyllo !
@steveharoz hopefully everything is working at your end now, and please let me know if you're now able to play around w/ article card esthetic/styling tweaks!
Hi @xgui3783, I'm still not sure who you are (but thx for chiming in in this thread), but just wanted to let you know about our new bug bounty program, in case you're interested & have time to make code contributions! See https://github.com/ScienceCommons/curate_science#contributing
@eplebel thanks for the heads up.
EDIT: I've updated these commands based on the advice in this thread. It works with a new Digital Ocean instance of Ubuntu.
Update and upgrade the packages:
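(assuming a root shell on the fresh Ubuntu instance)

```bash
apt-get update && apt-get upgrade -y
```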
Install nginx:
apt-get install -y nginx
Install supervisor:
apt-get install -y supervisor
Install python virtualenv:
apt-get install -y python-virtualenv
Install postgresql:
apt-get install -y postgresql postgresql-contrib
Get the project:
git clone https://github.com/steveharoz/curate_science.git
Make a curate_science/.env file:
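(the variable names and values below are only hypothetical placeholders -- check the project's settings for what it actually reads)

```bash
# Hypothetical placeholder values -- the real variable names depend on the project's settings
SECRET_KEY=replace-with-a-long-random-string
DEBUG=True
DATABASE_URL=postgres://curate:curate@localhost:5432/curate
```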
Set up the virtual environment:
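(assuming the Python dependencies are listed in a requirements.txt at the repo root; the venv directory name is arbitrary)

```bash
cd curate_science
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
```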
Switch to the postgres user:
sudo su - postgres
Type this to go to the postgres interactive shell:
psql
Create a database and set up user:
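(the database name, user, and password below are placeholders and should match whatever the .env file points at)

```sql
CREATE DATABASE curate;
CREATE USER curate WITH PASSWORD 'curate';
GRANT ALL PRIVILEGES ON DATABASE curate TO curate;
```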
Quit from the shell and switch back to the root user:
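(`\q` exits psql; `exit` returns from the postgres user's shell to root)

```
\q
exit
```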
Set up server:
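(at minimum, applying the Django migrations and creating an admin account; the nginx/supervisor configuration isn't shown here)

```bash
python manage.py migrate
python manage.py createsuperuser
```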
Install the tools to generate HTML/JS:
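(one way to get Node and Yarn on Ubuntu; the exact install method may differ)

```bash
apt-get install -y nodejs npm
npm install -g yarn
```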
(reopen terminal)
Start the server:
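(as worked out above: have Django listen on all interfaces, and run the two processes in separate terminals; run `yarn` once first if the JS dependencies aren't installed yet)

```bash
python manage.py runserver [::]:8000
yarn start
```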
Go to [your machine's IP]:8000 in a browser.