Closed: steveharoz closed this issue 4 years ago.
@steveharoz It looks like you're not running the `yarn start` command in the curate_science directory. Try reopening the terminal, running `cd curate_science`, and then running `yarn start`.
Thanks. I'm still missing something. I got to the point where I can run the server and run `yarn start`. But I don't see the app when I check the site:
```
python manage.py runserver_plus
yarn start
```
Here's how it looks when I go to the machine's IP address in a browser:
I also tried [IP]:8000 (connection refused) and [IP]/admin (404 error).
Any pointers?
Connection refused sounds to me like you might be trying to access the Django server from a different machine (or a Docker container).
If that's the case, you might want to have Django listen on all interfaces (0.0.0.0:8000) rather than only on localhost (127.0.0.1:8000),
or set up a proxy and forward the traffic.
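With Django's built-in development server, the bind address is just the argument to `runserver`, e.g.:

```bash
# Listen on all interfaces instead of only 127.0.0.1
python manage.py runserver 0.0.0.0:8000
```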
Thank you! Got it working with `python manage.py runserver [::]:8000`
I've updated the above procedure in case anyone else who's unfamiliar with the django stack needs help getting set up.
One last question: my copy of the site only has 3 articles. Is there a simple way to import the data from the main site? I don't really understand what to do with the DB migration instructions.
sorry i just realized you might still be waiting for help with this. @alexkyllo any chance you have a moment to provide guidance to @steveharoz on how to migrate production DB content into his local copy?
also, i've now linked your more detailed setup instructions within our README file in case it can help others, thanks so much!
I think some of the migrations I wrote do seed the database with some initial data, but perhaps if the functionality of the site has changed much, it may no longer be useful. Try applying the django migrations and see if that's sufficient.
If not, the easiest way to copy all production data into your local database is probably using django-admin dumpdata/loaddata. Run dumpdata on production, pipe the output to a file, and run loaddata on your local, being very careful not to do the converse, which would overwrite the production db with your local data.
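A sketch of the dumpdata/loaddata route (the output file name is just a placeholder):

```bash
# On the production server: serialize the database contents to JSON
python manage.py dumpdata --indent 2 > prod_dump.json

# On your local machine: load that JSON into your local database
python manage.py loaddata prod_dump.json
```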
Another possible way is to log into the production database server, take a backup using pg_dump, and then restore that backup into your local database server. This is similarly dangerous and requires care.
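And a sketch of the pg_dump route (the database name, user, and file name here are assumptions):

```bash
# On the production database server
pg_dump -U postgres curate_science > curate_backup.sql

# On your local machine (after creating an empty database with the same name)
psql -U postgres curate_science < curate_backup.sql
```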
thanks @alexkyllo for your suggestions!
and as a side note, i believe the catastrophic actions you describe would not be possible given @steveharoz doesn't have write access to those resources -- he could only initiate a PR, right?
Well, if what he needs is a copy of all data in the production database, then he will need shell access to the production server to get it, in which case he could do anything. Or, he needs someone with that access to do it for him and send him the data in a zip file or something.
oh i see, thanks for the follow-up! sounds like safest option is to just provide him with a production DB data dump and then he can import it into his local instance. @steveharoz just let us know if this works with you.
Thanks for following up. Yeah, as @alexkyllo said, there were a couple entries in the initial DB, but they were too few and too simple to decently sample the capability of the site.
A DB dump would work. FYI: I'm doing some end-of-year catching up for the next week or two, so take your time.
@steveharoz Would you prefer a Django (Python application level) data dump, or a PostgreSQL (database level) data dump? It would be the same data, just a different format and method for loading it.
I suppose with PostgreSQL, I could import it instead of the "Create a database and set up user:" step (see the post in this thread). Then I'd just need to add my own user afterwards.
just thought of an idea that would potentially solve your problem AND benefit our own testing purposes. that is, migrate the entire current production DB into the staging DB so that we have more realistic data to test new changes/features with on our staging website.
of course making sure you don't do the converse (as you mentioned), though i assume we do have a backup of the production DB or some kind of mechanism through GCP to revert back to a previous state?
thoughts @alexkyllo ?
We actually don't have any automated backup process in production yet, but we probably should. It's a little more involved since we're self-hosting the database on a VM (vs. using GCP's managed PostgreSQL as a service, which was expensive), but if you'd like, I can look into setting up a script to take a daily backup of the entire production DB and drop it in a file storage bucket. Then Steve or any other devs could just retrieve the latest backup from that bucket when they want to set up their dev environments. I believe I could also set up a retention policy to auto-delete old backups (at, say, 30 days).
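A minimal sketch of what such a daily backup script could look like (the database name, user, and cron schedule are assumptions; the bucket name matches the one linked later in this thread):

```bash
#!/bin/bash
# pg_backup.sh -- dump the production DB and upload it to a GCS bucket
# install with crontab -e, e.g.:  0 3 * * * /home/deploy/pg_backup.sh
set -euo pipefail

STAMP=$(date +%Y%m%d)
FILE="/tmp/curate_backup_${STAMP}.sql.gz"

# Dump and compress the database (db name and user are assumptions)
pg_dump -U postgres curate_science | gzip > "$FILE"

# Upload to the backup bucket
gsutil cp "$FILE" gs://curate_backups/

rm "$FILE"
```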
> I can look into setting up a script to take a daily backup of the entire production DB and drop it in a file storage bucket. Then Steve or any other devs could just retrieve the latest backup from that bucket when they want to set up their dev environments. I believe I could also set up a retention policy to auto-delete old backups (at, say, 30 days).
yes, this seems pretty important, and sounds like a great plan. thanks so much for offering to do this (again i can pay you if you keep track of your hours). thanks!
Ok, cool! Starting this Saturday I have vacation from my day job for the rest of the year, so I will start working on this and track my time.
btw, i assume there will be a way to import, from the production DB backup into the staging DB, only the non-user-related content (i.e., actual curated content, rather than user accounts, etc.)?
A partial data import from production into staging is possible but will be a bit more complicated. Because other tables have foreign key relations with the user table, and the same user can have different primary keys in prod and staging, I can't just copy the other tables over directly. It probably needs to go through the python application layer to ensure referential consistency. I don't know the easiest way to do this off the top of my head, but can research it. Is this something you want done just once, or on a regular basis?
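One possible route (just a sketch, untested here) is Django's natural-key serialization, which replaces integer primary keys with lookups such as usernames, so the content can be loaded into staging as long as the referenced user accounts already exist there under the same usernames:

```bash
# On production: dump everything except the user/auth tables, using natural keys
# instead of integer primary keys so foreign keys to users survive the move
python manage.py dumpdata \
    --natural-foreign --natural-primary \
    --exclude auth --exclude contenttypes --exclude sessions \
    --indent 2 > content_dump.json

# On staging: load it through the Django application layer
python manage.py loaddata content_dump.json
```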
Also, the production server needs a restart to install some updates--should be less than 5 minutes downtime. Is there any time of the day/week you definitely don't want me to restart it?
I took a backup and placed it in the storage location: https://storage.cloud.google.com/curate_backups/
Let me know who needs to be granted access to this folder but doesn't already have access to the GCP project.
Meantime I'll work on a bash script to do this daily.
great, thanks!
> A partial data import from production into staging is possible but will be a bit more complicated. Because other tables have foreign key relations with the user table, and the same user can have different primary keys in prod and staging, I can't just copy the other tables over directly. It probably needs to go through the python application layer to ensure referential consistency. I don't know the easiest way to do this off the top of my head, but can research it. Is this something you want done just once, or on a regular basis?
>
> Also, the production server needs a restart to install some updates--should be less than 5 minutes downtime. Is there any time of the day/week you definitely don't want me to restart it?
Oh OK. Actually, I guess it might be fine importing everything (from production DB) into staging DB given that I haven't really invited others into it. And even then, i could just re-invite them as users, right?
re: restarting production server, we don't have that much traffic, but it appears from Google Analytics that Sunday is the LEAST busy day, so maybe aim for that.
Yes, if I restore the staging database from a backup of the production database, then staging data will be completely overwritten by production data. This means that everyone who had access to production before the restore will now also have access to staging after the restore, with the same credentials--important to consider if you want to prevent users from (purposefully or accidentally) accessing the staging site. Also, any accounts that only existed on staging would be deleted from staging--if they didn't have a production account, you would need to re-invite them to staging.
The daily backup job is essentially done; I just need to check back tomorrow to make sure the cron job actually ran as scheduled. It took me less than an hour total, so I won't bother billing for it.
ok awesome, thanks so much! btw, looks like the build is failing (https://travis-ci.org/ScienceCommons/curate_science/builds/631121500?utm_source=github_status&utm_medium=notification).
and thanks for the clarification re: what will happen to users if we restore the staging DB from a backup of the production DB. that should work because i only invited 1-2 users into the staging DB for temporary access, and can always re-invite them if need be. and production users don't know the staging website URL, and even then they would have no motivation to do stuff there, and even so, it would do no harm.
so i guess please go ahead and do that restore procedure, AGAIN being sure you're going the correct direction! lol (though we now have a backup of the production DB i guess ;-p)
oh and could you also add

```yaml
automatic_scaling:
  min_instances: 1
```

to the staging branch app.yaml file, as was done for production to address the 2-3 second delays in waking up the server (https://github.com/ScienceCommons/curate_science/commit/5234b71d708a0f28e196720456c9794155ab198b). i'm willing to take the risk that this may be associated with a cost increase on GCP (if so, we can re-evaluate).
thanks again!
The build failure looks unrelated to anything I did. The error message looks like an intermittent network connectivity issue. I'll restart the build and see if it succeeds.
I very carefully restored a backup of the prod DB into the staging DB and it appears to have worked--seeing the same articles on the Browse page for both instances now. I also backed up staging so that we can undo this if necessary.
It looks like the `automatic_scaling` config is already present in app.yaml on the staging branch. You're still experiencing noticeable page load delays on staging only?
ok great, thanks, build succeeded.
re: automatic_scaling, ah ok i didn't realize that was already changed for both. ok i'll do more testing and let you know.
great, yes i can NOW see all of the prod DB stuff on the staging website. everything seems to be working, EXCEPT an issue with key figures. somehow, a subset of the key figure images are yielding 404s, but i haven't been able to figure out what differentiates the images that load fine from those that don't (the URLs/paths seem fine), e.g.:
https://storage.googleapis.com/curate-science-staging-2.appspot.com/key_figures/lvcc-in-press-figure-1.png (loads fine)
https://storage.googleapis.com/curate-science-staging-2.appspot.com/key_figures/lvcc2019-table1.png (doesn't load)
Hmm, my guess is that's because the staging site is still pointing to the staging storage bucket for images (which is good because we don't want images uploaded to staging to pollute production), so we just need to do a one-time copy of all the missing image files from the prod bucket over to the staging bucket.
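A sketch of that one-time copy with gsutil (the production bucket name below is a placeholder; the staging bucket is the one from the URLs above):

```bash
# -n (no-clobber) skips files that already exist in the staging bucket
gsutil -m cp -n \
    "gs://<production-bucket>/key_figures/*" \
    gs://curate-science-staging-2.appspot.com/key_figures/
```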
> so we just need to do a one-time copy of all the missing image files from the prod bucket over to the staging bucket.
OK that makes sense (I guess the images that ARE working were already in the staging bucket...).
So please go ahead and do that one-time copy of the image files from the prod bucket to the staging bucket, thanks so much!! (again being careful of going the correct direction!)
Done!
great, everything looks to be working, thanks a million @alexkyllo !
@steveharoz hopefully everything is working at your end now, and please let me know if you're now able to play around w/ article card esthetic/styling tweaks!
Hi @xgui3783, I'm still not sure who you are (but thx for chiming in in this thread), but just wanted to let you know about our new bug bounty program, in case you're interested & have time to make code contributions! See https://github.com/ScienceCommons/curate_science#contributing
@eplebel thanks for the heads up.
EDIT: I've updated these commands based on the advice in this thread. It works with a new Digital Ocean instance of Ubuntu.
Update and upgrade the packages:
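(assuming a root shell on the fresh Ubuntu instance)

```bash
apt-get update && apt-get upgrade -y
```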
Install nginx:
apt-get install -y nginx
Install supervisor:
apt-get install -y supervisor
Install python virtualenv:
apt-get install -y python-virtualenv
Install postgresql:
apt-get install -y postgresql postgresql-contrib
Get the project:
git clone https://github.com/steveharoz/curate_science.git
Make a curate_science/.env file:
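(the variable names and values below are only hypothetical placeholders -- check the project's settings for what it actually reads)

```bash
# Hypothetical placeholder values -- the real variable names depend on the project's settings
SECRET_KEY=replace-with-a-long-random-string
DEBUG=True
DATABASE_URL=postgres://curate:curate@localhost:5432/curate
```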
Set up the virtual environment:
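(assuming the Python dependencies are listed in a requirements.txt at the repo root; the venv directory name is arbitrary)

```bash
cd curate_science
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
```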
Switch to the postgres user:
sudo su - postgres
Type this to go to the postgres interactive shell:
psql
Create a database and set up user:
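(the database name, user, and password below are placeholders and should match whatever the .env file points at)

```sql
CREATE DATABASE curate;
CREATE USER curate WITH PASSWORD 'curate';
GRANT ALL PRIVILEGES ON DATABASE curate TO curate;
```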
Quit from the shell and switch back to the root user:
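(`\q` exits psql; `exit` returns from the postgres user's shell to root)

```
\q
exit
```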
Set up server:
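(at minimum, applying the Django migrations and creating an admin account; the nginx/supervisor configuration isn't shown here)

```bash
python manage.py migrate
python manage.py createsuperuser
```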
Install the tools to generate HTML/JS:
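(one way to get Node and Yarn on Ubuntu; the exact install method may differ)

```bash
apt-get install -y nodejs npm
npm install -g yarn
```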
(reopen terminal)
Start the server:
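(as worked out above: have Django listen on all interfaces, and run the two processes in separate terminals; run `yarn` once first if the JS dependencies aren't installed yet)

```bash
python manage.py runserver [::]:8000
yarn start
```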
Go to [your machine's IP]:8000 in a browser.