Closed: redshiftzero closed this 7 years ago
My goal wasn't to remove the configuration section, but to get the provisioning sorted out so we can get the crawlers working again. Will this PR get the crawlers running and dumping data into the right database, given the way the VPSes are currently being provisioned? If yes, let's get this merged so we can continue crawling. Both of the above assumptions are true in the environments where we are running these things (Vagrant dev VM, Travis, and the VPSes), no?
Although these assumptions may be correct at the present moment, this requires that we don't make any changes that break those assumptions.
I think there are easier ways to get things re-running by EOD (as that is our expressed goal) that rely on fewer assumptions. Namely, I still believe we should hard-code the test database credentials in `TestDatabase` and leave management of `config.ini` production database credentials and `PGPASSFILE` to the user after the initial configuration. Should take 5 minutes to write and run a custom tasklist to set the correct production values in the `config.ini` files of the VPSs. Should not need to be modified again. Making the permanent changes to our playbook ensures that these values are set correctly on first provision for Travis, Vagrant, and if/when we spin up new instances.
What do you think @conorsch?
Applied this feature branch to the prod crawlers. @redshiftzero Can you confirm working sorting and crawling? If so, I vote we merge.
> Namely, I still believe we should hard-code the test database credentials in `TestDatabase` and leave management of `config.ini` production database credentials and `PGPASSFILE` to the user after the initial configuration.
Open a separate issue for that, since this PR includes the test regression of snipping out the db tests to keep the merge-wheels turnin'.
> Should take 5 minutes to write and run a custom tasklist to set the correct production values in the `config.ini` files of the VPSs.
Sounds to me like the `config.ini` file should be converted to a full template, so we can set sane defaults (e.g. for use in Travis), and override via vars in any other environment (i.e. prod).
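For illustration only, here is a rough sketch of the defaults-plus-overrides idea in plain Python rather than an actual Ansible template. The `database` section, the key names, and the default values are all hypothetical and not taken from the repo.

```python
import configparser
import io

# Hypothetical defaults, e.g. what Travis or the Vagrant dev VM might use.
DEFAULTS = {
    "pghost": "localhost",
    "pgport": "5432",
    "pgdatabase": "testdb",
    "pguser": "testuser",
}


def render_config(overrides=None):
    """Return config.ini text built from DEFAULTS with overrides on top."""
    values = dict(DEFAULTS)
    values.update(overrides or {})
    # interpolation=None so a literal '%' in a value is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser["database"] = values
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()
```

With no arguments this yields the test-friendly defaults; a prod environment would pass only the values it needs to change, e.g. `render_config({"pgdatabase": "proddb"})`.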
Sorter is working on the VM and on the VPSes (just finished another sort), and database connections are working now everywhere, but I'm running into Tor issues on crawling on the VPSes. I say :+1: to merge this, since nothing in this PR is changing anything in the crawler to do with Tor (i.e. these issues are in master).
Thanks for confirming functionality, @redshiftzero! Going to merge as-is.
> I'm running into Tor issues on crawling on the VPSes.
If you have a traceback, toss into a separate issue!
yep will construct a coherent description of what is going on and make an issue, thanks @conorsch
Fix a couple of issues with the sorter and crawler:

- […] `\n` but the delimiter the sorter was expecting is `,` so it was no longer parsing any directory URLs.
- `config.ini` was not being populated with the prod db values since Ansible was not setting up `config.ini`. Made a minor change to read from `~/.pgpass` (since Ansible is setting that up on the dev VM and the VPSes).

This also closes #93 and closes #94
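
The two fixes described above could look roughly like the sketch below. This is not the code from this PR: the function names are hypothetical, and the pgpass parser assumes the standard `hostname:port:database:username:password` layout without handling backslash-escaped colons.

```python
import re
from pathlib import Path


def split_directory_urls(raw):
    """Split a URL list on commas or newlines, dropping empty entries."""
    return [url.strip() for url in re.split(r"[,\n]", raw) if url.strip()]


def read_pgpass(path=None):
    """Parse a pgpass file into a list of dicts, one per entry.

    Lines starting with '#' are comments. Simplification: escaped ':'
    characters inside a field are not handled here.
    """
    path = Path(path) if path else Path.home() / ".pgpass"
    fields = ("host", "port", "database", "user", "password")
    entries = []
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split into at most five fields; the last one is the password.
        parts = line.split(":", 4)
        if len(parts) == 5:
            entries.append(dict(zip(fields, parts)))
    return entries
```

Accepting either delimiter keeps the sorter working regardless of which one the crawler side writes, and reading `~/.pgpass` means the prod values only have to live in the one file Ansible already manages.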