ecprice / newsdiffs

Automatic scraper that tracks changes in news articles over time.

DatabaseError: table Articles has no column named git_dir #23

Closed: ndarville closed this issue 10 years ago

ndarville commented 10 years ago

I’ve followed the guide (to my knowledge), but I get this error:

$ python website/manage.py scraper
DatabaseError: table Articles has no column named git_dir

Here is what syncdb returns:

$ python website/manage.py syncdb
Syncing...
Creating tables ...
Installing custom SQL ...
Installing indexes ...
Installed 0 object(s) from 0 fixture(s)

Synced:
 > django.contrib.contenttypes
 > django.contrib.sessions
 > django.contrib.sites
 > south

Not synced (use migrations):
 - frontend
$ python website/manage.py migrate frontend
Running migrations for frontend:
- Nothing to migrate.
 - Loading initial data for frontend.
Installed 0 object(s) from 0 fixture(s)

I’ve made sure $PYTHONPATH and $DJANGO_SETTINGS_MODULE are set correctly, but I still get the same error.

Any ideas?

ecprice commented 10 years ago

Oops, forgot to commit the south migration file. Does it work now?
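For reference, the missing migration would look roughly like this (a sketch only; the field type, default, and max_length are guesses based on the error message, and the real generated file may differ):

# Sketch of a South migration adding the missing column; the
# CharField type and max_length are assumptions, not the actual
# field definition from the newsdiffs models.
from south.db import db
from south.v2 import SchemaMigration

class Migration(SchemaMigration):

    def forwards(self, orm):
        # Add 'git_dir' to the Articles table named in the error.
        db.add_column('Articles', 'git_dir',
                      self.gf('django.db.models.fields.CharField')(
                          default='', max_length=255),
                      keep_default=False)

    def backwards(self, orm):
        # Drop the column when migrating backwards.
        db.delete_column('Articles', 'git_dir')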

ndarville commented 10 years ago

I ran the commands, and they migrated the schema:

$ website/manage.py schemamigration frontend --auto
 + Added field git_dir on frontend.Article
Created 0002_auto__add_field_article_git_dir.py. You can now apply this migration with: ./manage.py migrate frontend

$ website/manage.py migrate frontend
Running migrations for frontend:
 - Migrating forwards to 0002_auto__add_field_article_git_dir.
 > frontend:0002_auto__add_field_article_git_dir
 - Loading initial data for frontend.
Installed 0 object(s) from 0 fixture(s)
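As a quick sanity check, you can confirm the column now exists (a sketch assuming the default SQLite setup; the database filename below is a guess, so use whatever your DATABASES setting points at):

import sqlite3

# Database filename is an assumption; substitute your own.
conn = sqlite3.connect('newsdiffs.db')
columns = [row[1] for row in conn.execute('PRAGMA table_info(Articles)')]
print('git_dir' in columns)  # True once the migration has been applied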

When I run the scraper, it keeps loading for several minutes; it might be working, but I had to stop it after ten minutes. I cloned a fresh, updated repo, but it seems to do the same thing. It could be that my connection is just really slow for some reason.

Could you try doing a fresh installation locally, follow the steps, and see what happens on your end?

FWIW, here is the output when running the initial commands on the fresh repo:

$ python website/manage.py syncdb && python website/manage.py migrate
Syncing...
Creating tables ...
Creating table django_content_type
Creating table django_session
Creating table django_site
Creating table south_migrationhistory
Installing custom SQL ...
Installing indexes ...
Installed 0 object(s) from 0 fixture(s)

Synced:
 > django.contrib.contenttypes
 > django.contrib.sessions
 > django.contrib.sites
 > south

Not synced (use migrations):
 - frontend
(use ./manage.py migrate to migrate these)
Running migrations for frontend:
 - Migrating forwards to 0002_auto__add_field_article_git_dir.
 > frontend:0001_initial
 > frontend:0002_auto__add_field_article_git_dir
 - Loading initial data for frontend.
Installed 0 object(s) from 0 fixture(s)

(Of course, I also created a virtualenv to install the packages, outside the steps of the guide.)

ndarville commented 10 years ago

Figured it out: Dropbox was running at the same time and somehow blocked all outbound traffic.

Thanks for the help.

ndarville commented 10 years ago

Hmm, the problem has returned. Gonna see if I can find out what the heck is causing it.

ndarville commented 10 years ago

Tested bandwidth speed with wget -O /dev/null http://speedtest.wdc01.softlayer.com/downloads/test10.zip.

I got 300–3,500 kB/s, so that can’t be it.

When another terminal tab is in focus, it looks like the scraper is running:

[Screenshot: scraper running in a terminal tab (screen shot 2014-06-12 at 14 38 13)]

Oddly enough, uploading the image above timed out the first time. Looks like something is restricting my bandwidth in some select cases. But I can’t really tell without any feedback in the terminal.

Maybe the scraper could print a statement when it starts scraping, so people get feedback that it’s working. If I get it to work, I can also make a GIF of the process.
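Something like this, for example (a hypothetical sketch; the function and variable names are invented for illustration, not the actual scraper code):

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('scraper')

def scrape_all(urls):
    # Announce the run up front so users know the scraper is alive.
    logger.info('Starting scrape of %d articles...', len(urls))
    for i, url in enumerate(urls, 1):
        logger.info('[%d/%d] %s', i, len(urls), url)
        # ... fetch and diff the article here ...

scrape_all(['http://example.com/a', 'http://example.com/b'])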

ecprice commented 10 years ago

The scraper generates log files in /tmp/newsdiffs*. What does it say there?

ndarville commented 10 years ago

Hmm, can’t find the folder. Where exactly is it supposed to be?

Never mind, I'm an idiot; uploading the logs now.

ndarville commented 10 years ago

Here is the log after running scraper for five minutes: http://dpaste.com/1YHX7D7.

And after one hour: https://gist.github.com/ndarville/932173b871551d3938b7.

After one hour and three minutes, the scraping finally finished: https://gist.github.com/ndarville/ac0edaacea75f215ead7.

So I guess it’s working after all—it just took some time and didn’t let me know how it was doing.

And I think I need a production server with some snazzy bandwidth; checking each hour isn’t going to cut it here. :)

Maybe the local environment could run just one scraper, so people get to see it working in a short amount of time?
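For instance, a --source flag could limit a demo run to a single site (a hypothetical sketch; SOURCES and scrape_source are invented names, not the actual newsdiffs code):

import argparse

SOURCES = {'example-times': 'http://example.com/rss'}  # invented

def scrape_source(name, feed_url):
    print('scraping %s from %s' % (name, feed_url))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--source', help='scrape only this source')
    args = parser.parse_args()
    for name, url in SOURCES.items():
        if args.source and name != args.source:
            continue
        scrape_source(name, url)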

ecprice commented 10 years ago

That looks good; it's working. You can see the downloaded articles in the articles/ directory.

The "WOO" lines display the current progress (190/1398) through the articles.

Eric