codeforamerica / cfapi

The Code for America API. Tracks and motivates activity and participation across the civic technology movement.
http://codeforamerica.org/api
MIT License
113 stars 52 forks source link

Sweep and Keep pattern broken #139

Closed ondrae closed 9 years ago

ondrae commented 10 years ago

Description

Every hour we update everything in the database. To find anything that should be deleted, we use a Sweep and Keep pattern. It supposed to go like this:

  1. Mark everything for deletion keep = false
  2. As we do our update, when we find things to keep mark them as keep = true
  3. If something is still marked as keep = false at the very end, delete it.

Thats not happening though.

For example, Code for America is some how getting marked as not to keep. It shouldn't be. Because it is though, it should be getting deleted, but its not.

codeforamerica-api::OLIVE=> SELECT name, keep FROM organization ORDER BY name;
              name               | keep
---------------------------------+------
 a2civictech                     | f
 Beta NYC                        | t
 Civic Data Alliance             | t
 Code4PuertoRico                 | t
 Code for Aizu                   | f
 Code for America                | f
...

Same with projects #132

Events too

codeforamerica-api::OLIVE=> SELECT COUNT(*) FROM event WHERE keep = false;
 count
-------
   444

Stories:

SELECT COUNT(*) FROM story WHERE keep = false;
 count
-------
   194

This gets a little complicated by the cascading deletions that should happen. Like when an organization gets delete, we should also delete all of its projects, stories, events, issues, and labels. That should happen automatically through all of the class definitions though. Like:

# Relationships
    events = db.relationship('Event', cascade='save-update, delete')
    stories = db.relationship('Story', cascade='save-update, delete')
    projects = db.relationship('Project', cascade='save-update, delete')

Its obviously not working.

Tasks

tmaybe commented 9 years ago

For reference, here's what was happening with the incorrect values of keep – we're using two different methods of setting the keep value, one of which takes precedence even if it happens earlier. Here's a demonstration:

(Pdb) test_org = db.session.query(Organization).filter(Organization.name == u'C\xf6de for Ameri\xe7a').first()
(Pdb) test_org.keep = True
(Pdb) test_org.keep
True
(Pdb) db.session.execute(db.update(Organization, values={'keep': False}).where(Organization.name == test_org.name))
<sqlalchemy.engine.result.ResultProxy object at 0x10acff7d0>
(Pdb) test_org.keep = True
(Pdb) test_org.keep
True
(Pdb) db.session.commit()
(Pdb) test_org.keep
False
(Pdb) 
migurski commented 9 years ago

:robot: DESTROY SQLALCHEMY :robot:

ondrae commented 9 years ago

Got a weird error on production from the Heroku logs.

Traceback (most recent call last):
  File "run_update.py", line 902, in <module>
    main(org_name=org_name, org_sources=ORG_SOURCES)
  File "run_update.py", line 871, in main
    db.session.query(Event).filter(Event.keep == False).delete()
...
sqlalchemy.exc.DBAPIError: (TransactionRollbackError) deadlock detected
DETAIL:  Process 28314 waits for ShareLock on transaction 950340; blocked by process 28427.
Process 28427 waits for ShareLock on transaction 950339; blocked by process 28314.
HINT:  See server log for query details.
 'DELETE FROM event WHERE event.keep = false' {}