Letractively / fuuka

Automatically exported from code.google.com/p/fuuka

Make post archiving smarter #36

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Posts should only be archived if something about them changed. Currently,
the archive is inserting any post it finds into the database, regardless of
whether it's new or not. Archiving a new post and updating an already archived
post should have different semantics.

Keeping it this way is bad because:
- PostgreSQL doesn't have REPLACE. We're currently working around this
using a very bad RULE that basically turns all INSERTs into UPDATEs.
- We don't want to mindlessly feed posts to the database, because it causes
things like issue6.
- Reindexing a document in Lucene and other inverted index software is
somewhat expensive (you have to delete the document and reinsert it). I
tried doing it that way and it makes the archiver process use up 100% CPU. 

Currently, a post can only change if:
- The image is deleted (we'll want to ignore this change anyway)
- The post is deleted (already handled as a special case elsewhere, it's
obviously not related to this)
- The post gets a public ban/warning attached to it
- The post is edited by a moderator (I'm not sure if they can do this. Even
if they can, we can disregard it, since it never happens)

So basically, besides making the archiver keep track of the latest post
that was archived for each thread (some sort of shared variable hash among
threads, I guess) and then only issue inserts for posts above that number,
you'd only need to issue updates for posts with [banned] tags. I'm not
quite sure how that part of the archive works, so post if you have a
better idea.
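
A rough sketch of that idea, in Python rather than the archiver's Perl, purely to illustrate the bookkeeping. Everything here is hypothetical: the `Post` fields, the `ArchivePlanner` class and the "insert"/"update"/"skip" labels are not taken from the fuuka code. It assumes post numbers only grow within a thread, and it uses a lock on the guess that several worker threads share the table.

```python
import threading
from dataclasses import dataclass


@dataclass
class Post:
    num: int              # post number, assumed to grow within a thread
    thread: int           # number of the thread the post belongs to
    banned: bool = False  # True if a public ban/warning is attached


class ArchivePlanner:
    """Decides whether a scraped post needs an insert, an update, or nothing."""

    def __init__(self):
        self._lock = threading.Lock()  # assumption: shared by worker threads
        self._latest = {}              # thread number -> highest archived post
        self._flagged = set()          # posts whose ban flag is already stored

    def plan(self, post):
        with self._lock:
            last = self._latest.get(post.thread, 0)
            if post.num > last:
                # Never seen before: archive it and advance the high-water mark.
                self._latest[post.thread] = post.num
                if post.banned:
                    self._flagged.add(post.num)
                return "insert"
            if post.banned and post.num not in self._flagged:
                # Already archived, but the ban/warning is new.
                self._flagged.add(post.num)
                return "update"
            # Already archived and nothing we care about changed.
            return "skip"
```

With this, re-scraping a thread only produces "insert" for posts above the stored number and "update" the first time a ban shows up on an old post. Image deletions and moderator edits fall through to "skip", matching the list above.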

Fixing this will also fix issue6. It might also fix issue19, but I think
that's a different bug.

I'm CC'ing you so you'll get spammed with an email about this. I'm not
comfortable enough with Perl or the archive's architecture to make a change
like this.

Original issue reported on code.google.com by eksopl on 11 Oct 2009 at 11:34

GoogleCodeExporter commented 8 years ago
The reasons here aren't really valid anymore. ON DUPLICATE KEY UPDATE works 
fine. I could have sworn I had closed this before.
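
For illustration only: ON DUPLICATE KEY UPDATE is MySQL syntax, and the sketch below is not the archiver's actual query. It uses Python's sqlite3 module as a stand-in, where the comparable clause is ON CONFLICT ... DO UPDATE (this needs SQLite 3.24 or newer). The `post` table, its columns and the `archive_post` helper are made-up names.

```python
import sqlite3


def open_archive():
    """Create an in-memory database with a hypothetical post table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post (num INTEGER PRIMARY KEY, comment TEXT, banned INTEGER)"
    )
    return conn


def archive_post(conn, num, comment, banned):
    """Insert a new post, or update the stored row only if it differs."""
    conn.execute(
        """
        INSERT INTO post (num, comment, banned) VALUES (?, ?, ?)
        ON CONFLICT(num) DO UPDATE SET
            comment = excluded.comment,
            banned = excluded.banned
        WHERE post.comment IS NOT excluded.comment
           OR post.banned IS NOT excluded.banned
        """,
        (num, comment, int(banned)),
    )
    conn.commit()
```

A single statement covers both the new-post and the changed-post case, so no REPLACE or rewrite RULE is needed. The WHERE clause is an optional extra that leaves identical rows alone.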

Original comment by eksopl on 17 Jan 2012 at 12:34