alexames / DeltaBot

GNU General Public License v3.0

Comment history is skipping comments #47

Closed PixelOrange closed 1 month ago

PixelOrange commented 10 years ago

DeltaBot is consistently missing deltas. I believe this to be a result of the comment history feature. As a temporary workaround, I have added a feature that automatically clears the history and performs a full scan. Once a better history has been devised, we will need to remove the auto-clear. It can be found in the go(self) function.

chrisuehlinger commented 10 years ago

First off, I'd rename the before attribute to something more informative like comment_history or scanned_comments and rename vars like before_id similarly. With the current name it took me a while to figure out exactly what it was.

Next, we need to figure out why this is happening. Here are some questions:

PixelOrange commented 10 years ago

Go ahead and name it whatever you want. You can blame alexames for the poor variable names there :D haha.

  1. Some missed deltas don't register because of encoding, but I haven't seen that specific error in a while. Deltas that are edited in after posting are never going to work correctly, so that's not an issue.
  2. It's always happened, but it's been happening more often as we get more and more posters. The way that reddit stores comments is weird. The variable that stores them seems to shift somehow. I can't explain it, but the easiest way to replicate it is to let the bot run so that it writes an ID to prev_id.txt and then disable the bot until more than 1,000 comments have been posted. When you turn it back on, it will get confused because it won't be able to tell whether the last ID is more recent than the new comments.

chrisuehlinger commented 10 years ago

Hmm, I'm trying to wrap my head around this. But first, how many comments does CMV get per day? And how long does it usually take to get to 1,000 comments?

PixelOrange commented 10 years ago

I'm not entirely sure how many comments we get per day. The traffic page doesn't appear to track that. We do get 60,000 pageviews and 20,000 uniques per day. If you go by the 1/10/90 rule, that means we're seeing maybe 2,000 comments in 24 hours? It's probably not that high though.

chrisuehlinger commented 10 years ago

Let's go with 2,000 to be cautious. That means that the only way the comment queue should go over 1,000 would be if DeltaBot is down for 12+ hours. Does DeltaBot ever have long periods of downtime?
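
The back-of-the-envelope numbers here can be written out as a quick sketch. It assumes 20,000 daily uniques, that roughly 10% of them post one comment each (a rough reading of the 1/10/90 rule), and a 1,000-comment backlog as the point where the stored ID stops being reliable. The function names are hypothetical and not part of DeltaBot.

```python
def estimated_comments_per_day(daily_uniques, commenter_percent):
    """Rough guess: each commenting visitor posts one comment per day."""
    return daily_uniques * commenter_percent / 100


def hours_until_backlog(backlog_limit, comments_per_day):
    """Hours of downtime before the backlog reaches backlog_limit comments."""
    return backlog_limit * 24 / comments_per_day


# Assumed inputs taken from the estimates in this thread.
per_day = estimated_comments_per_day(20_000, 10)
hours = hours_until_backlog(1_000, per_day)

print(f"~{per_day:.0f} comments/day, backlog of 1,000 after ~{hours:.0f} hours")
```

If the real comment rate is lower than this guess, the bot could stay down proportionally longer before the backlog passes 1,000.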

For that matter, where is DeltaBot running and how often does someone check on it?

PixelOrange commented 10 years ago

DeltaBot runs on my PC. I check on it any time I'm on my PC or whenever I get modmail that something is freaking out.

It's been down for large gaps recently but only because there were a few bugs. I ironed them out and it's been stable since then. Argument mismatch stuff.

Other than this week, it's been super stable (I don't think there has been a crash or any downtime in the 2 months prior) but it was still missing deltas. We would then send an "add" command and it would pick it up just fine.

chrisuehlinger commented 10 years ago

Ah, the argument mismatches were due to things I did. I still need to write tests for about half of the code. From now on I'll keep my changes in a different branch until they've been tested.

One last thing, do you ever find that you need to use the "force add" command? Any time someone needs to do that, they should submit a detailed issue, since that means something's wrong with the scan_comments code. The better the details, the better our tests will be and the more bugs we'll catch.

PixelOrange commented 10 years ago

force add is usually only needed when someone fails at posting deltas. I don't think we've needed to use it in a while for anything other than that.

chrisuehlinger commented 10 years ago

Alright cool.

Let's keep this issue open until we figure out what the real problem is.

PixelOrange commented 10 years ago

Update for #58 - Fairly certain this is still an issue, but I haven't been able to find a better way to handle it.

The issue, as far as I can tell, is that reddit identifies its comments with a short string of numbers and letters. Inside the same submission, 2b2b2b would be stored later than 1a1a1a, but across two separate submissions that's not necessarily the case. Submission 3c3c3c could have a comment 2b2b2b and submission 4d4d4d could have a comment 1a1a1a. The second submission is later, but its comment is "earlier", so the bot skips the comment thinking it's old.
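
A minimal sketch of the failure described above, plus one possible alternative. It assumes the bot decides what is "new" by comparing each comment ID against a single stored ID, and that IDs can be compared as base-36 numbers. The IDs, the listing, and the helper names (to_int, scan_with_watermark, scan_with_seen_set) are hypothetical and not taken from DeltaBot's code. The second function swaps in a different technique: it remembers every scanned ID in a set instead of relying on ordering.

```python
def to_int(comment_id):
    """Interpret a reddit-style ID as a base-36 number (assumption)."""
    return int(comment_id, 36)


def scan_with_watermark(comments, last_id):
    """Keep only comments whose ID sorts after the single stored ID."""
    return [c for c in comments if to_int(c) > to_int(last_id)]


def scan_with_seen_set(comments, seen):
    """Keep comments that were never scanned before, and remember them."""
    fresh = [c for c in comments if c not in seen]
    seen.update(fresh)
    return fresh


# Hypothetical listing: the later submission holds the "earlier" comment ID.
listing = ["2b2b2b", "1a1a1a"]

# Watermark approach: 1a1a1a sorts before the stored ID, so it is skipped.
missed = scan_with_watermark(listing, last_id="2a0000")

# Set approach: both comments are unseen, so both are returned.
found = scan_with_seen_set(listing, seen={"2a0000"})

print("watermark scan:", missed)
print("seen-set scan:", found)
```

The trade-off is that a set of scanned IDs grows over time and would need pruning or persistence, whereas a single stored ID is cheap but only works if IDs always arrive in order.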