SuperGouge / ChanThreadWatch

Fork of the original discontinued ChanThreadWatch.

Overwriting .html's with 404 error pages on certain sites. #30

Open usernamestring opened 10 years ago

usernamestring commented 10 years ago

This is an old and somewhat uncommon problem with CTW and some chans. I'm not very knowledgeable about this, but I assume these sites don't send a standard 404 response to browsers when a thread dies. As a result, CTW keeps downloading the 404 error pages and overwrites the .html files that contained the watched threads, so their original contents are lost. wizardchan.org is an example: if a thread from this site is added to CTW, a 404 page is downloaded once the thread dies, CTW doesn't display its status as "Page not found", and it keeps downloading the page periodically unless manually stopped. Can this be fixed?

SuperGouge commented 10 years ago

It would be possible, but really ugly and suboptimal, since you'd have to write a method for each affected site and keep track of whenever they change their website's layout. Also, the current SiteHelpers are connection-agnostic, and breaking that would be really dirty IMO.

The problem is indeed that they send a 200 OK HTTP status code even when the thread is dead. I think the best thing to do would be to contact the webmasters and ask them to conform to the standards.

DerSandmann-Badcode commented 6 years ago

Kind of old, but could this be resolved by checking the returned post count before the HTML is replaced? Not the best solution, but if the post count is 0, it could mean we got a 404 page back with a 200 status code. Instead of just replacing the file, we could rename the previous one in some way.
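
A rough sketch of that post-count check, written in Python for illustration only; it is not CTW code. Everything here is an assumption: the names `count_posts` and `save_thread_page`, the `.suspect` suffix, and especially the idea that each post is an element with `post` in its class list, which varies from site to site. This variant also keeps the previous file in place and writes the suspicious page under a separate name, rather than renaming the previous file, so repeated polls cannot clobber the good copy.

```python
import re

# Assumption: each post is wrapped in an element whose class attribute
# contains the word "post". Real sites use different markup.
POST_PATTERN = re.compile(r'class="[^"]*\bpost\b[^"]*"')


def count_posts(html: str) -> int:
    """Return how many post-like elements appear in the page."""
    return len(POST_PATTERN.findall(html))


def save_thread_page(path: str, new_html: str) -> str:
    """Save a downloaded thread page unless it looks like a soft 404.

    Returns "saved" if the page was written to `path`, or
    "suspected_dead" if it had no posts. In the second case the
    existing file at `path` is left untouched and the suspicious page
    is written to `path + ".suspect"` (hypothetical suffix) so it can
    be inspected later.
    """
    if count_posts(new_html) == 0:
        with open(path + ".suspect", "w", encoding="utf-8") as f:
            f.write(new_html)
        return "suspected_dead"

    with open(path, "w", encoding="utf-8") as f:
        f.write(new_html)
    return "saved"
```

A caller could treat `"suspected_dead"` the same way as a real 404 and stop polling the thread. The obvious weakness is that a page with zero detected posts could also be a layout change or a temporary error page, so this is a heuristic, not a reliable dead-thread test.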