SuperGouge / ChanThreadWatch

Fork of the original discontinued ChanThreadWatch.
Fetching deleted posts from archives. #34

Open usernamestring opened 10 years ago

usernamestring commented 10 years ago

Would it be possible to make CTW optionally check a thread on its respective archive site (like archive.moe or 4plebs.org) and reinsert all previously deleted posts (removed before adding the thread to CTW) in the new saved .html file?

I'd also like to know if it's possible to make CTW keep all the backlinks next to post numbers in the final .html file. For me it's kinda difficult reading a saved thread without all those backlinks highlighting relevant/controversial replies.

SuperGouge commented 10 years ago

If I understand correctly, you would want the archive sites to be used behind the scenes to retrieve deleted posts. However, I'm not sure all archive sites work the same way, and I'm not sure they all (if any) keep deleted posts and/or are updated live, as quickly as the original board. Also, those sites have to be reliable: they must not provide altered posts or, in some cases, allow people to add replies to the archived thread. But one of the main problems would be the huge task of maintaining a list of all archive sites for boards on all sites, and probably writing a different parser for every single one of them. There is MayhemYDG/archives.json for 4chan, but we can't know for sure he will continue updating it, and that would be another dependency. People who use this list in their application usually just point to the archive site, but in our case we want to be able to parse their archive. As you may know, being able to support a lot of imageboards in CTW is difficult, but this is just humongous (if we want to do it properly).
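To illustrate only the merging step (not the fetching or per-archive parsing, which is the hard part described above), here is a minimal Python sketch. It assumes both the live thread and the archived copy have already been parsed into dicts mapping post number to post HTML; that shape and the name `merge_deleted_posts` are hypothetical and do not come from CTW or from any archive site.

```python
def merge_deleted_posts(live_posts, archived_posts):
    """Return all posts in post-number order, flagging archive-only ones.

    Both arguments are hypothetical dicts mapping post number (int) to the
    post's HTML body (str). This shape is an assumption for illustration.
    """
    merged = []
    for number in sorted(set(live_posts) | set(archived_posts)):
        if number in live_posts:
            # Prefer the live copy so an altered archive cannot override it.
            merged.append({"no": number, "html": live_posts[number], "deleted": False})
        else:
            # Present only in the archive: treat it as deleted from the board.
            merged.append({"no": number, "html": archived_posts[number], "deleted": True})
    return merged
```

Preferring the live copy whenever a post exists in both sources is one way to limit the damage from an unreliable archive, though it does nothing about replies that were added only on the archive side.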

As for the backlinks, they are handled by the native extension on 4chan. They work just fine locally when watching some sites, but as you may have noticed with this extension on local files, some features work and others don't. This is mainly because the extension uses the URL for parsing and for some requests, and obviously this either fails or returns unexpected values on local links. We can maybe hope that moot either includes the backlinks directly in the HTML instead of generating them with the extension, or modifies the extension to work on local links and stop using the URL. But he will probably do neither, because you're "not supposed" to view those threads locally. We could also add them directly into the HTML ourselves, but if the extension later starts working on local files you may end up with duplicate backlinks. We could also write our own modified extension, but I'm not as qualified for this and that would mean even more maintenance. Basically, 4chan is becoming quite different from other imageboards, on which a lot of things work just fine.
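For the "add them ourselves" option, the core of it is just inverting the quote references. A rough Python sketch, assuming the posts are already available as a dict of post number to plain-text body with quotes written as `>>12345` (in real saved HTML they would be links with escaped `&gt;&gt;`, so a real implementation would need an HTML parser). The name `build_backlinks` and the input shape are hypothetical.

```python
import re
from collections import defaultdict

# Matches same-thread quotes such as ">>12345" in plain text.
QUOTE_RE = re.compile(r">>(\d+)")


def build_backlinks(posts):
    """Map each quoted post number to the sorted post numbers that quote it.

    `posts` is a hypothetical dict of post number (int) -> plain-text body.
    """
    backlinks = defaultdict(set)
    for number, text in posts.items():
        for target in QUOTE_RE.findall(text):
            target = int(target)
            # Ignore quotes of posts outside this thread and self-quotes.
            if target in posts and target != number:
                backlinks[target].add(number)
    return {target: sorted(sources) for target, sources in backlinks.items()}
```

Using a set per target means a post that quotes the same reply twice only produces one backlink. It does not solve the duplicate problem with the extension itself; for that, the injected backlinks would presumably need some distinguishing marker so they could be skipped or stripped later.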

I know I'm just throwing ideas around and sorry if I'm rambling. But when it comes to maintaining support for multiple imageboards that each have very specific things to them (especially 4chan), there is no silver bullet unfortunately. I'm doing my best however and if I see the opportunity to improve CTW in the ways requested by users, I will do it.

efotinis commented 7 years ago

Since I was also searching for a way to restore the backlinks in saved threads, here's a Python 3.4+ script to do it (needs BeautifulSoup 4): fixbacklinks.py.

It would probably be faster to inject a script, intercept the loading of the JavaScript extension and modify the code to handle non-Web links, but I'll leave that as an exercise for the reader… :b