dipu-bd / lightnovel-crawler

Generate and download e-books from online sources.
https://pypi.org/project/lightnovel-crawler/
GNU General Public License v3.0
1.49k stars, 293 forks

https://www.spacebattles.com, https://www.sufficientvelocity.com, https://www.questionablequesting.com #2195

Open arromdee opened 12 months ago

arromdee commented 12 months ago

All three sites run similar forum software, and fiction is posted in forum threads. In most cases reading does not require a login. Questionable Questing may require a login for the NSFW forums.

The sites support some text formatting and images.

It should be possible to crawl anything that was posted after chapter indexes were invented. (A few extremely old fics were created before chapter indexes.)

There are types of threadmarks other than chapter indexes, such as Informational and Apocrypha. If the main thread URL is given, the crawler should default to the main threadmarks, which are the chapter indexes. It is also possible to give a URL that specifically indexes another type of threadmark, in which case the crawler should crawl that type instead.

Examples:

Main URL (should default to chapters): https://forums.spacebattles.com/threads/a-darker-path-worm-fanfic.1037109/

Threadmark specific URL for chapters: https://forums.spacebattles.com/threads/a-darker-path-worm-fanfic.1037109/threadmarks

Threadmark specific URL for sidestory: https://forums.spacebattles.com/threads/a-darker-path-worm-fanfic.1037109/threadmarks?threadmark_category=16
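A minimal sketch of that URL handling, in case it helps whoever picks this up. It assumes URLs shaped like the three examples above and nothing more: `threadmarks_url` is a hypothetical helper name, not something that exists in lightnovel-crawler, and other URL shapes (for example a link to a specific page of the thread) are not handled.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def threadmarks_url(url: str) -> str:
    """Map a thread URL to the threadmark index URL that should be crawled.

    Hypothetical helper: assumes the URL shapes from the examples in this issue.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    query = ""

    if path.endswith("/threadmarks"):
        # The user asked for a specific threadmark index; keep the
        # category if one was given and drop any other query parameters.
        category = parse_qs(parts.query).get("threadmark_category")
        if category:
            query = f"threadmark_category={category[0]}"
    else:
        # Main thread URL: default to the main threadmarks (chapters).
        path += "/threadmarks"

    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))
```

With this, the main URL and the chapter-specific URL both resolve to the same `/threadmarks` page, while the sidestory URL keeps its `threadmark_category=16` parameter.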

Sidestories and similar posts are sometimes meant to be read along with the main thread, so another useful option may be to put multiple types of threadmarks in one file, ordered chronologically by post date. That may be hard to implement, though, since it would require looking at all posts in a thread.
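For what it's worth, once a post date is known for every threadmarked post, the merge itself is just a sort. The sketch below assumes a hypothetical `Threadmark` record with a `posted_at` field; how those dates are obtained (which may well mean fetching every post, as noted above) is not covered here.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import chain
from typing import Iterable, List


@dataclass(frozen=True)
class Threadmark:
    """Hypothetical record for one threadmarked post."""
    title: str
    category: str        # e.g. "Threadmarks", "Sidestory", "Apocrypha"
    posted_at: datetime  # assumed to be known for every entry


def merge_by_post_date(*categories: Iterable[Threadmark]) -> List[Threadmark]:
    """Combine several threadmark lists into one, oldest post first."""
    # sorted() is stable, so entries with identical timestamps keep the
    # order in which their categories were passed in.
    return sorted(chain.from_iterable(categories), key=lambda tm: tm.posted_at)


chapters = [
    Threadmark("Chapter 1", "Threadmarks", datetime(2022, 1, 1)),
    Threadmark("Chapter 2", "Threadmarks", datetime(2022, 1, 8)),
]
sidestories = [
    Threadmark("Interlude", "Sidestory", datetime(2022, 1, 5)),
]
combined = merge_by_post_date(chapters, sidestories)
```

Here `combined` places the sidestory between the two chapters because its post date falls between theirs.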

zGadli commented 12 months ago

The first two websites have Cloudflare protection.