arshadrr / subreddit-archiver

Python utility to archive and keep up-to-date archives of reddit subreddits. Archives to SQLite databases.
GNU General Public License v3.0

Allow to archive a subreddit past a certain date #2

Open ngirard opened 2 years ago

ngirard commented 2 years ago

Heya,

let me start by expressing my gratitude for the effort you've put into this nice project! I'm actually surprised I couldn't find any mention of it on Reddit... how about posting it to e.g. r/DataHoarder?

One of my daily chores is to follow a number of subreddits, say r/rust for instance, and I'm hoping this project will help me to do so more comfortably.

Taking r/rust as an example, this subreddit was created in 2010, while I'm only interested in posts newer than Rust 1.0 (released in May 2015).

It would be great if the archive command allowed specifying a starting date, which would help in my situation.
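To illustrate the idea, here is a minimal sketch of such a cutoff. It assumes posts arrive newest-first as dicts carrying a `created_utc` Unix timestamp (the field name used by Reddit's API). The function `take_until_cutoff` and the sample data are hypothetical and not part of subreddit-archiver:

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator


def take_until_cutoff(posts: Iterable[dict], start: datetime) -> Iterator[dict]:
    """Yield posts created at or after `start`, stopping at the first older one.

    Assumes `posts` is ordered newest-first, so everything after the first
    too-old post can be skipped without being fetched.
    """
    cutoff = start.timestamp()
    for post in posts:
        if post["created_utc"] < cutoff:
            break  # all remaining posts are older than the starting date
        yield post


def ts(year: int, month: int, day: int) -> float:
    """Helper for the sample data: UTC date -> Unix timestamp."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


# Hypothetical sample data, newest first.
sample = [
    {"id": "c", "created_utc": ts(2020, 1, 1)},
    {"id": "b", "created_utc": ts(2015, 6, 1)},
    {"id": "a", "created_utc": ts(2012, 3, 1)},
]

start_date = datetime(2015, 5, 1, tzinfo=timezone.utc)  # start of May 2015
kept = [post["id"] for post in take_until_cutoff(sample, start_date)]
print(kept)
```

Stopping at the first too-old post, rather than filtering the whole listing, is what would actually save download time, but it only works if the posts really are retrieved in reverse chronological order.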

Cheers!

arshadrr commented 2 years ago

Hi,

That sounds like a great feature to implement, and I'm interested in doing so. Sadly, you've caught me at a bad time: I'm a little busy at present, so I might not be able to implement it soon enough for you to use it. I will give it a go in a week or so.

And thank you for your kind words. This being my first open-source project, I really appreciate it :) I have previously posted this on /r/Python for feedback, but decided against posting elsewhere because there are some rough edges I'm unhappy about and want to improve.

If I may be of help in any other way, feel free to ask.

Cheers

ngirard commented 2 years ago

Thanks for the heads-up!

> Sadly, you've caught me at a bad time: I'm a little busy at present, so I might not be able to implement it soon enough for you to use it. I will give it a go in a week or so.

No problem, just ping me here whenever you can devote some time to this project.

Meanwhile, I guess I'll have to wait for the archive to complete. I'm only about halfway there... admittedly the process is quite slow: the download bandwidth is always below 60 KiB/s and very often around 2 KiB/s. Do you know of any tricks to speed it up? Alternatively, would you be willing to share your own archives, possibly privately, if they happen to be software engineering-related?

> I have previously posted this on /r/Python for feedback

Really? I'd be curious to take a look at the post. Too bad I missed it.

> If I may be of help in any other way, feel free to ask.

Sure. I'm currently hacking a Web UI on top of this first archive of mine using Django / HTMX, and I saw room for improvement in your SQL schema. But there's no rush.