Serene-Arc / bulk-downloader-for-reddit

Downloads and archives content from reddit
https://pypi.org/project/bdfr
GNU General Public License v3.0
2.29k stars, 211 forks

1000+ post #180

Closed by tigerfullx14 3 years ago

tigerfullx14 commented 3 years ago

Hey. How can I go beyond the 1000-post limit per subreddit in bulk-downloader-for-reddit?

aliparlakci commented 3 years ago

Unfortunately, that's not possible :(

Iron-E commented 3 years ago

I have two ideas, feel free to comment and/or criticize:

  1. Does the Reddit API support paging like the YouTube API? If so it might be possible to continue by querying for the next set of values. I do something like that here
  2. The new application ErGoDownloader allows users to pass an -unsave flag in order to remove downloaded posts. This could also allow downloading beyond 1000 saved posts, if the saved posts are removed after downloading successfully.

Serene-Arc commented 3 years ago

There is no paging. The Reddit API has a hard limit on how many items a listing returns. You'll find that this also applies to your own user lists. If you go to a user's profile and they have 1000+ posts, you will only be able to see the latest 1000. There is no way to bypass this with Reddit itself. There are third-party solutions such as Pushshift, but those are not used by the BDFR.

Iron-E commented 3 years ago

Thanks for the response!

If I understand correctly, does Reddit not track more than a thousand at a time? So if we unsaved post 1, it would not retrieve posts 2–1001?

Edit: I'll test this later and report back

Edit 2: @Serene-Arc I can confirm that unsaving a post allows the program to retrieve more. Therefore, if an --unsave flag is implemented, we can get past the 1000-post limit.
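
To illustrate the reasoning, here is a minimal offline sketch of why unsaving helps. It does not call Reddit or the BDFR. It assumes the saved listing only ever exposes the newest 1000 entries, and every name in it (`LISTING_CAP`, `visible_saved`, `drain_saved`, the `download` callback) is hypothetical.

```python
from typing import Callable, List

LISTING_CAP = 1000  # assumed number of items a single listing exposes


def visible_saved(all_saved: List[str], cap: int = LISTING_CAP) -> List[str]:
    """Return the newest `cap` saved IDs, mimicking a capped listing."""
    return all_saved[:cap]


def drain_saved(
    all_saved: List[str],
    download: Callable[[str], bool],
    cap: int = LISTING_CAP,
) -> List[str]:
    """Download what is visible, unsave only the successes, and repeat.

    `all_saved` is ordered newest first. `download` returns True on success.
    Posts that fail to download stay saved, so they are never lost.
    """
    remaining = list(all_saved)
    downloaded: List[str] = []
    while True:
        window = visible_saved(remaining, cap)
        succeeded = [post_id for post_id in window if download(post_id)]
        if not succeeded:
            # Nothing new could be fetched: either the list is empty or
            # every visible post failed, so stop instead of looping forever.
            break
        downloaded.extend(succeeded)
        unsaved = set(succeeded)
        # "Unsaving" removes the post, which lets older posts become visible.
        remaining = [post_id for post_id in remaining if post_id not in unsaved]
    return downloaded
```

With 2500 simulated saved posts, a single pass sees only the first 1000, while the loop reaches all of them in three passes. The catch is that it modifies the account, which is the trade-off under discussion.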

Serene-Arc commented 3 years ago

I don't know if this feature will come back. All of the BDFR is just scraping data; actually performing actions and changing data is at odds with the rest of the program, especially if it's just to implement what is honestly a really janky solution.

Another user of the BDFR found a better solution, which is to make a GDPR request for your data. The export will include all of the post IDs that you've upvoted, saved, etc. Then the IDs can be fed into the BDFR to be downloaded.
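
As a rough sketch of that workflow, the snippet below pulls post IDs out of one CSV file from the export and writes them one per line. The file name `saved_posts.csv` and the `id` column are assumptions about the export layout, so check your own download. The helper names `extract_post_ids` and `write_id_file` are hypothetical.

```python
import csv
from pathlib import Path
from typing import Iterable, List


def extract_post_ids(export_csv: Path, id_column: str = "id") -> List[str]:
    """Read post IDs from one CSV of the data export, skipping blanks and duplicates.

    The column name is an assumption; pass `id_column` if yours differs.
    """
    ids: List[str] = []
    seen = set()
    with export_csv.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            post_id = (row.get(id_column) or "").strip()
            if post_id and post_id not in seen:
                seen.add(post_id)
                ids.append(post_id)
    return ids


def write_id_file(ids: Iterable[str], destination: Path) -> None:
    """Write one ID per line, a format that is easy to feed to other tools."""
    destination.write_text("".join(f"{post_id}\n" for post_id in ids), encoding="utf-8")


if __name__ == "__main__":
    source = Path("saved_posts.csv")  # assumed file name inside the export
    if source.exists():
        write_id_file(extract_post_ids(source), Path("saved_ids.txt"))
```

The resulting file can then be handed to the BDFR. Recent releases appear to accept individual IDs via a link option and a file of IDs via `--include-id-file`, but treat both as assumptions and confirm with `--help` for your installed version.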

Iron-E commented 3 years ago

I didn't consider a GDPR request. Made one and downloaded it.

Thanks for the suggestion!

Serene-Arc commented 3 years ago

Glad it worked :)