AlkTheOrg / reddit-saved-to-csv

Exports saved posts and comments on Reddit to a csv file.
74 stars, 5 forks

Not all my saved posts were saved. #1

Status: Closed (NylaTheWolf closed 2 years ago)

NylaTheWolf commented 2 years ago

Hi! So I got this program up and running and I was really excited, but I've noticed that I don't think it saved all of my saved posts.

I found this really cute post that I saved in raindrop.io and according to Reddit, I saved this post.

[image]

However, I literally cannot find it in my csv file? I searched for it using Notepad++'s search by searching its ID, keywords like "fireworks" and "psa" and "animal crossing," and I still couldn't find this particular post. I then imported it to Notion and searched for AnimalCrossing, and it didn't appear there either.

Is it possible that it's not a part of the 1000 saved posts that Reddit shows you? I mean, according to the program and updoot.app I have 949 saved posts, so I don't know. I mean, I did have a sneaking suspicion that it wasn't saving all my saved posts to CSV because there were some subreddits that had a surprisingly low number of saved posts, when I'm almost certain I've saved many more.

Thank you in advance!

AlkTheOrg commented 2 years ago

Hello there @NylaTheWolf !

As you are getting less than 1000 saved posts, you shouldn't have any problems with the maximum limit.

So I tried to reproduce the problem by saving the same post and I changed the limit=None at line 18 to limit=1 to get the latest saved post (None gets the last 1000 posts which is the max limit) and I got the below output with no problem:

ID,Name,Subreddit,Type,URL,NoSFW
1,PSA: Fireworks are good for other things besides drawing dicks!,AnimalCrossing,#Post,https://www.reddit.com/r/AnimalCrossing/comments/i33l1v/psa_fireworks_are_good_for_other_things_besides/,False
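For readers following along, the effect of the limit argument can be sketched roughly as below. This is a minimal illustration and not the repository's actual code: `saved_rows` is a hypothetical helper, the "#Comment" label is an assumption (only "#Post" appears in the output above), and it assumes a PRAW-style client where `reddit.user.me().saved(limit=...)` yields submissions (which have a `title`) and comments (which have a `body`). The NoSFW column is left out for brevity.

```python
def saved_rows(reddit, limit=None):
    """Build CSV-style rows from a PRAW-like client (hypothetical helper).

    limit=None asks for as many saved items as Reddit will return;
    limit=1 asks only for the most recently saved item.
    """
    rows = []
    saved = reddit.user.me().saved(limit=limit)
    for index, item in enumerate(saved, start=1):
        # Assumption: only submissions carry a title; comments carry a body.
        is_post = hasattr(item, "title")
        rows.append([
            index,
            item.title if is_post else item.body,
            item.subreddit.display_name,
            "#Post" if is_post else "#Comment",  # "#Comment" is an assumed label
            "https://www.reddit.com" + item.permalink,
        ])
    return rows
```

With a real client, `saved_rows(reddit, limit=1)` would be expected to return a single row for the latest saved item, which is the check described above.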

Could you please use keywords from my output to find the post in your .csv file?

If you still can't find it, please comment out or remove everything after line 19 and paste the code below at line 19 or on any later line. Just make sure that it is the only uncommented/undeleted code block after saved_models = reddit.user.me().saved(limit=None) # models: Comment, Submission

# Count every saved item (post or comment) returned by the API
saved_model_amount = 0
for model in saved_models:
    saved_model_amount += 1
print('Saved Models Length:', saved_model_amount)

Then save and run the script again. If the number it prints matches the number of posts in your .csv file (basically the last post's ID), then there is no problem in the script. The last possibility I can think of is that the API wrapper I'm using isn't detecting the post, but as you can see above, there is no problem in my output.
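The comparison between the printed count and the last ID in the CSV can also be done in code. This is a rough sketch, assuming the CSV starts with the ID header shown earlier in the thread; `last_csv_id` is a hypothetical helper name and is not part of the script.

```python
import csv
import io


def last_csv_id(csv_file):
    """Return the ID of the last data row, or 0 if there are no data rows."""
    last_id = 0
    for row in csv.DictReader(csv_file):
        last_id = int(row["ID"])
    return last_id


# Tiny in-memory stand-in for the real .csv file.
sample = io.StringIO(
    "ID,Name,Subreddit,Type,URL,NoSFW\n"
    "1,First post,AnimalCrossing,#Post,https://example.com/1,False\n"
    "2,Second post,Fallout,#Post,https://example.com/2,False\n"
)
saved_model_amount = 2  # stand-in for the number printed by the counting loop
print("Counts match:", last_csv_id(sample) == saved_model_amount)
```

For a real file, the `io.StringIO` object would be replaced by something like `open(path, newline="", encoding="utf-8")`, where `path` points at the generated CSV.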

NylaTheWolf commented 2 years ago

Haha that was a quick response!

> As you are getting less than 1000 saved posts, you shouldn't have any problems with the maximum limit.

Yeah that's what I was thinking.

> So I tried to reproduce the problem by saving the same post and I changed the limit=None at line 18 to limit=1 to get the latest saved post (None gets the last 1000 posts which is the max limit) and I got the below output with no problem:

Well, it might be worth mentioning that I must've saved it a year ago. Raindrop.io says the post was saved August 4, 2020.

> Could you please use keywords from my output to find the post in your .csv file?

I wasn't quite sure which part of that codeblock was considered a keyword, but I tried looking up the title of the post and the word "AnimalCrossing" (both according to your code)

[image]

So I deleted everything after line 19 like you said and pasted the code. I saved it under a different name, ran it, and it gave me Saved Models Length: 949

[image]

The last post's ID is 949 so...that's really bizarre. I mean, it's possible that your last possibility is correct. Or that it's an issue with Reddit's API, maybe. It COULD be a bug with your program but I'm not really an expert at this stuff so I wouldn't know haha 😅 I'm not trying to sound condescending or rude or something, bugs happen in programs. I'm just throwing it out there as a possibility.

AlkTheOrg commented 2 years ago

> Haha that was a quick response!

Yeah. This is the first time someone has opened an issue on one of my repositories, so I was curious about the problem 😄


> Well, it might be worth mentioning that I must've saved it a year ago. Raindrop.io says the post was saved August 4, 2020.

Well, I have posts from 2019, so I don't think that is the problem, but this is the first time I'm hearing of Raindrop.io. How does it work in this case? How do you use it to find and save posts? It probably doesn't even matter, as the Reddit post looks saved in the screenshot you posted, but knowing may help.


> The last post's ID is 949 so...that's really bizarre.

If you don't have any empty lines in your csv file, then there shouldn't be any problems with my code. So, if the post really is not saved into the csv file, then the API client I'm using is for some reason not collecting that Reddit post. If this is the case, then my hands are tied 😄

Well, although your script is not throwing any exceptions, I will also try saving the same post on Windows tomorrow to see whether it behaves differently there. There was a bug on Windows in the previous version, but in that case the script would probably crash. I'll give it a try anyway.

Other than that, I would like to take a look at your csv output. If you are OK with this, you can send me a cloud link to your csv file via Twitter or Reddit. Once we are done, I can delete the csv file and you can remove it from the cloud.


> I'm not trying to sound condescending or rude or something, bugs happen in programs. I'm just throwing it out there as a possibility.

That's the truth of programming, no worries.

NylaTheWolf commented 2 years ago

> Well, I have posts from 2019, so I don't think that is the problem, but this is the first time I'm hearing of Raindrop.io. How does it work in this case? How do you use it to find and save posts? It probably doesn't even matter, as the Reddit post looks saved in the screenshot you posted, but knowing may help.

Oh, Raindrop.io is a bookmarking app/bookmark manager. If you ever heard of Pocket, it's a lot like that. I use Pocket too but I like Raindrop.io because it allows me to edit descriptions, titles, and thumbnails (it also got updated with advanced search stuff so that's awesome). It also saves the date things were added to Raindrop, so that's why I brought it up. I use it to bookmark things in general, including reddit posts.

I do have a Mac too, so I could try it there later.

> Other than that, I would like to take a look at your csv output. If you are OK with this, you can send me a cloud link to your csv file via Twitter or Reddit. Once we are done, I can delete the csv file and you can remove it from the cloud.

I had a feeling that'd be the case haha. I don't think I have anything too scandalous in there though? I'll try to send it through Reddit

AlkTheOrg commented 2 years ago

No problem on Windows either. Can you detect any other missing posts on your outputs? If you can, please try to unsave and save one of the missing posts again on Reddit (so it will be the latest saved post on Reddit). Then on the original script, change limit=None to limit=1 at line 18 as I did in my first comment. Then execute the script to see if you can see it in the .csv file.

NylaTheWolf commented 2 years ago

> Can you detect any other missing posts on your outputs?

Looking through some of my older Raindrop saves, there seem to be other missing ones:

https://www.reddit.com/r/Fallout/comments/kn1mhg/new_vegas_dlcs_worth_it/
https://www.reddit.com/r/whatstheword/comments/kmthjy/wtp_for_always_thinking_about_what_a_specific/
https://www.reddit.com/r/ACQR/comments/kmicfo/im_releasing_my_new_collection_based_on_famous/

These seem to be saved by Reddit, as I have the option to unsave them, but do not show up in my CSV file, in Notepad++ or in Notion.
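A check like this can be automated by matching on the post ID embedded in each URL instead of searching by hand. The sketch below is only an illustration under stated assumptions: it assumes the CSV uses the ID,Name,Subreddit,Type,URL,NoSFW header shown earlier in the thread, and `post_id` and `missing_urls` are hypothetical helper names.

```python
import csv
import io
import re

# Reddit post URLs carry the post ID right after "/comments/".
POST_ID = re.compile(r"/comments/([a-z0-9]+)/")


def post_id(url):
    """Extract the post ID from a Reddit post URL, or None if there is none."""
    match = POST_ID.search(url)
    return match.group(1) if match else None


def missing_urls(csv_file, urls):
    """Return the URLs whose post ID is not among the '#Post' rows of the CSV."""
    exported = set()
    for row in csv.DictReader(csv_file):
        # Skip saved comments: their URLs also contain the parent post's ID.
        if row["Type"] == "#Post":
            exported.add(post_id(row["URL"]))
    exported.discard(None)
    return [url for url in urls if post_id(url) not in exported]


sample = io.StringIO(
    "ID,Name,Subreddit,Type,URL,NoSFW\n"
    "1,Some title,AnimalCrossing,#Post,"
    "https://www.reddit.com/r/AnimalCrossing/comments/i33l1v/psa_fireworks_are_good_for_other_things_besides/,False\n"
)
print(missing_urls(sample, [
    "https://www.reddit.com/r/Fallout/comments/kn1mhg/new_vegas_dlcs_worth_it/",
]))
```

Matching on the ID avoids false negatives caused by title punctuation or encoding differences between the bookmark and the CSV.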

> Then on the original script, change limit=None to limit=1 at line 18 as I did in my first comment. Then execute the script to see if you can see it in the .csv file.

So I unsaved and re-saved the post about the Fallout New Vegas DLC and followed those instructions, and the post showed up in the csv file.

I was curious, though, about how this might affect my other saved posts and if any would get deleted. I edited the second copy of the original script (where we changed the limit) to have limit=None. I've noticed that I didn't have this post included in the CSV file: https://www.reddit.com/r/FurryArtSchool/comments/lg2otu/attempt_at_mecha_what_do_you_think/. It seems the rest of the other posts are there, especially ones higher up on the list. I also have the same number of saved posts according to reddit-saved-to-csv: 949.

AlkTheOrg commented 2 years ago

> I was curious, though, about how this might affect my other saved posts and if any would get deleted. I edited the second copy of the original script (where we changed the limit) to have limit=None. I've noticed that I didn't have this post included in the CSV file: https://www.reddit.com/r/FurryArtSchool/comments/lg2otu/attempt_at_mecha_what_do_you_think/. It seems the rest of the other posts are there, especially ones higher up on the list. I also have the same number of saved posts according to reddit-saved-to-csv: 949.

Well, unfortunately I can't reproduce the problem. Before saving the posts you linked here (5 in total), the default script (limit=None) gave me 349 posts, and after saving them it gave me 354. I compared my last 100 saved posts (or comments, but I will just say posts for short) one by one against Reddit's page, and none were missing. Below is the output for the first 5:

ID,Name,Subreddit,Type,URL,NoSFW
1,"attempt at mecha, what do you think?",FurryArtSchool,#Post,https://www.reddit.com/r/FurryArtSchool/comments/lg2otu/attempt_at_mecha_what_do_you_think/,False
2,PSA: Fireworks are good for other things besides drawing dicks!,AnimalCrossing,#Post,https://www.reddit.com/r/AnimalCrossing/comments/i33l1v/psa_fireworks_are_good_for_other_things_besides/,False
3,I’m releasing my new collection based on famous artworks this Saturday! This first piece was inspired by the Great Wave off Kanagawa 🌊 (aka the dynamic painting!) and a variation of a cardigan I recently bought! I also have a gender neutral version of this outfit available :),ACQR,#Post,https://www.reddit.com/r/ACQR/comments/kmicfo/im_releasing_my_new_collection_based_on_famous/,False
4,WTP for always thinking about what a specific person would say in every situation?,whatstheword,#Post,https://www.reddit.com/r/whatstheword/comments/kmthjy/wtp_for_always_thinking_about_what_a_specific/,False
5,New Vegas DLCs Worth It?,Fallout,#Post,https://www.reddit.com/r/Fallout/comments/kn1mhg/new_vegas_dlcs_worth_it/,False

You are able to find these posts in your saved posts list on Reddit, right? I'm starting to wonder whether this may be a Reddit bug.

NylaTheWolf commented 2 years ago

Sorry for the wait, I didn't even realize I never responded!

I wasn't able to find those posts using this script. However, I recently sent in a request to receive my data package and got the link to download it. I just downloaded it and I went through it, and I'm able to find all of the aforementioned posts in the "saved_posts.csv" file I received from Reddit.

Unfortunately, it doesn't seem like the data package includes the title of the post. I wonder if there's a way to edit your script to make it so I can get the titles and stuff?

It may be a good idea to still contact Reddit about it though.

AlkTheOrg commented 2 years ago

> Unfortunately, it doesn't seem like the data package includes the title of the post. I wonder if there's a way to edit your script to make it so I can get the titles and stuff?

It's not up to me how the data is fetched from Reddit. The Reddit API wrapper I'm using (in this case praw, which makes it easier to use the Reddit API) gives me the data, and then I just process it.

So just to be clear, in the file that was officially sent to you by Reddit, you can see those posts but with incorrect titles?

NylaTheWolf commented 2 years ago

> So just to be clear, in the file that was officially sent to you by Reddit, you can see those posts but with incorrect titles?

No no no, it only includes the post ID and the url. It doesn't include the full post title for any of the posts.
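One possible way to fill in the missing titles would be to look up each exported ID through the API. The sketch below is an assumption-laden illustration and not part of reddit-saved-to-csv: `titles_for_export` is a hypothetical helper, the export's column name `id` is assumed, and the client is assumed to be an authenticated PRAW instance, where `reddit.submission(id=...)` returns an object exposing `title`.

```python
import csv


def titles_for_export(reddit, export_file, id_column="id"):
    """Map each post ID in the export to its title via a PRAW-like client.

    Assumption: the export has a header row and the ID column is named "id".
    """
    titles = {}
    for row in csv.DictReader(export_file):
        post_id = row[id_column]
        # With PRAW, reading .title triggers a request for that submission.
        titles[post_id] = reddit.submission(id=post_id).title
    return titles
```

Each lookup costs one request, so a large export would take a while, and IDs that no longer resolve would need their own error handling.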

AlkTheOrg commented 2 years ago

Sorry for the late reply. I was busy with work.

Looks like the issue is not related to my code. Although I don't think it's an API bug, you may try opening an issue on praw (the Reddit API wrapper that I'm using in the script) in case the devs have faced this issue before.

NylaTheWolf commented 2 years ago

Sorry for the late reply, I was really caught up in school stuff.

I understand! Perhaps I could try opening an issue. Thank you so much for trying to help me out though!