csu / export-saved-reddit

Export saved Reddit posts into an HTML file for import into Google Chrome.

Comments not saved, only links #50


samschr commented 6 years ago

The current version only saves link URLs and their titles; saved comments are lost. However, I was able to dive into the code and fix it for my use. Here are the changes I made to also grab the author, the submission body, and a more readable timestamp. I'm not sure if this is the right place to put this, as I'm pretty new to GitHub and to coding in general, but the git diff is below:

diff --git a/export_saved.py b/export_saved.py
index 7e4f5db..0e88d5f 100755
--- a/export_saved.py
+++ b/export_saved.py
@@ -11,6 +11,7 @@ import argparse
 import csv
 import logging
 import sys
+import datetime

 import praw

@@ -212,19 +213,33 @@ def get_csv_rows(reddit, seq):
         try:
             created = int(i.created)
         except ValueError:
             created = 0
+
+        createdreadable = datetime.datetime.fromtimestamp(int(created)).strftime('%Y-%m-%d %H:%M:%S')

         try:
             folder = str(i.subreddit).encode('utf-8').decode('utf-8')
         except AttributeError:
             folder = "None"
+
+        try:
+            body = "N/A"
+            body = str(i.body).encode('utf-8').decode('utf-8')
+        except AttributeError:
+            body = "N/A"

+        try:
+            author = "N/A"
+            author = str(i.author).encode('utf-8').decode('utf-8')
+        except AttributeError:
+            author = "N/A"
+
         if callable(i.permalink):
             permalink = i.permalink()
         else:
             permalink = i.permalink
         permalink = permalink.encode('utf-8').decode('utf-8')

-        csv_rows.append([reddit_url + permalink, title, created, None, folder])
+        csv_rows.append([reddit_url + permalink, title, created, createdreadable, body, author, None, folder])

     return csv_rows

@@ -239,7 +254,7 @@ def write_csv(csv_rows, file_name=None):
     file_name = file_name if file_name is not None else 'export-saved.csv'

     # csv setting
-    csv_fields = ['URL', 'Title', 'Created', 'Selection', 'Folder']
+    csv_fields = ['URL', 'Submission Title', 'Created-UNIX', 'Created-Standard', 'Body', 'Username', 'Selection', 'Folder']
     delimiter = ','

     # write csv using csv module
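
For readers who just want the gist of the patch, the per-item logic boils down to something like the following sketch (a condensed rewrite, not the patch itself, assuming each i is a PRAW Comment or Submission; getattr defaults stand in for the try/except blocks):

import datetime

def extract_fields(i, reddit_url):
    # One CSV row per saved item, in the column order the patch uses.
    created = int(getattr(i, 'created', 0) or 0)
    created_readable = datetime.datetime.fromtimestamp(created).strftime('%Y-%m-%d %H:%M:%S')
    title = str(getattr(i, 'title', None) or "N/A")    # submissions only
    body = str(getattr(i, 'body', None) or "N/A")      # comments only
    author = str(getattr(i, 'author', None) or "N/A")  # None for deleted accounts
    folder = str(getattr(i, 'subreddit', None) or "None")
    permalink = i.permalink() if callable(i.permalink) else i.permalink
    return [reddit_url + permalink, title, created, created_readable,
            body, author, None, folder]
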
goose-ws commented 6 years ago

Is this patch Python 3 compatible? I keep getting an error: UnboundLocalError: local variable 'createdreadable' referenced before assignment

rachmadaniHaryono commented 6 years ago

Can you post the complete error, @goose-ws?

goose-ws commented 6 years ago

The problem was that I had the createdreadable = datetime.datetime.fromtimestamp(int(created)).strftime('%Y-%m-%d %H:%M:%S') line indented one level too deep.
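
In other words, the assignment had slipped into the preceding except block. A minimal reproduction of that shape, which only binds createdreadable when int(i.created) raises:

        try:
            created = int(i.created)
        except ValueError:
            created = 0

            # Indented one level too deep: this line now belongs to the
            # except block and is skipped whenever no ValueError occurs ...
            createdreadable = datetime.datetime.fromtimestamp(int(created)).strftime('%Y-%m-%d %H:%M:%S')

        # ... so the later csv_rows.append(...) that references it raises:
        # UnboundLocalError: local variable 'createdreadable' referenced before assignment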

Jacedeuce commented 6 years ago

I wasn't able to get this to work. When I run python export_saved.py after making the changes, the process just hangs in my shell until I send a keyboard interrupt.

spacerainbow000 commented 5 years ago

I added a quick bash script in #52 that parses this info out of the resulting HTML file after the Python script completes, in case anyone else wants a more comprehensive backup and doesn't mind a less elegant solution and somewhat sloppily formatted data.
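
(#52 is a bash script; purely as an illustration, here is a rough Python equivalent that pulls the URL/title pairs back out of the exported bookmark file. The export-saved.html file name and the Netscape-bookmark <A HREF> layout are assumptions about the exporter's output, not taken from #52.)

from html.parser import HTMLParser

class BookmarkParser(HTMLParser):
    """Collect (url, title) pairs from a Netscape-style bookmark file."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href:
            self.links.append((self._href, ''.join(self._text).strip()))
            self._href = None

parser = BookmarkParser()
with open('export-saved.html', encoding='utf-8') as f:  # assumed file name
    parser.feed(f.read())
for url, title in parser.links:
    print(url, title, sep='\t')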

schmendrik commented 4 years ago

> I wasn't able to get this to work. When I run python export_saved.py after making the changes, the process just hangs in my shell until I send a keyboard interrupt.

With samschr's changes applied, extracting the data just takes a lot longer. If you set the logging level to INFO, you'll see that the program is still churning through the data.
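
For example, a one-line tweak near the top of export_saved.py (assuming the script doesn't already configure logging elsewhere; it already imports logging) makes that progress visible:

import logging

# Emit the script's INFO-level progress messages instead of
# leaving the process looking hung.
logging.basicConfig(level=logging.INFO)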