reddit2text is a Python library that turns any Reddit thread into clean, readable text.
Whether you want to prompt an LLM, run NLP or data analysis, or archive threads for offline use, reddit2text
offers a straightforward interface to fetch and convert Reddit content.
Install it with pip:
pip3 install reddit2text
First, create a Reddit app to obtain the client_id and client_secret needed to access the Reddit API.
Here's a visual step-by-step guide I created for this. Alternatively, see Reddit's API documentation.
Then replace the client_id, client_secret, and user_agent values with your credentials.
The user agent can be any string, but Reddit's guidelines recommend this convention: '<app type>:<app name>:<version> (by <your username>)'
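Rather than hardcoding secrets in a script, you could read them from environment variables and assemble the user agent from its parts. This is a minimal sketch only: the environment variable names REDDIT_CLIENT_ID and REDDIT_CLIENT_SECRET and the helpers make_user_agent and load_reddit_credentials are hypothetical and are not part of reddit2text.

```python
import os


def make_user_agent(app_type: str, app_name: str, version: str, username: str) -> str:
    """Build '<app type>:<app name>:<version> (by <your username>)'."""
    # Hypothetical convenience: add the 'u/' prefix used in the example if it is missing.
    if not username.startswith("u/"):
        username = f"u/{username}"
    return f"{app_type}:{app_name}:{version} (by {username})"


def load_reddit_credentials() -> dict:
    """Read credentials from hypothetical environment variables.

    Raises KeyError if a variable is missing, so a misconfigured
    environment fails loudly instead of sending empty credentials.
    """
    return {
        "client_id": os.environ["REDDIT_CLIENT_ID"],
        "client_secret": os.environ["REDDIT_CLIENT_SECRET"],
        "user_agent": make_user_agent("script", "my_app", "v1.0", "reddit2text"),
    }
```

Because the dictionary keys match the client_id, client_secret, and user_agent keyword arguments used with Reddit2Text, it could be unpacked directly, e.g. `Reddit2Text(**load_reddit_credentials())`.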
This is enough to get started:
from reddit2text import Reddit2Text
r2t = Reddit2Text(
    # replace with your actual creds
    client_id='123abc',
    client_secret='123abc',
    user_agent='script:my_app:v1.0 (by u/reddit2text)'
)
URL = 'https://www.reddit.com/r/AskReddit/comments/1by3p2o/whats_the_stupidest_animal_and_how_has_it/'
output = r2t.textualize_post(URL)
print(output)
Here is a (truncated) example of the output from the code above: https://pastebin.com/niQTGbys
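For offline archiving, one option is to write the returned text to a file named after the thread. A minimal sketch under stated assumptions: archive_filename and save_thread are hypothetical helpers, the '<post id>_<slug>.txt' naming scheme is just one choice, and it assumes the /r/<subreddit>/comments/<id>/<slug>/ URL layout seen in the example URL.

```python
from pathlib import Path
from urllib.parse import urlparse


def archive_filename(url: str) -> str:
    """Derive '<post id>_<slug>.txt' from a Reddit thread URL."""
    # Non-empty path segments, e.g. ['r', 'AskReddit', 'comments', '<id>', '<slug>'].
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "comments" not in parts:
        raise ValueError(f"not a Reddit thread URL: {url}")
    idx = parts.index("comments")
    if len(parts) <= idx + 1:
        raise ValueError(f"no post id in URL: {url}")
    post_id = parts[idx + 1]
    slug = parts[idx + 2] if len(parts) > idx + 2 else ""
    return f"{post_id}_{slug}.txt" if slug else f"{post_id}.txt"


def save_thread(text: str, url: str, directory: str = ".") -> Path:
    """Write thread text to <directory>/<archive_filename(url)> and return the path."""
    path = Path(directory) / archive_filename(url)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

With the example above, `save_thread(output, URL, 'archive')` would write to archive/1by3p2o_whats_the_stupidest_animal_and_how_has_it.txt.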
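Optional parameters:
max_comment_depth (Optional[int]): maximum depth of each comment chain. Set to None or -1 to include all comments.
comment_delim (Optional[str]): string placed before each comment, repeated once per nesting level. Set to | to mimic Reddit.
For example:
r2t = Reddit2Text(
    # credentials ...
    max_comment_depth=3,  # all comment chains will be limited to a max of 3 replies
    comment_delim='#'  # each comment level will be preceded by multiples of this string
)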
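To make the effect of these two parameters concrete, here is a standalone sketch of depth-limited, delimiter-prefixed rendering over a toy comment tree. It is an illustration only, not reddit2text's internal code: the Comment class, the render_comments function, the line layout, and the choice to count top-level comments as depth 1 are all assumptions, and the library's real output may differ. As documented above, None or -1 disables the depth limit.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Comment:
    """Hypothetical stand-in for a Reddit comment and its replies."""
    author: str
    body: str
    replies: List["Comment"] = field(default_factory=list)


def render_comments(
    comments: List[Comment],
    max_depth: Optional[int] = None,
    delim: str = "|",
    _level: int = 1,
) -> List[str]:
    """Render one line per comment, prefixed with delim repeated once per level.

    max_depth=None or -1 means no limit; otherwise comments deeper than
    max_depth (top-level comments are level 1) are dropped with their replies.
    """
    if max_depth is not None and max_depth != -1 and _level > max_depth:
        return []
    lines: List[str] = []
    for c in comments:
        lines.append(f"{delim * _level} {c.author}: {c.body}")
        # Recurse into replies one level deeper.
        lines.extend(render_comments(c.replies, max_depth, delim, _level + 1))
    return lines
```

On a three-level toy thread, max_depth=2 with delim='#' keeps only the first two levels, prefixed with # and ## respectively.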
Have a Feature Idea?
Simply open an issue on GitHub and tell me what should be added to the next release!
Contributions to reddit2text are always welcome! I'm just a person who made something I think is useful, so any help is appreciated. You can always submit a pull request or open an issue on the GitHub repository.
reddit2text is released under the Apache License 2.0. See the LICENSE file for more details.