dmarx / psaw

Python Pushshift.io API Wrapper (for comment/submission search)
BSD 2-Clause "Simplified" License

Search Submissions for titles containing semi-colons #101

Open loving2 opened 3 years ago

loving2 commented 3 years ago

I found a discrepancy between a PRAW search and a PSAW search: PSAW failed to return a submission whose title contains a semicolon (';').

The PRAW search returned one result that the PSAW search did not include: KotW 11/3: Chariot Race, City Quarter, Grand Market, Inventor, Mint, Mountebank, Sacrifice, Scepter, Villa, Watchtower. Project: Fleet; Event: Ritual. Colony/Platinum. [Prosperity, Empires, Renaissance]

I suspect this may be due to the semicolon in the title ("... Project: Fleet; Event:...").

I'm using a Windows 10 device with Anaconda3 2021.05 (Python 3.8.8 64-bit).

Please see example code below:


import praw
import pandas as pd
import datetime as dt
from psaw import PushshiftAPI

reddit = praw.Reddit(client_id='[INSERT_CLIENT_ID_HERE]',
                     client_secret='[INSERT_CLIENT_SECRET_HERE]',
                     user_agent='[AGENT_NAME]',
                     username='[REDDIT_USERNAME]',
                     password='[REDDIT_PASSWORD]')
api = PushshiftAPI(reddit)

expansions = {
    "Dominion": True,
    "Intrigue": False,
    "Seaside": True,
    "Alchemy": False,
    "Prosperity": True,
    "Cornucopia": False,
    "Hinterlands": False,
    "Dark Ages": False,
    "Guilds": False,
    "Adventures": False,
    "Empires": True,
    "Nocturne": False,
    "Renaissance": True,
    "Menagerie": False,
    "All Sets": False,
    "Promo": False
}

print("PRAW Search")
subreddit = reddit.subreddit('dominion')
kotws = subreddit.search(query="author:avocadro title:KotW", sort='new', limit=1000)
for kotw in kotws:
    # Split off the bracketed set list, e.g. "[Prosperity, Empires, Renaissance]"
    title = kotw.title.split("[")
    if len(title) < 2:
        continue
    # Print only titles whose set list names no disabled expansion
    printit = True
    for expanse, enabled in expansions.items():
        if expanse in title[1] and not enabled:
            printit = False
            break
    if printit:
        print(kotw.title)

print("PSAW Search")
results = list(api.search_submissions(
    subreddit='dominion',
    author='avocadro',
    filter=['author', 'title', 'subreddit'],
    title='KotW',
))
for kotw in results:
    # Same filter as in the PRAW loop above
    title = kotw.title.split("[")
    if len(title) < 2:
        continue
    printit = True
    for expanse, enabled in expansions.items():
        if expanse in title[1] and not enabled:
            printit = False
            break
    if printit:
        print(kotw.title)
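
To make the discrepancy easier to spot than by comparing the two printed lists by eye, the titles can be collected and diffed. The following is only a sketch: `passes_expansion_filter` and `missing_from` are hypothetical helper names, not part of PRAW or PSAW, and they operate on plain lists of title strings gathered from the two searches.

```python
def passes_expansion_filter(title, expansions):
    """Return True if the bracketed set list names no disabled expansion.

    Mirrors the loops in the script: titles without a "[" are skipped.
    """
    parts = title.split("[")
    if len(parts) < 2:
        return False
    return not any(name in parts[1] and not enabled
                   for name, enabled in expansions.items())


def missing_from(reference_titles, candidate_titles):
    """Return titles present in reference_titles but absent from
    candidate_titles, keeping the order of reference_titles."""
    seen = set(candidate_titles)
    return [t for t in reference_titles if t not in seen]


# Assumed usage, with the titles collected into lists first
# (subreddit.search returns a generator that can only be consumed once):
#   praw_titles = [s.title for s in subreddit.search(...)]
#   psaw_titles = [s.title for s in api.search_submissions(...)]
#   for t in missing_from(praw_titles, psaw_titles):
#       if passes_expansion_filter(t, expansions):
#           print(t)
```

If the semicolon is the cause, the only title reported this way should be the "Project: Fleet; Event: Ritual" submission quoted above.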