DenverCoder1 / unedit-for-reddit

Creates a link next to edited and deleted Reddit comments to show the original from before it was edited. The unedited comment gets displayed inline.
https://denvercoder1.github.io/unedit-for-reddit
MIT License

PullPush - replacement service for PushShift data #106

Open pullpush-io opened 1 year ago

pullpush-io commented 1 year ago

Hello!

I created a replacement service for the PushShift functionality that is now restricted. See https://pullpush.io/ for details. Overall, it will aim to be compatible with the December 2022 version of the PushShift API.

In case you feel burned out by the whole third-party tool affair, I would like to ask for permission to fork and rebrand your project :)

PullPush Actual

DenverCoder1 commented 1 year ago

Great to hear! I've been in the Discord server since it was first announced. I've been trying out integrating with other sources but this is the only one I've seen that actually seems to be attempting to host all subreddits.

I've created a branch for it here - https://github.com/DenverCoder1/unedit-for-reddit/tree/pullpush-source

It doesn't seem like the API is fully working yet.

I tested with a random comment here. It comes back with a CORS error when fetched from Reddit's domain, and opening the URL directly just shows some comments from r/Jokes instead of the comment with the given ID.

Access to fetch at 'https://api.pullpush.io/reddit/search/comment/?ids=t1_himv9l3&fields=body,author,id,link_id,created_utc,permalink'
from origin 'https://www.reddit.com' has been blocked by CORS policy:
Response to preflight request doesn't pass access control check:
No 'Access-Control-Allow-Origin' header is present on the requested resource.
If an opaque response serves your needs, set the request's mode to 'no-cors'
to fetch the resource with CORS disabled.

pullpush-io commented 1 year ago

ids is not implemented yet.

This is because it is a completely different query structure from the rest of the bunch. I will actually do it today, now that I have an index that will hopefully solve the performance issues for large queries with conditionals such as author= or subreddit= (testing on 5% of the dataset doesn't always extrapolate to the whole, and building an index for the real thing takes 24-48 hours).

fields will most likely not be implemented. Right now I consider CPU to be a bigger bottleneck than bandwidth, so it isn't a good trade for me.
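
If fields stays unsupported on the server, a client can trim each returned record itself. Below is a minimal Python sketch, assuming each comment has already been decoded from JSON into a dict. The helper name pick_fields and the sample record are hypothetical; the field names are the ones from the request URL earlier in this thread.

```python
from typing import Any, Dict, Iterable

# Field names taken from the request URL earlier in this thread.
WANTED_FIELDS = ("body", "author", "id", "link_id", "created_utc", "permalink")


def pick_fields(record: Dict[str, Any], fields: Iterable[str] = WANTED_FIELDS) -> Dict[str, Any]:
    """Return a copy of record containing only the requested keys that are present."""
    return {name: record[name] for name in fields if name in record}


# Hypothetical record for illustration only; this is not real API output.
sample = {"id": "abc123", "author": "example_user", "body": "hello", "score": 5}
trimmed = pick_fields(sample)
```

This moves the filtering cost to the client, which fits the stated preference for spending bandwidth over server CPU.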

pullpush-io commented 1 year ago

So the implementation of ?ids= is done. Your particular example, t1_himv9l3 (subreddit: pushshift), will need to wait, because for privacy reasons we are testing in /r/jokes only (this gives people a chance to submit PII removals, while giving us a long-standing, keyword-rich subreddit to test on).

The limit is implemented on the back end as a subreddit=jokes condition that gets added to your query.
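
One way such a limit could work is to force the subreddit condition on every incoming query before it runs. The Python sketch below only illustrates that assumption and is not the actual PullPush back end; apply_test_limit and TEST_SUBREDDIT are hypothetical names.

```python
from typing import Dict

TEST_SUBREDDIT = "jokes"  # assumption: the single subreddit open during testing


def apply_test_limit(params: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the query parameters with the subreddit forced to the test one."""
    limited = dict(params)  # copy so the caller's dict is left untouched
    limited["subreddit"] = TEST_SUBREDDIT
    return limited
```

Under this assumption, a query aimed at any other subreddit would still be answered from r/jokes only, which would be consistent with the r/Jokes results reported earlier in the thread.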

Examples:

https://api.pullpush.io/reddit/search/submission/?q=microsoft [ Find jokes about Microsoft ]
https://api.pullpush.io/reddit/search/submission/?ids=t3_ttk4ho,u4qa7i,t3_v3jr5m [ Pull up 3 different submission ids ]
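
For anyone scripting against these endpoints, the example URLs can be assembled programmatically. This is a small Python sketch using only the standard library. The base URL and the q and ids parameters come from the examples above; the helper name build_search_url is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.pullpush.io/reddit/search"  # taken from the examples above


def build_search_url(kind: str, **params: str) -> str:
    """Build a search URL for kind ('submission' or 'comment') with the given query parameters."""
    # safe="," keeps comma-separated id lists readable instead of encoding them as %2C.
    return f"{BASE_URL}/{kind}/?{urlencode(params, safe=',')}"


keyword_url = build_search_url("submission", q="microsoft")
ids_url = build_search_url("submission", ids=",".join(["t3_ttk4ho", "u4qa7i", "t3_v3jr5m"]))
```

The sketch only builds the URLs. Sending the request and handling the response are left out, since the response format is not described in this thread.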