SaxyPandaBear / food-pics

Scheduled webhook that scrapes Reddit and posts to Discord
MIT License

Redo Redis storage to unpack records #14

Closed SaxyPandaBear closed 2 years ago

SaxyPandaBear commented 2 years ago

Problem

In my earlier, naive attempt at deduplication, I stored Reddit posts grouped under the author's username as the key. This will eventually cause a problem: a single record can grow large enough that it evicts a large portion of the submissions stored in Redis. Redis has no good way to query or scan for records by value, so using the submission ID by itself as the key doesn't make much sense either, since it doesn't give enough context to limit a search and keep it performant.

Proposed change

Change it so that the key becomes a combination of author + post ID:

user = foo, id = abc123
# becomes
key = foo/abc123
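
A minimal sketch of that key format in Python. The helper names `make_key` and `split_key` are hypothetical and not taken from the repository. The sketch also assumes the author name never contains `/`, since that would make the key ambiguous.

```python
from typing import Tuple


def make_key(author: str, post_id: str) -> str:
    """Build the proposed composite key, e.g. 'foo/abc123'."""
    # Assumption: '/' never appears in an author name; reject it so that
    # the key can always be split back into its two parts.
    if "/" in author:
        raise ValueError("author must not contain '/'")
    return f"{author}/{post_id}"


def split_key(key: str) -> Tuple[str, str]:
    """Split a composite key back into (author, post_id)."""
    author, _, post_id = key.partition("/")
    return author, post_id
```

For example, `make_key("foo", "abc123")` gives `foo/abc123`, and `split_key` reverses it.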

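The deduplication logic would then change from fetching a set and iterating over its members to using Redis's SCAN command with a pattern that matches on the author, like SCAN 0 MATCH foo/*. SCAN is cursor-based, so the call has to be repeated with the returned cursor until it comes back as 0 in order to see every matching key.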
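
A rough sketch of what the SCAN-based lookup could look like in Python. It assumes a redis-py style client, whose `scan_iter(match=...)` method handles the cursor loop internally. The function names `seen_post_ids` and `is_duplicate` are hypothetical, and this is not necessarily how the closing commit implements it.

```python
from typing import Set


def seen_post_ids(client, author: str) -> Set[str]:
    """Collect the post IDs already stored for one author.

    `client` is assumed to expose redis-py's `scan_iter(match=...)`,
    which repeats SCAN with the returned cursor until it reaches 0.
    """
    prefix = f"{author}/"
    ids = set()
    for raw in client.scan_iter(match=prefix + "*"):
        # redis-py returns bytes unless decode_responses=True was set.
        key = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        ids.add(key[len(prefix):])
    return ids


def is_duplicate(client, author: str, post_id: str) -> bool:
    """Return True if this author/post pair is already stored."""
    return post_id in seen_post_ids(client, author)
```

With a real server this would be called as `is_duplicate(redis.Redis(), "foo", "abc123")`. Note that MATCH only filters the keys SCAN returns, so the scan still walks the whole keyspace and its cost grows with the total number of keys.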

SaxyPandaBear commented 2 years ago

Closed by d5ba70c1be9d41010420eb2b27d1580b8d72f6eb