DefendPDXCode / DefendPDX-TwitterBot

This is the open-source code that runs the Twitter bot at @DefendPDX
MIT License

Deduplicate repeated tweets? #4

Open grinnellian opened 4 years ago

grinnellian commented 4 years ago

I often see cases where multiple people attempt to RT the same tweet w/ #defendPDX (heck, I've done it myself before I got better at checking for other retweets first). However, slowing down to check for other RTs reduces the utility of the repeater somewhat.

How would you like to handle duplicates and deduplication? Some ideas/cases to think about:

jmlingeman commented 4 years ago

I could see this being implemented as a simple “90% of tokens are the same” check or similar, but anything more sophisticated falls into the NLP realm, where duplicate-information detection is a field of ongoing research and probably outside the scope of this bot.
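For illustration, here is a rough sketch of what that simple token check might look like. It is not taken from the bot's code: the function names (`tokenize`, `token_similarity`, `is_duplicate`), the choice of Jaccard similarity over token sets as the reading of "90% of tokens are the same", and the decision to ignore URLs and a leading "RT" are all assumptions made for the example.

```python
import re


def tokenize(text):
    """Return the set of lowercase word-like tokens in a tweet.

    Assumption: URLs and a bare "rt" marker carry no content, so they are dropped.
    """
    text = re.sub(r"https?://\S+", "", text.lower())
    tokens = set(re.findall(r"[#@]?\w+", text))
    tokens.discard("rt")
    return tokens


def token_similarity(a, b):
    """Jaccard similarity of the two token sets: shared tokens / all tokens."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def is_duplicate(candidate, recent_tweets, threshold=0.9):
    """True if the candidate is at least `threshold` similar to any recent tweet."""
    return any(token_similarity(candidate, seen) >= threshold for seen in recent_tweets)


# Made-up example tweets, for illustration only.
recent = ["Police moving east on Burnside toward 12th #defendPDX"]
print(is_duplicate("RT police moving east on Burnside toward 12th #defendpdx", recent))
print(is_duplicate("Medics needed at Chapman Square #defendPDX", recent))
```

A set-based comparison like this ignores word order and repeated words, and the 0.9 threshold is just the figure floated above; it would need tuning against real retweets before being relied on.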