Closed faucetlol closed 1 year ago
I almost forgot that I can actually see the scraper logs; it's definitely a promoted tweet.
I wonder if this is actually doing anything at all, or if some promoted tweets just manage to slip through anyway? https://github.com/Earlopain/reverser/blob/e7da0dcda6da8cb54af02990220be3ce04b25540/app/logical/scraper/twitter.rb#L26
It should be relatively easy to filter these out; the twitter frontend has to mark them as such, after all. snscrape is always a good resource for this, they have had it figured out for years already. https://github.com/JustAnotherArchivist/snscrape/commit/966a6ebd8eab3b6b7f435544e7f92dd385cb3859
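A rough sketch of what that filtering could look like, since the idea is just "drop whatever the frontend would label as promoted". The entry layout here (`entryId`, `content.itemContent`, `promotedMetadata`) and the helper names `promoted_entry?` and `organic_entries` are assumptions for illustration. They are not taken from the linked scraper or from snscrape, and the real response shape may differ.

```ruby
# Hypothetical helpers: the hash layout is an assumed timeline entry shape,
# not the actual schema used by the scraper or returned by Twitter.

# Returns true when an entry looks like a promoted tweet, either because its
# id carries a "promoted" prefix or because it has promotion metadata attached.
def promoted_entry?(entry)
  entry_id = entry["entryId"].to_s
  return true if entry_id.downcase.start_with?("promoted")

  item_content = entry.dig("content", "itemContent") || {}
  item_content.key?("promotedMetadata")
end

# Keeps only the entries that are not flagged as promoted.
def organic_entries(entries)
  entries.reject { |entry| promoted_entry?(entry) }
end

# Made-up sample data, only to show the behaviour.
SAMPLE_ENTRIES = [
  { "entryId" => "tweet-1", "content" => { "itemContent" => {} } },
  { "entryId" => "promoted-tweet-2", "content" => { "itemContent" => {} } },
  { "entryId" => "tweet-3", "content" => { "itemContent" => { "promotedMetadata" => {} } } },
].freeze

puts organic_entries(SAMPLE_ENTRIES).map { |entry| entry["entryId"] }.inspect
```

Checking both the id prefix and the metadata key is deliberately redundant: if one marker changes or is missing, the other might still catch the ad.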
I'm going to try and add a test for this. I'm fairly sure that most of the parameters being passed along to the endpoints don't actually do anything. I just copied what the twitter frontend did at the time.
That should do it. I looked at my logs and didn't find a single occurrence of this, so I had to base it off the file you gave me.
Thanks for taking the time to look into that even though it had never affected you before 😅
Not sure how worthwhile this is to even look into, considering all the uncertainty about what Elon is doing with the API, but I'm making a note of it here just in case.
When scraping https://twitter.com/onomari_art I somehow ended up downloading https://twitter.com/ourpaydayHQ/status/1626053122534649857 in the process, which I can only assume is some sort of promoted tweet.
I imagine this happens more often than I realise, and I've just been hiding the content along with all the other memes and photos without checking whether it was posted from their account or not.