Earlopain / FoxTrove

E6 Upload Helper
GNU General Public License v3.0

Twitter API scraping promoted tweets? #84

Closed (faucetlol closed this issue 1 year ago)

faucetlol commented 1 year ago

Not sure how worthwhile it is to even look into this, considering all the uncertainty about what Elon is doing with the API, but I'm making a note of it here just in case.

When scraping https://twitter.com/onomari_art I somehow ended up downloading https://twitter.com/ourpaydayHQ/status/1626053122534649857 in the process, which I can only assume is some sort of promoted tweet.

I imagine this happens more often than I realise, and I've just been hiding the content along with all the other memes and photos without checking whether it was posted from their account or not.

faucetlol commented 1 year ago

Almost forgot that I can actually see the scraper logs. It's definitely a promoted tweet.

response.txt

I wonder whether this line is actually doing anything at all, or whether some promoted tweets just manage to slip through anyway: https://github.com/Earlopain/reverser/blob/e7da0dcda6da8cb54af02990220be3ce04b25540/app/logical/scraper/twitter.rb#L26

Earlopain commented 1 year ago

It should be relatively easy to filter these out; the Twitter frontend has to mark them as such, after all. snscrape is always a good resource for this, they have had it figured out for years already. https://github.com/JustAnotherArchivist/snscrape/commit/966a6ebd8eab3b6b7f435544e7f92dd385cb3859
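For illustration, here is a minimal sketch of what such a filter could look like. It is not the code from this repository or from snscrape. The response shape is an assumption: it supposes each timeline entry is a hash with an `entryId` string, and that promoted entries are recognisable either by an id prefix or by a `promotedMetadata` key somewhere in their content. The helper names (`promoted_entry?`, `contains_key?`, `organic_entries`), the prefixes, and the sample data are all hypothetical.

```ruby
require "json"

# Hypothetical id prefixes for promoted entries (an assumption, not
# confirmed by the issue or the attached response.txt).
PROMOTED_ID_PREFIXES = ["promotedTweet-", "promoted-tweet-"].freeze

# Recursively checks whether a nested Hash/Array structure contains `key`.
def contains_key?(node, key)
  case node
  when Hash
    node.key?(key) || node.each_value.any? { |value| contains_key?(value, key) }
  when Array
    node.any? { |value| contains_key?(value, key) }
  else
    false
  end
end

# An entry counts as promoted if its id uses one of the assumed prefixes
# or if its content carries an (assumed) "promotedMetadata" key.
def promoted_entry?(entry)
  entry_id = entry["entryId"].to_s
  return true if PROMOTED_ID_PREFIXES.any? { |prefix| entry_id.start_with?(prefix) }

  contains_key?(entry["content"], "promotedMetadata")
end

# Drops every entry that looks promoted and keeps the rest in order.
def organic_entries(entries)
  entries.reject { |entry| promoted_entry?(entry) }
end

# Made-up sample data that only mirrors the assumed structure.
SAMPLE_ENTRIES = JSON.parse(<<~JSON)
  [
    { "entryId": "tweet-1", "content": { "item": { "content": { "tweet": { "id": "1" } } } } },
    { "entryId": "promotedTweet-2-abc", "content": { "item": { "content": { "tweet": { "id": "2" } } } } },
    { "entryId": "tweet-3", "content": { "item": { "content": { "tweet": { "id": "3", "promotedMetadata": {} } } } } }
  ]
JSON

puts organic_entries(SAMPLE_ENTRIES).map { |entry| entry["entryId"] }.inspect
```

Checking both the id prefix and the metadata key is deliberately redundant: if one marker changes or is missing in a given response, the other may still catch the entry.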

I'm going to try to add a test for this. I'm fairly sure that most of the parameters being passed along to the endpoints don't actually do anything; I just copied what the Twitter frontend did at the time.

Earlopain commented 1 year ago

That should do it. I looked at my logs and didn't find a single occurrence of this, so I had to base it off the file you gave me.

faucetlol commented 1 year ago

Thanks for taking the time to look into that even though it had never affected you before 😅