fake-name / xA-Scraper

Recomment 'if loopCtr' block #67

Closed God-damnit-all closed 4 years ago

God-damnit-all commented 4 years ago

Currently these two lines are followed only by commented-out code, so Python expects loopCtr += 1 to be indented as part of the block they open rather than the for loop, causing an indentation error.
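
For anyone hitting the same thing, here is a minimal sketch of the failure mode. The snippet is a hypothetical reconstruction (the real lines from the repo are not quoted in this thread; only the name loopCtr comes from the issue), and it assumes the block header was an if whose body had been commented out.

```python
# Hypothetical reconstruction: only the name loopCtr comes from the issue.
# A block header whose body is entirely comments has no body at all,
# because the tokenizer ignores comment-only lines.
BROKEN = """
loopCtr = 0
for item in range(3):
    if loopCtr % 2 == 0:
        # print("checkpoint")
    loopCtr += 1
"""

# Commenting out the header as well ("recommenting" the block) restores
# valid syntax and leaves loopCtr += 1 in the body of the for loop.
FIXED = """
loopCtr = 0
for item in range(3):
    # if loopCtr % 2 == 0:
    #     print("checkpoint")
    loopCtr += 1
"""


def compiles(source):
    """Return True if the source compiles, False on an IndentationError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False


if __name__ == "__main__":
    print("broken compiles:", compiles(BROKEN))
    print("fixed compiles:", compiles(FIXED))
```

Adding a pass statement under the if would also compile, but commenting out the header keeps the dead code together in one place.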

fake-name commented 4 years ago

I've got this fixed locally (how did I commit that?!), but I haven't pushed because I haven't finished the Twitter stuff (this weekend, hopefully, assuming I don't get distracted with 3D printer crap again).

Let me see if I can pull out the relevant changes from my local repo.

God-damnit-all commented 4 years ago

I'm really looking forward to the Twitter stuff. Right now it is the biggest pain in the ass to keep track of.

fake-name commented 4 years ago

I think I'm going to have something that grabs the web-accessible stuff first (read: I've written bits of it). The auth-ed stuff can come later. If that runs regularly, it shouldn't miss things.

God-damnit-all commented 4 years ago

I was trying to tell you before: all of it is easily web-accessible if you use the advanced search filters to grab a user's tweets by quarter (3 months at a time), going back to the start of 2014. (You'll want the filters' date ranges to overlap a bit, because Twitter isn't very exact about how it dates things.)

The only problem is that you have to be logged into an account that is configured to always view adult content for it to work correctly. But no API use is required.
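
A rough sketch of how the quarterly windows could be generated, in case it helps. The from:, since: and until: operators are from the advanced search syntax; the function name, the example username and the two-day overlap are my own assumptions, and this only builds the query strings (it does not fetch anything or handle the logged-in session).

```python
from datetime import date, timedelta


def quarterly_queries(user, start_year, end, overlap_days=2):
    """Build one search query per calendar quarter from start_year up to `end`.

    Each window is widened by overlap_days on both sides so adjacent
    quarters overlap; the overlap size is an assumed value, not a
    documented requirement.
    """
    queries = []
    year, month = start_year, 1
    while date(year, month, 1) <= end:
        window_start = date(year, month, 1)
        # Advance three months, rolling over into the next year if needed.
        next_year, next_month = year, month + 3
        if next_month > 12:
            next_year, next_month = year + 1, next_month - 12
        window_end = date(next_year, next_month, 1)

        since = window_start - timedelta(days=overlap_days)
        until = window_end + timedelta(days=overlap_days)
        queries.append(
            f"from:{user} since:{since.isoformat()} until:{until.isoformat()}"
        )
        year, month = next_year, next_month
    return queries


if __name__ == "__main__":
    for query in quarterly_queries("example_user", 2014, date.today()):
        print(query)
```

Because the windows overlap, the same tweet can show up in two quarters, so results would need de-duplicating by tweet ID.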

God-damnit-all commented 4 years ago

Full list of advanced search filters is here: https://github.com/igorbrigadir/twitter-advanced-search

Actually, it might be even better to use max_id and since_id instead. I only just learned they exist. That would be much more accurate.

God-damnit-all commented 4 years ago

Sorry for the spam, but here, read this section. It even has an example of how to scrape by snowflake ID in Python: https://github.com/igorbrigadir/twitter-advanced-search#snowflake-ids
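
Not copied from that page, but the idea can be sketched like this: a snowflake ID stores a millisecond timestamp (relative to Twitter's epoch of 1288834974657 ms) in the bits above the low 22, so a date can be turned into an ID bound for since_id: and max_id:. The helper names and the example username below are made up for illustration.

```python
from datetime import datetime, timezone

# Twitter's snowflake epoch in milliseconds since the Unix epoch.
TWITTER_EPOCH_MS = 1288834974657


def snowflake_from_datetime(dt):
    """Return the smallest snowflake ID whose timestamp is dt (must be tz-aware)."""
    ms = int(dt.timestamp() * 1000)
    return (ms - TWITTER_EPOCH_MS) << 22


def datetime_from_snowflake(snowflake_id):
    """Recover the UTC timestamp embedded in a snowflake ID."""
    ms = (snowflake_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def id_window_query(user, start, end):
    """Build a search query bounded by IDs instead of dates (illustrative helper)."""
    return (
        f"from:{user} "
        f"since_id:{snowflake_from_datetime(start)} "
        f"max_id:{snowflake_from_datetime(end)}"
    )


if __name__ == "__main__":
    start = datetime(2020, 1, 1, tzinfo=timezone.utc)
    end = datetime(2020, 4, 1, tzinfo=timezone.utc)
    print(id_window_query("example_user", start, end))
```

Snowflake IDs were only introduced in late 2010, so this conversion does not apply to tweets older than that; for a range starting in 2014 that is not a concern.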

Once again, the only issue is that the search only shows you adult content on an account configured not to filter it, but no phone number has to be attached to the account or anything.