fake-name / xA-Scraper


TwitGet apparently broken too #95

Closed God-damnit-all closed 4 years ago

God-damnit-all commented 4 years ago

I didn't even realize it wasn't functioning; I don't have any tweets for the entire month of February. It's failing when it tries to retrieve the join date.

Main.TwitGet.StatusMgr - ERROR - Traceback (most recent call last):
Main.TwitGet.StatusMgr - ERROR -   File "D:\xA-Scraper\xascraper\modules\twit\twitScrape.py", line 253, in go
Main.TwitGet.StatusMgr - ERROR -     errored |= self.getArtist(aid=aid, artist=name, ctrlNamespace=ctrlNamespace)
Main.TwitGet.StatusMgr - ERROR -   File "D:\xA-Scraper\xascraper\modules\twit\twitScrape.py", line 203, in getArtist
Main.TwitGet.StatusMgr - ERROR -     for tweet in intf.get_all_tweets(artist, min_date):
Main.TwitGet.StatusMgr - ERROR -   File "D:\xA-Scraper\xascraper\modules\twit\vendored_twitter_scrape.py", line 281, in get_all_tweets
Main.TwitGet.StatusMgr - ERROR -     interval_start = self.get_joined_date(username)
Main.TwitGet.StatusMgr - ERROR -   File "D:\xA-Scraper\xascraper\modules\twit\vendored_twitter_scrape.py", line 153, in get_joined_date
Main.TwitGet.StatusMgr - ERROR -     raise exceptions.AccountDisabledException("Could not retreive artist joined date. "
Main.TwitGet.StatusMgr - ERROR - xascraper.modules.exceptions.AccountDisabledException: Could not retreive artist joined date. This usually means the account has been disabled!
Main.TwitGet.StatusMgr - ERROR -

The accounts are definitely not disabled; this occurs for all of them.

fake-name commented 4 years ago

Yep, that be fucked. Let's see...

fake-name commented 4 years ago

Ugh, they moved the joined date behind a bunch of chained JS operations.

God-damnit-all commented 4 years ago

Is it easier to access via the old layout? That can still be accessed by forcing the user agent to Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko and creating a rweb_optin cookie with the value side_no_out.
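Something like this rough sketch is what I mean, using only the standard library. It just builds the request with the user agent and cookie from above and doesn't send anything; the function name and the example URL are made up for illustration, and I haven't checked how long Twitter will keep honouring these values.

```python
import urllib.request

# Values copied from the comment above. Whether Twitter still serves the
# old layout for them is an assumption, not something this sketch verifies.
LEGACY_USER_AGENT = "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko"
LEGACY_COOKIE = "rweb_optin=side_no_out"


def build_legacy_request(url):
    """Build (but do not send) a request that asks for the old layout.

    Hypothetical helper, not part of xA-Scraper.
    """
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": LEGACY_USER_AGENT,
            "Cookie": LEGACY_COOKIE,
        },
    )


# Placeholder profile URL, purely for illustration.
req = build_legacy_request("https://twitter.com/example_user")
```

Passing req to urllib.request.urlopen would actually fetch the page; I left that out so the sketch has no network dependency.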

fake-name commented 4 years ago

If the old layout is the one without JS, both it and the new layout are missing the joined date.

What I want is behind a weird GraphQL query, but I've not been able to fake the right set of operations to make it not error on me.

I'm like, 90% just going to say you have to use headless chrome to make it work.

God-damnit-all commented 4 years ago

If you're going to overhaul the twitter protocol, do you think you could make it start using user ids instead of usernames?

God-damnit-all commented 4 years ago

Here's a weird idea... how about a Twitter account that interfaces with a github.io page under your control that forwards basic API requests for all xA-Scraper users, essentially acting as a proxy?

Not for retrieving tweets, just for things like user info: join dates, retrieving usernames by user ID, etc.

fake-name commented 4 years ago

That's way, way more work than just using the existing headless chrome stuff, which I've already written and which is already a dependency.

The fact that the DA scraper already requires headless chrome helps, too.

God-damnit-all commented 4 years ago

Well, this seems like a lot more work than just requiring a login, to be fair.

I still hope you consider doing things by user ID, since people changing their username or other people taking over old usernames is a big issue.
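To illustrate why I care, here's a minimal sketch of tracking by user ID. All of the names here (TrackedArtist, ArtistRegistry, observe) are hypothetical and not how xA-Scraper stores artists today; it only shows that keying on the ID survives both a rename and someone else picking up the old username.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedArtist:
    # The numeric user ID is assumed to be stable for the life of the account.
    user_id: int
    username: str
    previous_usernames: list = field(default_factory=list)


class ArtistRegistry:
    """Hypothetical registry keyed by user ID instead of username."""

    def __init__(self):
        self._by_id = {}

    def observe(self, user_id, username):
        """Record the username currently attached to a user ID."""
        artist = self._by_id.get(user_id)
        if artist is None:
            artist = TrackedArtist(user_id, username)
            self._by_id[user_id] = artist
        elif artist.username != username:
            # Same account, new name: remember the old one and move on.
            artist.previous_usernames.append(artist.username)
            artist.username = username
        return artist

    def lookup_by_username(self, username):
        """Return whoever currently holds this username, if anyone."""
        for artist in self._by_id.values():
            if artist.username == username:
                return artist
        return None


registry = ArtistRegistry()
registry.observe(1001, "old_name")
registry.observe(1001, "new_name")   # the artist renamed their account
registry.observe(2002, "old_name")   # someone else took over the old name
```

With usernames as the key, the last two lines would make the scraper silently start following the wrong person; with IDs, account 1001 keeps its history and 2002 is a separate entry.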

fake-name commented 4 years ago

Just using headless chrome is probably the easiest solution in any case. It's just a clumsy solution, and I find it inelegant.

Also, the fact that Twitter allows username reuse is so spectacularly idiotic I'm amazed.

God-damnit-all commented 4 years ago

Will be fixed by #96

God-damnit-all commented 4 years ago

I just want to note that the 'old' method TwitGet used, which I patched here, is still working fine despite the Twitter layout deprecation. As I suspected, the components that make it work were not removed. I really think this is far more efficient than headless chrome.

fake-name commented 4 years ago

I thought I had merged those changes. Gah.