bisguzar / twitter-scraper

Scrape the Twitter Frontend API without authentication.
MIT License
3.91k stars, 599 forks

JSONDecodeError + 400 error when using get_tweets() #140

Open theshoals opened 4 years ago

theshoals commented 4 years ago

Code used:

import twitter_scraper
for tweet in twitter_scraper.get_tweets('twitter', pages=1):
    print(tweet)

Traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/foobar/code/temp/twitter-scraper/twitter_scraper/modules/tweets.py", line 166, in get_tweets
    yield from gen_tweets(pages)
  File "/home/foobar/code/temp/twitter-scraper/twitter_scraper/modules/tweets.py", line 37, in gen_tweets
    html=r.json()["items_html"], url="bunk", default_encoding="utf-8"
  File "/home/foobar/code/temp/twitter-scraper/.venv/lib/python3.6/site-packages/requests/models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The response from Twitter:

url: https://twitter.com/i/profiles/show/twitter/timeline/tweets?include_available_features=1&include_entities=1&include_new_items_bar=true
status_code: 400
text: ''

Perhaps this is related to the redesign mentioned in https://github.com/bisguzar/twitter-scraper/issues/132?
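
For anyone debugging this, a minimal sketch of how the failure could be surfaced more clearly. The traceback above comes from calling r.json() on an empty 400 response, so the JSONDecodeError is only a symptom. The helper and exception names below (extract_items_html, TimelineResponseError) are hypothetical and not part of twitter_scraper; only the "items_html" key is taken from the traceback.

```python
import json


class TimelineResponseError(RuntimeError):
    """Raised when the timeline endpoint does not return usable JSON."""


def extract_items_html(status_code: int, text: str) -> str:
    """Return the "items_html" field, or raise a descriptive error.

    Hypothetical helper for illustration; not part of twitter_scraper.
    """
    # Check the HTTP status first: an empty 400 body can never be parsed.
    if status_code != 200:
        raise TimelineResponseError(
            f"timeline request failed with HTTP {status_code}; body was {text!r}"
        )
    try:
        payload = json.loads(text)
    except json.JSONDecodeError as exc:
        raise TimelineResponseError(f"response was not valid JSON: {exc}") from exc
    if not isinstance(payload, dict) or "items_html" not in payload:
        raise TimelineResponseError("JSON response has no 'items_html' key")
    return payload["items_html"]
```

Called as extract_items_html(r.status_code, r.text), this would report the 400 directly instead of failing inside the JSON decoder.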

brachna commented 4 years ago

same issue, returns null html

icmpnorequest commented 4 years ago

Same issue, returns 400

peddrinn commented 4 years ago

Same Issue here

adulau commented 4 years ago

Same issue here, with both get_trends() and get_tweets().

brachna commented 4 years ago

https://twitter.com/i/search/timeline doesn't work either

bisguzar commented 4 years ago

Twitter just updated something. We need to debug it thoroughly, but I don't have any time in the short term. Any information is welcome.

EssbieWGT commented 4 years ago

My hunch is this is the same issue raised in #132. Would updating the "headers" called in tweets.py solve the issue?

bisguzar commented 4 years ago

I don't think so. I just modified the headers a bit, but nothing changed. As I said, I'm not able to debug this issue in the short term because of my busy schedule. Please feel free to change the headers too, modify the source, and tell us what happened. @EssbieWGT
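
A rough sketch of how header experiments could be run outside the library, using only the standard library. The URL is the one from the original report; the function names and any header values passed in are placeholders chosen for illustration, not the headers used in tweets.py.

```python
import urllib.error
import urllib.request

# URL pattern copied from the original report in this thread.
TIMELINE_URL = (
    "https://twitter.com/i/profiles/show/{user}/timeline/tweets"
    "?include_available_features=1&include_entities=1&include_new_items_bar=true"
)


def build_request(user, headers):
    """Build (but do not send) a timeline request with the given headers."""
    return urllib.request.Request(TIMELINE_URL.format(user=user), headers=headers)


def probe(user, headers, timeout=10):
    """Send the request and return (status_code, body_length)."""
    try:
        with urllib.request.urlopen(build_request(user, headers), timeout=timeout) as resp:
            return resp.status, len(resp.read())
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; report them the same way.
        return err.code, len(err.read())
```

Calling probe("twitter", {...}) with different header sets and comparing the returned status codes would show whether any combination gets past the 400.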

GivenToFlyCoder commented 4 years ago

Maybe this is the reason:

"It seems that Twitter has had it enough! The company is shutting down its original site legacy theme version on the 1st of June 2020, as reported by BleepingComputer. Twitter has issued a warning to all the users who have been using user-agent switching hacks and unsupported browsers to enable the legacy theme."

https://www.digitalinformationworld.com/2020/05/twitter-issues-warning-to-shut-the-site-s-legacy-theme-once-and-for-all-in-june-2020.html

brachna commented 4 years ago

started working now

d3athrow commented 4 years ago

doesn't work for me

bisguzar commented 4 years ago

I'm so confused. I just tried version 0.4.1 and it seems to be working, though I don't know why yet. We need more information; it looks like Twitter is trying something new. By the way, I only retested get_tweets() just now. I didn't see any problems with profile or get_trends.

EssbieWGT commented 4 years ago

Spent some time trying to find the problem over the weekend, and couldn't nail it down. Ended up creating a new virtual environment for my script and now everything works fine.

bisguzar commented 4 years ago

So weird, thanks for your efforts @EssbieWGT. I tried inside my old environment and got the same result: it works... I'm not going to close this issue for a while. We need to dig deeper.

brachna commented 4 years ago

Is the problem back or is it just me?

d3athrow commented 4 years ago

Back to 400 bad request.

TheMulti0 commented 4 years ago

Does not work

skywind0218 commented 4 years ago

Not working for me either.

brachna commented 4 years ago

Kinda lost as to what can be done here.

Browsing through devtools in Firefox, the only request that stands out is https://api.twitter.com/2/timeline/profile/25073877.json, since it returns JSON with tweets. But I can't seem to use it from a Python script: access is forbidden.

Another way is to use Pyppeteer with

text = await page.evaluate('''() => {
    return document.documentElement.outerHTML;
}''')

But that would be HTML with (encrypted?) class names, which is a pain to sort out.

Any ideas?
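
For context, a fuller sketch of the Pyppeteer route mentioned above, assuming Pyppeteer is installed and can start its bundled Chromium. The function names are hypothetical, and page.content() is used here in place of the page.evaluate() call; it returns the rendered page's HTML. Whether Twitter serves tweets to a logged-out headless browser is not verified here.

```python
import asyncio


async def fetch_rendered_html(url: str) -> str:
    """Load `url` in headless Chromium and return the rendered HTML."""
    # Imported lazily so this module loads even without Pyppeteer installed.
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        # Wait until network activity settles so scripted content can render.
        await page.goto(url, {"waitUntil": "networkidle2"})
        return await page.content()
    finally:
        # Always shut the browser down, even if navigation fails.
        await browser.close()


def fetch_rendered_html_sync(url: str) -> str:
    """Blocking wrapper for scripts that are not already async."""
    return asyncio.run(fetch_rendered_html(url))
```

The returned HTML would still need to be parsed, so this only replaces the fetching step, not the extraction of tweets from the markup.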

d3athrow commented 4 years ago

@brachna That link doesn't even work in a browser for me.

TheMulti0 commented 4 years ago

Seems like it has to do with a change in the Twitter API (v2). As far as I can see, you currently cannot view tweets without logging in.

brachna commented 4 years ago

OK, so gallery-dl has a Twitter extractor that uses the Twitter API (v2). It does work. However, it is rate-limited. Also, one of the tweets I used for testing didn't have its media returned, even though it can be viewed in a browser.

kdipippo commented 3 years ago

Confirming that this error is still an issue, same code and traceback as OP comment.

LucasGobatto commented 3 years ago

Any updates about this issue?