samuell07 closed this issue 1 year ago.
Great job, Twitter: full_text does not contain the full text...
Thank you for pointing this out! I thought I had tested this, but apparently not or not thoroughly enough.
It looks like there is a new key, 'note_tweet', in the JSON that contains the full, untruncated text.
The 'full_text' key contains only the text shown before the 'Show more' button in the frontend.
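A minimal sketch of "prefer note_tweet, fall back to full_text" on a single tweet object. The paths it reads (note_tweet → note_tweet_results → result → text for the long text, legacy → full_text for the truncated text) are assumptions about the payload shape, not something stated in this thread, and `extract_full_text` is a hypothetical helper, not part of snscrape.

```python
from typing import Any, Dict


def extract_full_text(tweet_result: Dict[str, Any]) -> str:
    """Return the untruncated text of one tweet object from Twitter's JSON.

    ASSUMPTION: the long text lives at
    note_tweet -> note_tweet_results -> result -> text, and the possibly
    truncated text at legacy -> full_text. Adjust if your payload differs.
    """
    # Walk the nested note_tweet structure, tolerating missing or null keys.
    note_tweet = tweet_result.get("note_tweet") or {}
    results = note_tweet.get("note_tweet_results") or {}
    note = results.get("result") or {}
    text = note.get("text")
    if isinstance(text, str):
        # Long ("note") tweet: this is the text hidden behind "Show more".
        return text
    # Regular tweet, or a feed that omits note_tweet: use full_text instead.
    legacy = tweet_result.get("legacy") or {}
    return legacy.get("full_text", "")


if __name__ == "__main__":
    # Hypothetical sample object, trimmed to the two relevant keys.
    long_tweet = {
        "legacy": {"full_text": "This is a great\u2026"},
        "note_tweet": {"note_tweet_results": {"result": {
            "text": "This is a great moment for Hoboken"}}},
    }
    print(extract_full_text(long_tweet))  # This is a great moment for Hoboken
```

Because the fallback is full_text, the same helper works unchanged on feeds that never include note_tweet; it simply returns the truncated text there.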
It seems that the "note_tweet" key is not present in the advanced search JSON feed...
... nor does the feed give any indication whatsoever that there is a longer text. It even explicitly includes truncated: false.
This might be impossible to fix for searches. The tweet and profile scrapers can be fixed, though.
It is possible that Twitter will add the note_tweet key to the advanced search API in the next few days.
They did the same thing in December with views_count: they first modified the profile and tweet APIs, then updated the advanced search four weeks later.
Is there any workaround for now to grab note_tweet?
There is currently no workaround in the advanced search API, as the full tweet is simply not returned.
The only way would be to retrieve the tweet ID and make a request to the TweetDetail API to retrieve the full tweet (but that would slow down the scraper significantly).
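The TweetDetail workaround can be sketched as a post-processing pass over search results. Since the search feed reports truncated: false even for long tweets, this sketch guesses truncation from a trailing "…" (a heuristic assumption, not a documented signal) and only re-fetches those tweets, to limit the slowdown. `fetch_full_text` is a hypothetical callable standing in for a per-tweet TweetDetail request; it is not a snscrape function.

```python
from typing import Callable, Iterable, Iterator, Tuple

ELLIPSIS = "\u2026"  # the "…" character that ends the truncated text


def looks_truncated(text: str) -> bool:
    """Heuristic only: guess truncation from a trailing ellipsis, because
    the search feed's truncated flag is false even for long tweets."""
    return text.rstrip().endswith(ELLIPSIS)


def complete_texts(
    results: Iterable[Tuple[str, str]],
    fetch_full_text: Callable[[str], str],
) -> Iterator[Tuple[str, str]]:
    """Yield (tweet_id, text) pairs, re-fetching only suspicious tweets.

    fetch_full_text is HYPOTHETICAL: e.g. a function that queries the
    TweetDetail endpoint for one tweet ID and returns its full text.
    """
    for tweet_id, text in results:
        if looks_truncated(text):
            # One extra request per suspected long tweet.
            text = fetch_full_text(tweet_id)
        yield tweet_id, text
```

A tweet that genuinely ends in "…" costs one unnecessary request; the trade-off is that ordinary short tweets never trigger the slow path.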
It is currently only possible to retrieve the note_tweet by scraping the UserTweets(AndReplies) API.
Should I wait for an snscrape update, or is this doable without one? I'm guessing the new field needs to be added to snscrape before it can be used. Thanks.
The SearchTimeline endpoint does return the relevant data, at least if you have the right feature flags enabled.
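A sketch of how feature flags might be sent with a GraphQL GET request such as SearchTimeline: as a compact, JSON-encoded features query parameter next to the variables. The flag names below are assumptions (names that appeared in Twitter's web client around 2023); this thread only says "the right feature flags" without naming them, and `build_graphql_params` is a hypothetical helper.

```python
import json
from typing import Any, Dict
from urllib.parse import urlencode

# ASSUMPTION: flag names for long-form ("note") tweets; not confirmed here.
NOTE_TWEET_FEATURES: Dict[str, bool] = {
    "longform_notetweets_consumption_enabled": True,
    "longform_notetweets_rich_text_read_enabled": True,
}


def build_graphql_params(variables: Dict[str, Any],
                         features: Dict[str, bool]) -> str:
    """Encode GraphQL variables and feature flags as a URL query string.

    Both dicts are serialized as compact JSON and then percent-encoded;
    that GraphQL GET endpoints expect this layout is also an assumption.
    """
    return urlencode({
        "variables": json.dumps(variables, separators=(",", ":")),
        "features": json.dumps(features, separators=(",", ":")),
    })
```

The caller supplies the variables dict (its keys depend on the endpoint), so the sketch does not guess at them.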
This is now implemented and seems to work correctly on all scrapers. However, it is a bit difficult to find good examples to test edge cases, such as user mentions in the expanded part, hashtags, or cashtags. I also couldn't find any community with note tweets. If you have examples of these, please let me know, and if you find that anything is still incorrectly or incompletely extracted, please file a bug report.
Thanks for the update.
I can't get it to work. For example,
snscrape --max-results 1 --jsonl twitter-tweet 1656693151694725122 | jq
doesn't return note_tweet.
Tweet URL: https://twitter.com/Mediavenir/status/1656693151694725122
Same behavior for twitter-profile.
EDIT: Sorry, it seems I missed that the package can't yet be upgraded to include this with pip install -U snscrape:
(2) > snscrape --version
snscrape 0.6.2.20230320
I just upgraded with pip install -U git+https://github.com/JustAnotherArchivist/snscrape.git and can confirm that rawContent now contains the whole tweet.
Thanks a lot for your work again.
Describe the bug
Hi, I have been scraping data from multiple profiles for some time, so I am not sure whether this is a new issue.
When a user has a blue checkmark, they can post longer tweets, but only part of the tweet's text is returned.
For example, the tweet https://twitter.com/RaviBhalla/status/1643041677232295936, which contains
A beautiful day to officially open the new field at the Northwest Resiliency Park with our softball and baseball teams! Couldn’t be prouder to have this available for the spring season, to help us bolster our recreation programs for both children and adults. This is a great moment for Hoboken, as the first athletic field to open in 10 years. I hope everyone enjoys! Looking forward to the grand opening of the rest of the park in June!
returns
A beautiful day to officially open the new field at the Northwest Resiliency Park with our softball and baseball teams! Couldn’t be prouder to have this available for the spring season, to help us bolster our recreation programs for both children and adults. This is a great…
How to reproduce
This code can be used to reproduce the issue. In the first part I used the method with which I found the issue; the second part just gets the specific tweet.
Expected behaviour
It should return the whole tweet.
Screenshots and recordings
No response
Operating system
Windows 11
Python version: output of python3 --version
Python 3.9.12
snscrape version: output of snscrape --version
0.6.2.20230320
Scraper
TwitterSearchScraper
How are you using snscrape?
CLI (snscrape ... as a command, e.g. in a terminal)
Backtrace
No response
Log output
No response
Dump of locals
No response
Additional context
No response