Open enzoferey opened 1 year ago
These mentions are technically part of the tweet text. This is exactly what Twitter returns:
...['tweet_results']['result']['legacy']['full_text'] = '@GitHubCopilot @tabnine @Replit @vercel Have you tried them ? What’s your opinion ? We read you'
There is, however, also a `display_text_range` field. That should probably be taken into account for `renderedContent`.
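As a rough illustration of how `display_text_range` could be taken into account, here is a minimal sketch that slices `full_text` by that range. It is not snscrape's implementation. The helper name `displayed_text` and the range values are hypothetical, and it assumes the range holds code point indices (start inclusive, end exclusive), which is how Python indexes strings.

```python
def displayed_text(legacy):
    """Return the part of full_text covered by display_text_range.

    Assumption: the range is [start, end) in Unicode code points.
    Falls back to the whole text when the field is missing.
    """
    text = legacy["full_text"]
    start, end = legacy.get("display_text_range", (0, len(text)))
    return text[start:end]


# Illustrative data only: the range is computed here, not taken from a real response.
mentions = "@GitHubCopilot @tabnine @Replit @vercel "
body = "Have you tried them ?"
legacy = {
    "full_text": mentions + body,
    "display_text_range": [len(mentions), len(mentions) + len(body)],
}

print(displayed_text(legacy))
```

With a range like this, the leading mentions fall outside the slice and only the written text remains.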
Thanks for pointing it out @JustAnotherArchivist
I did not realize that all accounts mentioned in a tweet are internally included in its replies (since you get notified about replies it makes sense).
This might be a good opportunity for me to ask as well about the differences between `content`, `renderedContent`, and `rawContent`?
Forget that `content` exists; it's a deprecated alias from the early days that will be removed eventually. (It emits a warning if you try to use it.)
`rawContent` is the exact tweet text Twitter returns, while `renderedContent` is (roughly) the text as it would be rendered on Twitter's web interface. The only difference between the two currently is the replacement of links, so `renderedContent` doesn't exactly match the web interface. For example, replies start with a mention of the replied-to user, which gets rendered separately on the web interface.
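To make the link replacement concrete, here is a minimal sketch of the idea, not snscrape's actual code. It assumes each link entity carries a shortened `url` and a `display_url`; the function name `render_links` and the sample data are hypothetical.

```python
def render_links(raw_text, url_entities):
    """Replace each shortened link in raw_text with its display form."""
    rendered = raw_text
    for entity in url_entities:
        # Assumption: "url" is the t.co link as it appears in the raw text.
        rendered = rendered.replace(entity["url"], entity["display_url"])
    return rendered


# Hypothetical sample data, not taken from a real tweet.
raw = "Docs are here: https://t.co/abc123"
entities = [{"url": "https://t.co/abc123", "display_url": "example.com/docs"}]

print(render_links(raw, entities))
```

Leading mentions are left untouched by this kind of replacement, which is why they still show up in the rendered text.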
By links replacement, you mean the https://t.co ones instead of the originals, right? I’m using Puppeteer to navigate those and get the actual URLs.
So as far as I understood, I should be using `renderedContent`, and there needs to be a fix so that it does not include mentions on replies. Is this right?
Describe the bug
When scraping the following tweet, the content returned starts with
"@GitHubCopilot @tabnine @Replit @vercel Have you tried them ?"
instead of just "Have you tried them ?" as expected.
How to reproduce
Use the `TwitterTweetScraper` and pass the tweet ID 1674020720458776576.
Expected behaviour
There should be no non-written mentions at the beginning of the content.
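A minimal reproduction sketch, assuming snscrape 0.7.0.20230622 is installed and Twitter is reachable. The helper names `starts_with_mention` and `fetch_contents` are hypothetical; only `TwitterTweetScraper`, `get_items()`, `rawContent`, and `renderedContent` come from snscrape.

```python
import re


def starts_with_mention(text):
    """Return True if the text begins with an @mention."""
    return re.match(r"@\w+", text) is not None


def fetch_contents(tweet_id):
    """Fetch one tweet and return (rawContent, renderedContent), or None."""
    # Imported here so the helper above works without snscrape installed.
    import snscrape.modules.twitter as sntwitter

    scraper = sntwitter.TwitterTweetScraper(tweet_id)
    for tweet in scraper.get_items():
        return tweet.rawContent, tweet.renderedContent
    return None
```

Calling `fetch_contents(1674020720458776576)` and passing the second element to `starts_with_mention` is one way to check for the behaviour reported here.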
Screenshots and recordings
No response
Operating system
macOS 13.4.1
Python version: output of `python3 --version`
3.9
snscrape version: output of `snscrape --version`
0.7.0.20230622
Scraper
TwitterTweetScraper
How are you using snscrape?
Module (`import snscrape.modules.something` in Python code)
Backtrace
No response
Log output
No response
Dump of locals
No response
Additional context
No response