JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

Twitter is scraping tweet links instead of the actual tweets #91

Closed arceuss closed 3 years ago

arceuss commented 3 years ago

I'm trying to scrape someone's tweets for AI, but it's just giving me links.

hkim2636 commented 3 years ago

Did you try the --jsonl option? It will give you the content of the tweets. snscrape --jsonl --max-results 100 twitter-hashtag archiveteam

arceuss commented 3 years ago

Does it do tweets of a specific user?

arceuss commented 3 years ago

Also, I tried that, but it just says it's not a valid argument.

JustAnotherArchivist commented 3 years ago

--jsonl only exists in the current development version. You can also use --format '{content}' on the release version (or the dev one), but be aware that tweets can contain line breaks, so one line in the output will not necessarily be one tweet.

And yes, those options are independent of the scraper, so you can use them with twitter-user as well. (The available fields for --format do depend on the social network though.)
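For example, here is a minimal sketch of reading the --jsonl output back in Python, assuming each line is one JSON object and that the tweet text is in a content field (as the '{content}' format field suggests); the tweets.json filename is just a placeholder for whatever file you redirected the snscrape output to:

import json

# Each line of the --jsonl output is one complete JSON object per tweet,
# so line breaks inside a tweet's text don't break the parsing.
with open('tweets.json', 'r') as f:
    for line in f:
        tweet = json.loads(line)
        print(tweet['content'])

Because every tweet is a single JSON object on its own line, the line-break caveat that applies to --format '{content}' does not apply here.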

mvitha commented 3 years ago

Got it to run:

installed the development version using: pip3 install git+https://github.com/JustAnotherArchivist/snscrape.git

ran from command line: snscrape --jsonl --max-results 10 twitter-user textfiles > text_files_test_10.json

then, in a Python script or Jupyter notebook:

import json
import pandas as pd

table = []
with open('text_files_test_10.json', 'r') as f:
    for line in f:
        table.append(json.loads(line))

pd.DataFrame.from_records(table).to_csv('text_files_test_10.csv')
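If you only want the tweet text itself (e.g. for training), a short sketch along these lines should work, assuming the field is named content as the --format '{content}' example above suggests:

import pandas as pd

# Read the CSV produced above and keep just the tweet text column.
df = pd.read_csv('text_files_test_10.csv')
df['content'].to_csv('text_files_test_10_content.csv', index=False)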

Thanks, this is awesome!

arceuss commented 3 years ago

[image] Everything looks like this though; I don't know how to train this with AI lol

arceuss commented 3 years ago

[image] Oh.

DiameterEffect commented 3 years ago


For those getting an error like name 'pd' is not defined, use import pandas as pd instead of just import pandas.