DocNow / twarc-csv

A plugin for twarc2 for converting tweet JSON into DataFrames and exporting to CSV.
MIT License

Tweet ID with no text #22

Closed by sakibsh 3 years ago

sakibsh commented 3 years ago

Reproducible code:

!twarc2 search "(vaccine OR jab OR vaxine) (-is:retweet) (lang:en)" --archive --start-time 2021-03-01T00:00:00 --end-time 2021-03-03T00:00:00 --limit 300 raw_output.json

!twarc2 flatten 'raw_output.json' 'flattened_output.json'

!twarc2 csv --output-columns "id,created_at,text" 'flattened_output.json' 'outputshort.csv'

outputshort.csv

igorbrigadir commented 3 years ago

Do you have raw_output.json too, just so I can check it's the same as mine?

sakibsh commented 3 years ago

I cannot upload JSON files here, but please use this link: https://drive.google.com/file/d/1Ylhu-q77VqjwAW14JraOh6fZ7yxeiS3t/view?usp=sharing

edsu commented 3 years ago

It looks like tweet 1366901116642934788 was made in response to tweet 1366900920320225287, which has been deleted?

I guess it might make sense for twarc-csv to not include separate rows for referenced tweets that lack any metadata?
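Until something like that exists in the plugin, a possible workaround is to drop the empty rows after export. This is only a rough sketch using Python's standard `csv` module, not how twarc-csv itself handles it. The helper name `drop_rows_without_text` is hypothetical, and the sample rows are made up for illustration: the IDs are the two from this thread, but the timestamp and text are placeholders.

```python
import csv
import io


def drop_rows_without_text(rows, text_field="text"):
    """Keep only rows whose text field is present and not blank.

    Hypothetical helper for illustration; not part of twarc-csv.
    """
    return [row for row in rows if (row.get(text_field) or "").strip()]


# Placeholder data shaped like the reported CSV (id, created_at, text).
# The second row has an ID but no other metadata, like a deleted referenced tweet.
sample_csv = (
    "id,created_at,text\n"
    "1366901116642934788,2021-03-02T00:00:00.000Z,placeholder reply text\n"
    "1366900920320225287,,\n"
)

rows = list(csv.DictReader(io.StringIO(sample_csv)))
kept = drop_rows_without_text(rows)
kept_ids = [row["id"] for row in kept]
print(kept_ids)
```

To run this against a real export, replace `io.StringIO(sample_csv)` with an open file handle on `outputshort.csv`. Note that this only works if `text` is one of the columns kept by `--output-columns`.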

igorbrigadir commented 3 years ago

This should be fixed in the new version!

pip3 install --upgrade twarc-csv