DocNow / twarc-csv

A plugin for twarc2 for converting tweet JSON into DataFrames and exporting to CSV.
MIT License

Error "Unexpected Data": problem converting JSONL file to CSV file #18

Closed: ShivaniParekh closed this issue 3 years ago

ShivaniParekh commented 3 years ago

twarc2 search --archive --start-time 2020-01-01 --limit 10 "reliance" tweets_reliance3.jsonl

twarc2 csv tweets_reliance3.jsonl tweets_reliance3.csv

I ran the commands above and get an error while converting the JSONL file to CSV.

ERROR: Unexpected Data: "author.withheld.scope" to fix, add these with --extra-input-columns. Skipping entire batch of 666 tweets!

Even after running the command

twarc2 csv --extra-input-columns "author.witheld.scope" tweets_reliance3.jsonl tweets_reliance5.csv

I get the same error. Here is the JSONL file.

igorbrigadir commented 3 years ago

I cannot reproduce this.

Running

twarc2 csv --extra-input-columns "author.withheld.scope" tweets_reliance3.jsonl tweets_reliance3.csv

works with the file provided.

twarc2 csv --extra-input-columns "author.withheld.scope" tweets_reliance3.jsonl tweets_reliance3.csv
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.35M/1.35M [00:00<00:00, 2.40MB/s]

ℹ️
Read 827 tweets from 1 lines. 
327 were referenced tweets, 161 were duplicates.
Wrote 666 rows and output 90 of 90 input columns in the CSV.

(Either way, I've added the missing field to the default fields and will make a new release shortly.)

ShivaniParekh commented 3 years ago

Okay, it works. The word "withheld" in author.withheld.scope has two "h"s and I typed only one, hence the error.
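
A typo like this can be caught before running the conversion by comparing the name you typed against the name reported in the error message. The following is only a minimal sketch using Python's standard difflib module; it is not part of twarc-csv, and both the suggest_column helper and the known list are hypothetical, made up for illustration.

```python
import difflib


def suggest_column(requested, known_columns, cutoff=0.8):
    """Return `requested` if it is a known column, else the closest
    known column name, or None when nothing is similar enough."""
    if requested in known_columns:
        return requested
    matches = difflib.get_close_matches(requested, known_columns, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical list of column names, e.g. copied from the error message.
known = [
    "author.withheld.scope",
    "author.withheld.country_codes",
    "author.username",
]

# The misspelled name from this issue (one "h" in "witheld").
print(suggest_column("author.witheld.scope", known))  # author.withheld.scope
```

If the suggested name differs from what you typed, copy the name directly from the error message instead of retyping it.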

igorbrigadir commented 3 years ago

The new version is up, so run

pip install --upgrade twarc-csv

to upgrade. After that, it should work without --extra-input-columns.