DocNow / twarc

A command line tool (and Python library) for archiving Twitter JSON
https://twarc-project.readthedocs.io
MIT License

conversations command #472

Closed: edsu closed this issue 3 years ago

edsu commented 3 years ago

It might be nice to have a subcommand (maybe a plugin) that lets you fetch the conversation threads for the tweets in a file, kind of like the process Ryan is suggesting here:

https://twitter.com/ryanjgallag/status/1390727713082728448

twarc2 search blacklivesmatter > tweets.jsonl
twarc2 conversations tweets.jsonl > conversations.jsonl
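The first step of that workflow is to collect the distinct conversation IDs from the search results, so that each thread is fetched only once. Here is a minimal Python sketch of that step. It assumes that each line of tweets.jsonl is either a twarc2 response page with a "data" list of tweets or a single tweet object, and that tweets carry a conversation_id field. The helper name conversation_ids is hypothetical and is not part of twarc.

```python
import json
from typing import Iterable, List


def conversation_ids(lines: Iterable[str]) -> List[str]:
    """Return distinct conversation IDs in first-seen order.

    Hypothetical helper (not twarc API). Assumes each line is either a
    response page with a "data" list of tweets or a single tweet object.
    """
    seen = set()
    ordered = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        obj = json.loads(line)
        # A response page wraps tweets in "data"; a bare tweet is used as-is.
        data = obj.get("data", obj)
        tweets = data if isinstance(data, list) else [data]
        for tweet in tweets:
            conv_id = tweet.get("conversation_id")
            if conv_id and conv_id not in seen:
                seen.add(conv_id)
                ordered.append(conv_id)
    return ordered


if __name__ == "__main__":
    sample = [
        '{"data": [{"id": "1", "conversation_id": "1"}, {"id": "2", "conversation_id": "1"}]}',
        '{"id": "3", "conversation_id": "9"}',
        "",
    ]
    print(conversation_ids(sample))  # ['1', '9']
```

Each resulting ID could then be passed to a search using the Twitter API v2 conversation_id: operator (e.g. conversation_id:1) to retrieve the thread. How the actual conversations subcommand does this may differ.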
edsu commented 3 years ago

I added a conversations subcommand. I'm not sure whether conversations belongs in core twarc or in a plugin. I guess it won't have any crazy dependencies, so maybe in the core?

igorbrigadir commented 3 years ago

Yes, I think it fits into core. Incidentally, I think timelines (https://github.com/DocNow/twarc-timelines) does too, and so does IDs (https://github.com/DocNow/twarc-ids), but they also make excellent minimal examples of a plugin.

edsu commented 3 years ago

In b7fa0db2b90f573e5937b0716170c37db609a47a I added twarc2 conversations and twarc2 timelines. The new timelines subcommand can read the users whose timelines it should fetch either from a file of tweets or from a text file of user IDs/usernames. I left the more complex behavior (writing to a directory of separate files, and using since_id to fetch only new data) to a renamed plugin, twarc-timeline-archive.
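One way the timelines subcommand might tell its two input kinds apart can be sketched in Python. This is a hypothetical helper, not twarc's actual code. It assumes that a line which parses as a JSON object is tweet data (a response page with a "data" list, or a single tweet), from which each tweet's author_id is taken, and that any other non-blank line is a user ID or a username, with a leading "@" stripped.

```python
import json
from typing import Iterable, List


def users_to_fetch(lines: Iterable[str]) -> List[str]:
    """Collect distinct user IDs/usernames in first-seen order.

    Hypothetical helper (not twarc API): JSON objects are treated as
    tweet data; every other non-blank line is a user ID or username.
    """
    seen = set()
    users = []

    def add(user: str) -> None:
        # Ignore empty values and duplicates.
        if user and user not in seen:
            seen.add(user)
            users.append(user)

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            obj = None  # not JSON, so treat it as a plain username
        if isinstance(obj, dict):
            # Tweet data: a page with a "data" list, or a single tweet object.
            data = obj.get("data", obj)
            tweets = data if isinstance(data, list) else [data]
            for tweet in tweets:
                if isinstance(tweet, dict):
                    add(str(tweet.get("author_id", "")))
        else:
            # Plain text line (a bare numeric ID also lands here): strip "@".
            add(line.lstrip("@"))
    return users
```

A file mixing both kinds of lines is accepted by this sketch. The real subcommand may instead decide the input type up front or via an option.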