JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

Options for --format #175

Closed: muhnick closed this issue 3 years ago

muhnick commented 3 years ago

Hi, can you please provide info on the options available for --format? Is it not possible to get the messages on the CLI rather than just URLs to tweets?

JustAnotherArchivist commented 3 years ago

There is currently no documentation on this (#6). The --format value is a Python format string, and the available fields depend on the scraper. For Twitter scrapers that produce tweets (e.g. twitter-user and twitter-search), the relevant dataclass is snscrape.modules.twitter.Tweet:

https://github.com/JustAnotherArchivist/snscrape/blob/892941b609e9a995748a871aeaf211bddf429827/snscrape/modules/twitter.py#L21-L47

For example, --format '{url} {content}' will get you the URL, a space, and the tweet text. Note that the content may contain newlines though. You might want to use the repr conversion with {content!r} to avoid issues due to that, although that entirely depends on what you intend to do with the output.
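To make the difference between {content} and {content!r} concrete, here is a minimal sketch in plain Python. It does not import snscrape: ExampleTweet is a hypothetical stand-in with only two fields, and render is an illustrative helper, not the function snscrape actually uses. It only assumes that the format string is applied with str.format to the item's fields, as described above.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class ExampleTweet:
    # Hypothetical stand-in for snscrape.modules.twitter.Tweet;
    # the real dataclass has more fields than these two.
    url: str
    content: str


def render(item, fmt):
    # Illustrative helper: expose the dataclass fields as names
    # that a Python format string can reference.
    return fmt.format(**asdict(item))


tweet = ExampleTweet(
    url="https://twitter.com/example/status/1",
    content="first line\nsecond line",
)

# The field names of a dataclass are what a format string can refer to.
available = [f.name for f in fields(ExampleTweet)]

plain = render(tweet, "{url} {content}")      # newline splits the record
escaped = render(tweet, "{url} {content!r}")  # repr keeps it on one line

print(available)
print(plain)
print(escaped)
```

With the plain format, the embedded newline spreads one tweet over two output lines; with !r the text is quoted and the newline is escaped as \n, so each tweet stays on a single line. That is easier to process line by line, at the cost of having to undo the quoting later.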

muhnick commented 3 years ago

Hi, thanks for the reply, it helps, but can I suggest you update the documentation so this is a little more obvious? I'm not a Python programmer; I'm using it from the CLI and struggling to understand the capabilities because the documentation just isn't there if you can't read the code.

But I guess I can look in the other source files and find similar?

JustAnotherArchivist commented 3 years ago

Absolutely. I've been meaning to write documentation ever since I first released this (see the issue linked above), but unfortunately I just haven't found the time to do so yet. It's definitely still planned though. Until then, yeah, the source code plus the questions here on the issue tracker will have to do, I'm afraid.