Mincka / DMArchiver

A tool to archive the direct messages, images and videos from your private conversations on Twitter
GNU General Public License v3.0

Scalability (# of messages)? #5

Closed muesliq closed 7 years ago

muesliq commented 7 years ago

Is there a limit on the number of messages? I successfully tested the script with roughly 13k messages / 1.3 MB in one conversation.

The script seems to cache the messages. Would it be more scalable if it wrote the messages to a file incrementally instead of caching them?

Mincka commented 7 years ago

There is no limit on the number of messages that can be retrieved.

I know someone who has been able to retrieve a conversation with more than 80 000 tweets and 900 images. 😆

The script builds the entire conversation in memory, but only the parsed content, so I think the memory footprint stays small for the majority of conversations. The goal was to reverse the order of the conversation at the end, so that it is written to the file in chronological order.

However, I think it would also be possible to prepend the reversed content of each batch of 20 tweets to a file directly. I've already added a -raw-output switch to write the HTML content to a file after each request.
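
A minimal sketch of that incremental idea, not the code DMArchiver actually uses. It assumes batches arrive newest-first as lists of single-line strings, and the function name `archive_incrementally` is hypothetical. Prepending to a file means rewriting it on every batch, so the sketch swaps in a different technique: each reversed batch goes to its own temporary chunk file, and the chunks are concatenated in reverse order at the end. Only one batch is held in memory at a time.

```python
import os
import tempfile


def archive_incrementally(batches, output_path):
    """Write newest-first batches to output_path in chronological order.

    Hypothetical helper: `batches` is an iterable of lists of single-line
    strings, each list ordered newest-first, with the newest batch first.
    """
    chunk_paths = []
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Pass 1: store each batch on disk, oldest message first.
        for index, batch in enumerate(batches):
            chunk_path = os.path.join(tmp_dir, f"chunk_{index:08d}.txt")
            with open(chunk_path, "w", encoding="utf-8") as chunk:
                for message in reversed(batch):
                    chunk.write(message + "\n")
            chunk_paths.append(chunk_path)

        # Pass 2: the last chunk written holds the oldest messages,
        # so concatenate the chunks in reverse order.
        with open(output_path, "w", encoding="utf-8") as output:
            for chunk_path in reversed(chunk_paths):
                with open(chunk_path, encoding="utf-8") as chunk:
                    for line in chunk:
                        output.write(line)
```

For example, calling `archive_incrementally([["m6", "m5", "m4"], ["m3", "m2", "m1"]], "conversation.txt")` writes the lines `m1` through `m6` in order.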

I am not yet sure of the best design for supporting additional output types. I was thinking about splitting the HTTP receiver and the parser completely, so that it would be possible to work on the parsing "offline".
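
One way such a split could look, as a rough sketch only. The names `save_raw_batch` and `parse_offline`, the `batch_*.html` file naming, and the `parse_batch` callable are all hypothetical and not part of DMArchiver. The receiver side only stores raw responses on disk; the parser side reads them back later without any network access, so a new output type would only need a different `parse_batch`.

```python
from pathlib import Path


def save_raw_batch(raw_dir, index, raw_html):
    """Receiver side: store one raw HTTP response body on disk."""
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    path = raw_dir / f"batch_{index:08d}.html"
    path.write_text(raw_html, encoding="utf-8")
    return path


def parse_offline(raw_dir, parse_batch):
    """Parser side: replay the stored responses without any network access.

    `parse_batch` is any callable that turns one raw HTML string into an
    iterable of parsed messages. Files are replayed in the order they
    were saved, thanks to the zero-padded index in the file name.
    """
    for path in sorted(Path(raw_dir).glob("batch_*.html")):
        yield from parse_batch(path.read_text(encoding="utf-8"))
```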

muhittin commented 5 years ago

> I know someone who has been able to retrieve a conversation with more than 80 000 tweets and 900 images. 😆

165k tweets, 2262 images, 60 videos from a conversation 😎