Open Asparagirl opened 9 years ago
Agreed, also for hashtags, i.e. https://twitter.com/hashtag/SomeTag to https://twitter.com/hashtag/SomeTag/ (and https://twitter.com/hashtag/SomeTag?src=hash to https://twitter.com/hashtag/SomeTag/?src=hash).
Personally, I'd prefer an error message though instead of magically rewriting links.
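To make the two options concrete, here is a minimal Python sketch of both behaviours: silently adding the trailing slash, and rejecting the URL with an error message instead. The function names are hypothetical and this is not ArchiveBot's actual code; it only assumes that the slash belongs at the end of the path, before any query string, as in the hashtag examples above.

```python
from urllib.parse import urlsplit, urlunsplit


def add_trailing_slash(url: str) -> str:
    """Return url with a trailing slash on its path; query and fragment are kept."""
    parts = urlsplit(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def require_trailing_slash(url: str) -> str:
    """Alternative behaviour: refuse the URL instead of rewriting it."""
    fixed = add_trailing_slash(url)
    if fixed != url:
        # Hypothetical wording; the point is that the user sees what to type.
        raise ValueError(f"Missing trailing slash; did you mean {fixed} ?")
    return url


print(add_trailing_slash("https://twitter.com/hashtag/SomeTag?src=hash"))
```

Either function would only be applied to URLs already identified as Twitter feeds; that check is left out here.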
Better still would be a !twitter command, but I'd also like a better understanding of how it's used. We could, for example, run !twitter falcondarkstar and see the same thing as !a https://twitter.com/falcondarkstar/ --phantomjs --igset twitter (or whatever we choose to do), but we could also do !twitter #archivebot and stuff. Do we know all the use cases for this?
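As a rough illustration of that mapping, the Python sketch below expands a !twitter argument into the equivalent !a command line. Everything here is an assumption for discussion: the function name is made up, the flags are simply copied from the example above, reusing them for hashtags is a guess, and accepting a leading @ is an extra convenience. It is not how the IRC bot is actually implemented.

```python
from urllib.parse import quote

# Flags copied from the !a example in this thread; whether hashtags
# should use the same flags is an open question.
TWITTER_FLAGS = "--phantomjs --igset twitter"


def expand_twitter_command(arg: str) -> str:
    """Turn the argument of a hypothetical !twitter command into an !a command."""
    arg = arg.strip()
    is_hashtag = arg.startswith("#")
    # Drop a single leading '#' or '@' marker, if present.
    name = arg[1:] if arg[:1] in ("#", "@") else arg
    if not name:
        raise ValueError("usage: !twitter <username | @username | #hashtag>")
    prefix = "hashtag/" if is_hashtag else ""
    url = f"https://twitter.com/{prefix}{quote(name, safe='')}/"
    return f"!a {url} {TWITTER_FLAGS}"


print(expand_twitter_command("falcondarkstar"))
print(expand_twitter_command("#archivebot"))
```

Because the URL is built by the command, the trailing slash is always present and the user cannot forget it.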
I think such a separate command would be a very good idea, but for a different reason. !a https://twitter.com/user --phantomjs --igset twitter doesn't work at all on some pipelines (only fetches the first page of tweets) and doesn't reliably retrieve all tweets on others (see #archivebot from 2017-07-03 around 20:00 UTC). If we had a !twitter command, we could tweak the grab to retrieve everything properly. There are different ways to do that, but I believe the only one which works even for very large tweet histories (the API only returns the 3200 newest tweets) is searching for all tweets by a user from a specific date, iterating through all dates back to when the account was created.
Side note: I'd prefer !twitter @username, since that's the syntax used on Twitter to refer to user accounts.
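A minimal Python sketch of the date iteration, to show the shape of the idea rather than a working grab: it yields one search URL per day, from today back to the account creation date. The from:/since:/until: search operators and the https://twitter.com/search?q=... URL form are assumptions about Twitter's search, the function name is hypothetical, and the creation date would have to be looked up separately. Whether each per-day search really returns every tweet would still need testing.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def daily_search_urls(username: str, created: date, today: date):
    """Yield one search URL per day, newest first, back to the creation date."""
    day = today
    while day >= created:
        next_day = day + timedelta(days=1)
        # Assumed search syntax: tweets by the user within [day, next_day).
        query = f"from:{username} since:{day.isoformat()} until:{next_day.isoformat()}"
        yield "https://twitter.com/search?" + urlencode({"q": query})
        day -= timedelta(days=1)


for url in daily_search_urls("falcondarkstar", date(2017, 7, 4), date(2017, 7, 6)):
    print(url)
```

One URL per day keeps each result page small; a busier account might need narrower windows, and a quiet one could use wider ones.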
The current mechanism for grabbing them via phantomjs has a heuristic stopping point based mostly on timeouts, if I remember correctly, so it's liable to stop prematurely at completely random points.
With that said, this and your idea to iterate dates to account creation are pipeline changes, and should probably go in a different ticket; a !twitter command can be handled solely with changes to the IRC bot.
On 7/6/2017 15:13, JustAnotherArchivist wrote:
I think such a separate command would be a very good idea, but for a different reason. !a https://twitter.com/user --phantomjs --igset twitter doesn't work at all on some pipelines (only fetches the first page of tweets) and doesn't reliably retrieve all tweets on others (see #archivebot from 2017-07-03 around 20:00 UTC). If we had a !twitter command, we could tweak the grab to retrieve everything properly. There are different ways to do that, but I believe the only one which works even for very large tweet histories (the API only returns the 3200 newest tweets) is searching for all tweets by a user from a specific date, iterating through all dates back to when the account was created.
Side note: I'd prefer !twitter @username, since that's the syntax used on Twitter to refer to user accounts.
Yeah, I know about the stopping. However, it seems that it never retrieves the second page on some pipelines. Anyway, that probably doesn't belong in this issue either.
If a user tries to archive an individual Twitter feed, such as...
...ArchiveBot should assume that they do not actually want to archive the entirety of Twitter.com, and should automatically add a trailing slash to the name like this...
...thereby saving the job and the pipelines from the user's stupidity. :blush: