Closed JustAnotherArchivist closed 1 year ago
In the past, Twitter was pretty good about maintaining backwards compatibility on the GraphQL API endpoints. Not this time...
How long do you think it will take to fix this issue?
Is there any logic to which original tweets now aren't getting picked up, or is it random?
I haven't found a clear pattern so far.
Due to this new 'conversation' rendering, the order is slightly messed up. This could be worked around by reordering in snscrape, but that could also cause issues and I'm lacking the time to implement it currently, so I won't do so until someone makes a good case for the added complexity. Yes, --since will be slightly broken, but it already is due to pinned tweets.
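For anyone who needs a workaround in their own script, the reordering mentioned above can be done client-side. This is only a minimal sketch, assuming each scraped item exposes a timezone-aware `date` attribute; the `Item` class and the `newest_first_since` helper are hypothetical stand-ins and not part of snscrape:

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Item:
    # Hypothetical stand-in for a scraped tweet; only the fields used here.
    id: int
    date: datetime


def newest_first_since(items, since):
    """Collect all items, drop those older than `since`, sort newest first.

    Unlike stopping at the first too-old item, this does not assume the
    scraper yields results in chronological order.
    """
    kept = [item for item in items if item.date >= since]
    return sorted(kept, key=lambda item: item.date, reverse=True)


if __name__ == "__main__":
    utc = timezone.utc
    scraped = [
        Item(3, datetime(2023, 6, 3, tzinfo=utc)),
        Item(1, datetime(2023, 6, 1, tzinfo=utc)),  # arrives out of order
        Item(4, datetime(2023, 6, 4, tzinfo=utc)),
    ]
    for item in newest_first_since(scraped, datetime(2023, 6, 2, tzinfo=utc)):
        print(item.id, item.date.date())
```

The trade-off is that the full result set has to be fetched and held in memory before anything is filtered, which may be too slow for large accounts.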
I don't know if this will be of any help, but I wrote a couple of functions to recursively get one 'reply' branch from a 'leaf' tweet. If someone can write a function to extract the list of replies for each tweet, you can retrieve the whole tree of conversations!
import pandas as pd
from snscrape.modules import twitter

COLUMNS = ['date', 'tweet', 'username', 'tweet_id',
           'reply_to_id', 'conversation_id', 'location', 'coordinates']

dataset_test = pd.DataFrame(columns=COLUMNS)


def get_replies(dataset, tweet_id):
    # Fetch the tweet, record it, then recurse upwards to the tweet it replies to.
    for tweet in twitter.TwitterTweetScraper(tweetId=tweet_id).get_items():
        if tweet.id not in dataset['tweet_id'].values:
            row = pd.DataFrame({'date': [tweet.date], 'tweet': [tweet.content],
                                'username': [tweet.user.username], 'tweet_id': [tweet.id],
                                'reply_to_id': [tweet.inReplyToTweetId],
                                'conversation_id': [tweet.conversationId],
                                'location': [tweet.place], 'coordinates': [tweet.coordinates]})
            dataset = pd.concat([dataset, row], ignore_index=True)
            print(tweet.inReplyToTweetId)
            # Stop at the root of the conversation: it has no parent to scrape.
            if tweet.inReplyToTweetId is not None and tweet.id not in dataset['reply_to_id'].values:
                dataset = get_replies(dataset, tweet.inReplyToTweetId)
    return dataset


def get_threads(source_dataset=None, target_dataset=None):
    # For every reply in source_dataset, collect its chain of parent tweets.
    if target_dataset is None:
        target_dataset = pd.DataFrame(columns=COLUMNS)
    for j in range(len(source_dataset)):
        if pd.notnull(source_dataset['reply_to_id'].iloc[j]):
            try:
                # Accumulate into target_dataset so earlier results are not lost.
                target_dataset = get_replies(target_dataset, source_dataset['tweet_id'].iloc[j])
            except Exception as e:
                print('error:', e)
    return target_dataset
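Once rows with a tweet ID and a reply-to ID have been collected, the conversation tree can be rebuilt from those two columns alone. Here is a rough sketch using plain dicts rather than snscrape objects; `build_reply_tree` and `walk` are hypothetical helper names, and the sample IDs are made up for illustration:

```python
from collections import defaultdict


def build_reply_tree(rows):
    """Map each parent tweet ID to the IDs of its direct replies.

    `rows` is an iterable of dicts with 'tweet_id' and 'reply_to_id' keys;
    a 'reply_to_id' of None marks a root tweet.
    """
    children = defaultdict(list)
    for row in rows:
        if row["reply_to_id"] is not None:
            children[row["reply_to_id"]].append(row["tweet_id"])
    return children


def walk(children, tweet_id, depth=0):
    """Yield (depth, tweet_id) pairs for a conversation, depth-first."""
    yield depth, tweet_id
    for child_id in children.get(tweet_id, []):
        yield from walk(children, child_id, depth + 1)


if __name__ == "__main__":
    rows = [
        {"tweet_id": 1, "reply_to_id": None},
        {"tweet_id": 2, "reply_to_id": 1},
        {"tweet_id": 3, "reply_to_id": 1},
        {"tweet_id": 4, "reply_to_id": 2},
    ]
    tree = build_reply_tree(rows)
    for depth, tweet_id in walk(tree, 1):
        print("  " * depth + str(tweet_id))
```

With a pandas DataFrame, `to_dict('records')` should give rows in this shape, though missing parent IDs may come back as NaN rather than None and would need normalizing first.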
I'm facing the same issue.
Thanks for this great tool and all the work you've invested. I know the Twitter module has become a bit of a headache lately with all the chaos and breaking changes they push. Unfortunately, for me as for most people here, this change renders my project only semi-functional, with all these replies filtered out. Sure, that's not a great reason to implement your changes, but I'm sure it would pay off in the long run, since this change will most probably stick around for some time. Otherwise, arguably, each future change will keep breaking different parts of snscrape's Twitter module little by little until it becomes untenable. I lack, of course, the knowledge to help in a meaningful way, but I would help if given clear tasks.
I think the change made in the developer version last week helped this issue. Will it be pushed to the non-developer version soon? Many thanks for all of your work on this. It’s an amazing tool.
Yes, this issue has been fixed in the dev version. A release will be made when the dev version is stable enough. There are other open issues that still need fixing before that can happen.
Have you implemented a fix for the error?
As stated previously, you'll have to use the dev version to benefit from the resolution. It's working quite well for me.
Could you send me the command to use to update it?
I am actually using snscrape 0.6.2.20230321.dev13+g786815d.
@JustAnotherArchivist Hi, where can I find the dev branch? On the main GitHub page of the project there are only the master and tests branches, which haven't been updated in a long time.
Hello @JustAnotherArchivist, I am using snscrape 0.6.2.20230321.dev13+g786815d and facing the same issue.
Friends, stop spamming and update your library:
pip3 install git+https://github.com/JustAnotherArchivist/snscrape.git
The latest version is snscrape-0.6.2.20230321.dev39+gc3b216c.
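To confirm which build is actually installed after updating, the dev number can be read from the version string. A small sketch: `importlib.metadata.version` is from the standard library, while `dev_number` is a hypothetical helper, and the regular expression assumes the `.devN+g<hash>` version format quoted in this thread:

```python
import re
from importlib import metadata


def dev_number(version):
    """Return the N from a '.devN' version segment, or None for a release."""
    match = re.search(r"\.dev(\d+)", version)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    try:
        installed = metadata.version("snscrape")
    except metadata.PackageNotFoundError:
        print("snscrape is not installed")
    else:
        print(installed, "-> dev build number:", dev_number(installed))
```

Going by the reports in this thread, dev13 was still affected and dev39 was the latest build at the time, so a number in between does not by itself say whether the fix is included.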
I'm not using the pip package but the github version, does that mean dev version == master branch?
yes.
Twitter has started returning replies in a different format within the last few hours. This means that all replies and some original tweets are currently missing from snscrape's output and instead produce the warning mentioned in the title.
No, there are no workarounds currently. This affects both the release and dev versions.