Twitter profile scraper misses replies: Skipping unrecognised entry ID: 'profile-conversation-…'

JustAnotherArchivist commented 1 year ago

Twitter has started returning replies in a different format within the last few hours. This means that all replies and some original tweets are currently missing from snscrape's output and instead produce the warning mentioned in the title.

No, there are no workarounds currently. This affects both the release and dev versions.
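To illustrate why the warning in the title appears: a scraper that dispatches timeline entries on their ID prefix will skip any entry type it doesn't know about. The sketch below is not snscrape's actual code. The entry layout (`entryId`, `items`, `tweetId`) and the function name `extract_tweet_ids` are assumptions made purely for illustration.

```python
import logging

logger = logging.getLogger("timeline_sketch")


def extract_tweet_ids(entries, handle_conversations=False):
    """Collect tweet IDs from timeline entries (toy dicts with a hypothetical layout)."""
    tweet_ids = []
    for entry in entries:
        entry_id = entry["entryId"]
        if entry_id.startswith("tweet-"):
            # Old-style entry: one tweet per entry, ID embedded in the entry ID.
            tweet_ids.append(int(entry_id.split("-", 1)[1]))
        elif handle_conversations and entry_id.startswith("profile-conversation-"):
            # New-style entry: several tweets grouped into one conversation entry.
            for item in entry.get("items", []):
                tweet_ids.append(item["tweetId"])
        else:
            # Anything else is dropped, which is how replies go missing.
            logger.warning("Skipping unrecognised entry ID: %r", entry_id)
    return tweet_ids


entries = [
    {"entryId": "tweet-100"},
    {"entryId": "profile-conversation-200",
     "items": [{"tweetId": 201}, {"tweetId": 202}]},
]
```

With `handle_conversations=False`, only the plain tweet entry survives and the conversation entry triggers the warning; with `handle_conversations=True`, the grouped tweets are picked up as well.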

JustAnotherArchivist commented 1 year ago

In the past, Twitter was pretty good about maintaining backwards compatibility on the GraphQL API endpoints. Not this time...

aditya-bansal-7 commented 1 year ago

How much time do you think it will take to fix this issue?

nicolegorton commented 1 year ago

Is there any logic to which original tweets now aren't getting picked up, or is it random?

JustAnotherArchivist commented 1 year ago

I haven't found a clear pattern so far.

JustAnotherArchivist commented 1 year ago

Due to this new 'conversation' rendering, the order is slightly messed up. This could be worked around by reordering in snscrape, but that could also cause issues and I'm lacking the time to implement it currently, so I won't do so until someone makes a good case for the added complexity. Yes, --since will be slightly broken, but it already is due to pinned tweets.
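For anyone who needs strict ordering in their own code in the meantime, a minimal sketch of the reordering idea follows. It assumes tweet IDs increase with creation time (Twitter's snowflake IDs are roughly time-ordered) and uses a hypothetical `Item` class as a stand-in for a scraped tweet; it is not part of snscrape.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Item:
    """Hypothetical stand-in for a scraped tweet; only the fields used here."""
    id: int
    date: datetime


def reorder_newest_first(items):
    # Sort by ID, newest first, assuming IDs grow with creation time.
    return sorted(items, key=lambda item: item.id, reverse=True)


def filter_since(items, since):
    # Check every item instead of stopping at the first one older than
    # `since`, because pinned tweets and conversations can appear out of order.
    return [item for item in items if item.date >= since]
```

Note that `reorder_newest_first` has to buffer the whole result before it can return anything, which is one cost of reordering.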

Ashoka74 commented 1 year ago

I don't know if this will be of any help, but I wrote a couple of functions to recursively get one 'reply' branch from a 'leaf' tweet. If someone can write a function to extract the list of replies for each tweet, you can retrieve the whole tree of conversations!

import pandas as pd
from snscrape.modules import twitter

COLUMNS = ['date', 'tweet', 'username', 'tweet_id',
           'reply_to_id', 'conversation_id', 'location', 'coordinates']

dataset_test = pd.DataFrame(columns=COLUMNS)

def get_replies(dataset, tweet_id):
    """Walk up the reply chain from tweet_id, appending each unseen tweet to dataset."""
    for tweet in twitter.TwitterTweetScraper(tweetId=tweet_id).get_items():
        if tweet.id in dataset['tweet_id'].values:
            continue  # already collected
        row = pd.DataFrame({'date': [tweet.date], 'tweet': [tweet.content],
                            'username': [tweet.user.username], 'tweet_id': [tweet.id],
                            'reply_to_id': [tweet.inReplyToTweetId],
                            'conversation_id': [tweet.conversationId],
                            'location': [tweet.place], 'coordinates': [tweet.coordinates]},
                           index=[0])
        dataset = pd.concat([dataset, row], ignore_index=True)
        print(tweet.inReplyToTweetId)
        # Recurse into the parent tweet only if there is one and it has not been fetched yet.
        parent_id = tweet.inReplyToTweetId
        if parent_id is not None and parent_id not in dataset['tweet_id'].values:
            dataset = get_replies(dataset, parent_id)
    return dataset

def get_threads(source_dataset, target_dataset=None):
    """For every reply in source_dataset, collect its chain of parent tweets."""
    if target_dataset is None:
        target_dataset = pd.DataFrame(columns=COLUMNS)
    for j in range(len(source_dataset)):
        if pd.notnull(source_dataset['reply_to_id'].iloc[j]):
            tweet_id = source_dataset['tweet_id'].iloc[j]
            try:
                # Accumulate into target_dataset so earlier threads are not discarded.
                target_dataset = get_replies(target_dataset, tweet_id)
            except Exception as e:
                print(f'error while fetching {tweet_id}: {e}')
    return target_dataset
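Once the rows are collected, turning them into a tree does not need any further scraping. The following is a rough sketch, assuming each row is a dict with the `tweet_id` and `reply_to_id` keys used above (for example from `dataset.to_dict('records')`); the helper names `build_reply_tree` and `walk` are made up for this example.

```python
from collections import defaultdict


def build_reply_tree(rows):
    """Return (roots, children) where children maps a tweet ID to its reply IDs."""
    known_ids = {row["tweet_id"] for row in rows}
    children = defaultdict(list)
    roots = []
    for row in rows:
        parent = row["reply_to_id"]
        if parent is None or parent not in known_ids:
            # No parent, or the parent was never collected: treat as a root.
            roots.append(row["tweet_id"])
        else:
            children[parent].append(row["tweet_id"])
    return roots, dict(children)


def walk(tweet_id, children, depth=0):
    """Yield (depth, tweet_id) pairs in depth-first order."""
    yield depth, tweet_id
    for child in children.get(tweet_id, []):
        yield from walk(child, children, depth + 1)
```

Calling `walk(root, children)` for each root then gives every conversation with its nesting depth, which is enough to print or export the threads.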

djialeu commented 1 year ago

i'm facing the same issue

codilau commented 1 year ago

Thanks for this great tool and all the work you've invested. I know the Twitter module is becoming a bit of a headache lately with all the chaos and breaking changes they push. Unfortunately, for most people here, as for me, this change also renders my project semi-functional with all these replies filtered out. Sure, that's not a great reason to implement your changes, but I'm sure this would be of benefit in the long run, since this will most probably be a change that sticks around for some time. Otherwise, arguably, each future change will break different parts of snscrape's Twitter module little by little until it becomes untenable. Of course, I lack the knowledge to help in a meaningful way, but I would help if given clear tasks.

nicolegorton commented 1 year ago

I think the change made in the developer version last week helped this issue. Will it be pushed to the non-developer version soon? Many thanks for all of your work on this. It’s an amazing tool.

JustAnotherArchivist commented 1 year ago

Yes, this issue has been fixed in the dev version. A release will be made when the dev version is stable enough. There are other open issues that still need fixing before that can happen.

ihabpalamino commented 1 year ago

have you implemented the solution of the error?

codilau commented 1 year ago

have you implemented the solution of the error?

As stated previously, you'll have to use the dev version to benefit from the fix. It's working quite well for me.

ihabpalamino commented 1 year ago

could you send me the command to use to update it?

ihabpalamino commented 1 year ago

i am actually using snscrape 0.6.2.20230321.dev13+g786815d

DrSocket commented 1 year ago

@JustAnotherArchivist hi, where can I find the dev branch? in the main github page of the project there's only master and tests branch which hasn't been updated in a long time.

ihabpalamino commented 1 year ago

Hello @JustAnotherArchivist, I am using snscrape 0.6.2.20230321.dev13+g786815d and facing the same issue.

codilau commented 1 year ago

Friends, stop spamming and update your library:

pip3 install git+https://github.com/JustAnotherArchivist/snscrape.git

The latest version is snscrape-0.6.2.20230321.dev39+gc3b216c.
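To check which dev build is actually installed, something like the following sketch may help. It assumes the version strings follow the `...devN+g<commit>` pattern seen in this thread; the helper names `dev_number` and `installed_dev_number` are made up for this example, and comparing dev numbers is only meaningful when the base version (here 0.6.2.20230321) is the same.

```python
import re
from importlib import metadata


def dev_number(version):
    """Return N from a '.devN' segment of a version string, or None if absent."""
    match = re.search(r"\.dev(\d+)", version)
    return int(match.group(1)) if match else None


def installed_dev_number(package="snscrape"):
    """Return the dev number of the installed package, or None if unavailable."""
    try:
        return dev_number(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None
```

For example, `dev_number("0.6.2.20230321.dev13+g786815d")` gives 13, which is older than the dev39 build named above.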

DrSocket commented 1 year ago

friends, stop spamming and update your library. pip3 install git+https://github.com/JustAnotherArchivist/snscrape.git the last version is snscrape-0.6.2.20230321.dev39+gc3b216c

I'm not using the pip package but the github version, does that mean dev version == master branch?

codilau commented 1 year ago

yes.

JustAnotherArchivist / snscrape

Twitter profile scraper misses replies: Skipping unrecognised entry ID: 'profile-conversation-…' #937