geduldig / TwitterAPI

Minimal python wrapper for Twitter's REST and Streaming APIs
936 stars, 263 forks

Fixed 'Media_key_hydration' problem. #198

marcelo-mascarenhas closed 2 years ago

marcelo-mascarenhas commented 3 years ago

The problem was that the information was being overwritten on each iteration of the new_fields loop. Consequently, the library returned the information for just one media key.
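
A minimal sketch of that failure mode, using toy data rather than the library's actual code. The field name `media_keys_hydrate` and both function names are assumptions made for illustration only:

```python
# Toy illustration of "overwritten on each iteration" vs. accumulating.
# Nothing here is TwitterAPI code; all names are hypothetical.

def hydrate_overwriting(tweet, media_objects):
    """Assigns the field on every pass, so only the last object survives."""
    for obj in media_objects:
        tweet['media_keys_hydrate'] = [obj]
    return tweet


def hydrate_accumulating(tweet, media_objects):
    """Appends to the field, so every media object is kept."""
    for obj in media_objects:
        tweet.setdefault('media_keys_hydrate', []).append(obj)
    return tweet


MEDIA = [
    {'media_key': '3_1', 'type': 'photo'},
    {'media_key': '3_2', 'type': 'photo'},
]

buggy = hydrate_overwriting({'media_keys': ['3_1', '3_2']}, MEDIA)
fixed = hydrate_accumulating({'media_keys': ['3_1', '3_2']}, MEDIA)

print(len(buggy['media_keys_hydrate']), len(fixed['media_keys_hydrate']))
```

With two media objects, the overwriting version ends up holding a single entry while the accumulating version holds both, which matches the symptom described above.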

marcelo-mascarenhas commented 3 years ago

Update

The prior modification actually interfered with the 'cursor' feature, causing it to fail in some cases. A more specific solution, like the one in the recent commit, might be a better idea. I've done some testing with api.request and TwitterPager, and it seems to work fine now. However, the library may still be overwriting other information that nobody has noticed yet, causing data loss.

dylancaponi commented 3 years ago

You can test if it's overwriting by comparing the output of twurl with the output of this library.
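
One way such a comparison could be automated, as a rough sketch: it assumes the raw v2 JSON (for example, saved from twurl) lists media under `includes.media`, and that the hydrated item stores the expanded objects under `attachments.media_keys_hydrate`. That second path and all helper names are assumptions, not confirmed library behavior:

```python
# Hypothetical helpers for diffing a raw API response against a hydrated item.

def media_keys_in_raw(raw):
    """Media keys listed in the raw response's includes.media array."""
    return {m['media_key'] for m in raw.get('includes', {}).get('media', [])}


def media_keys_in_hydrated(item, field='media_keys_hydrate'):
    """Media keys of the expanded objects found in the hydrated item (assumed path)."""
    entries = item.get('attachments', {}).get(field, [])
    return {e['media_key'] for e in entries if isinstance(e, dict)}


def missing_media(raw, item, field='media_keys_hydrate'):
    """Keys present in the raw response but absent from the hydrated item."""
    return media_keys_in_raw(raw) - media_keys_in_hydrated(item, field)


# Toy data standing in for real output from twurl and from the library.
raw_response = {
    'data': {'id': '1', 'attachments': {'media_keys': ['3_1', '3_2']}},
    'includes': {'media': [{'media_key': '3_1'}, {'media_key': '3_2'}]},
}
hydrated_item = {
    'id': '1',
    'attachments': {
        'media_keys': ['3_1', '3_2'],
        'media_keys_hydrate': [{'media_key': '3_2'}],
    },
}

lost = missing_media(raw_response, hydrated_item)
print(sorted(lost))
```

An empty result would suggest nothing was dropped for that tweet; a non-empty one points at the media keys that went missing.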

I tried to recreate this issue but was unable to. Any other details would be helpful.

This is a snippet of our pagination code:

    results = []
    pager = TwitterPager(api,
                         endpoint,
                         query_params,
                         hydrate_type=HydrateType.APPEND)

    # Paginate all data
    for i, item in enumerate(pager.get_iterator()):
        if 'meta' in item:
            del item['meta']
        results.append(item)

marcelo-mascarenhas commented 3 years ago

Hello, Dylan! Hope you're doing fine!

I think I expressed myself badly, sorry. I didn't have time to test it back then, and I was just conjecturing. (Maybe some other parameter was being overwritten in the same way that 'Media_Keys_Hydrated' was.) But earlier today I compared the output of a TwitterAPI request after the modification with Twarc's, and they are pretty much the same.

I think that the response is fine now!

Best Regards, Marcelo.

dylancaponi commented 3 years ago

I think I misunderstood the original issue.

We are seeing only one media key returned for tweets with multiple media keys.

This PR fixes that problem and all media key data is returned.

geduldig commented 2 years ago

Sorry, getting to this after a hiatus. Is this still an issue?

Dheavyman commented 2 years ago

> Sorry, getting to this after a hiatus. Is this still an issue?

Yea, it's still an issue.

marcelo-mascarenhas commented 2 years ago

As Dheavyman already said, it's still an issue. @geduldig, you can test it by using the snippet of code that you've provided to retrieve single tweets with Hydrate.


    import json

    from TwitterAPI import HydrateType, TwitterAPI, TwitterOAuth

    TWEET_ID = '1413529669509390336'

    EXPANSIONS = 'author_id,referenced_tweets.id,referenced_tweets.id.author_id,in_reply_to_user_id,attachments.media_keys,attachments.poll_ids,geo.place_id,entities.mentions.username'
    MEDIA_FIELDS = 'duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics'
    TWEET_FIELDS = 'created_at,author_id,public_metrics,context_annotations,entities'
    USER_FIELDS = 'location,profile_image_url,verified'

    # Read credentials from the default credentials file
    o = TwitterOAuth.read_file()
    api = TwitterAPI(o.consumer_key, o.consumer_secret, o.access_token_key, o.access_token_secret, api_version='2')

    # Request a single tweet and append the expanded objects to it
    r = api.request(
        f'tweets/:{TWEET_ID}',
        {
            'expansions': EXPANSIONS,
            'tweet.fields': TWEET_FIELDS,
            'user.fields': USER_FIELDS,
            'media.fields': MEDIA_FIELDS,
        },
        hydrate_type=HydrateType.APPEND)

    # Print each hydrated item so the media entries can be inspected
    for item in r:
        print(json.dumps(item, indent=2, sort_keys=True))

Choose a tweet that has multiple media and retrieve it with and without the PR modification. You should see the issue. (You could use the Tweet ID that I used in Issues to demonstrate the problem.)

geduldig commented 2 years ago

This PR seems to work for hydrate_type=HydrateType.APPEND. However, the problem still persists for hydrate_type=HydrateType.REPLACE.

geduldig commented 2 years ago

I probably won't use this PR, since it doesn't handle REPLACE. I came up with the following instead:

        if field in parent:
            if item[1] == 'media_keys':
                parent[field] += include
                if field_suffix == '':
                    parent[field].remove(include[0]['media_key']) # REPLACE option
            else:
                parent[field] = include
        else:
            parent[field] = include

Let me know if you have any comments/suggestions. Will publish tomorrow.
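
For readers who want to try the branch logic above in isolation, here is a self-contained sketch. It wraps the same conditions in a hypothetical `merge_include` helper and drives it with toy data. It assumes that each `include` is a one-element list, that REPLACE corresponds to an empty `field_suffix`, and that APPEND uses a `_hydrate` suffix; the suffix value and all helper names are assumptions, not taken from the library source:

```python
# Standalone toy version of the merge step; not the library's actual function.

def merge_include(parent, name, include, field_suffix):
    """Merge one expanded object (wrapped in a list) into its parent dict."""
    field = name + field_suffix
    if field in parent and name == 'media_keys':
        # Accumulate instead of overwriting so every media object is kept.
        parent[field] += include
        if field_suffix == '':
            # REPLACE-style: drop the raw key string once its object is present.
            parent[field].remove(include[0]['media_key'])
    else:
        parent[field] = include


def hydrate(media_keys, media_objects, field_suffix):
    """Apply merge_include once per media object, as a loop over new fields would."""
    parent = {'media_keys': list(media_keys)}
    for obj in media_objects:
        merge_include(parent, 'media_keys', [obj], field_suffix)
    return parent


MEDIA = [
    {'media_key': '3_1', 'type': 'photo'},
    {'media_key': '3_2', 'type': 'photo'},
]

appended = hydrate(['3_1', '3_2'], MEDIA, '_hydrate')  # APPEND-style (assumed suffix)
replaced = hydrate(['3_1', '3_2'], MEDIA, '')          # REPLACE-style

print(appended)
print(replaced)
```

In the APPEND-style run the original key strings stay in `media_keys` and both objects land in the suffixed field; in the REPLACE-style run each key string is swapped out for its object, so `media_keys` ends up holding only the two expanded objects.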

geduldig commented 2 years ago

Should be fixed in v2.7.8. Thank you for spotting this error @marcelo-mascarenhas!