bear / python-twitter

A Python wrapper around the Twitter API.
Apache License 2.0
3.41k stars, 957 forks

calc_expected_status_length() is not counting emoji properly? #615

Open mwoolweaver opened 5 years ago

mwoolweaver commented 5 years ago

I have a string that I put together like so:

def construct_tweet(pihole, sys):
    tweet = '🚫🌐: ' + pihole[0]
    tweet += '\n🈵⁉: ' + pihole[1]
    tweet += '\n📢🚫: ' + pihole[2]
    tweet += '\n⁉⏭: ' + pihole[3]
    tweet += '\n⁉💾: ' + pihole[4]
    tweet += '\n🦄🙈: ' + pihole[5]
    tweet += '\nπŸ”πŸŽš: ' + pihole[6]
    tweet += '\nπŸš«πŸ“βŒ›: ' + pihole[7]
    tweet += '\n⚖️x̅: ' + sys[1]
    tweet += '\n🐏📈: ' + sys[2]
    tweet += '\n🔗📡: ' + sys[3]
    tweet += '\n💾📊: ' + sys[4]
    tweet += '\n🐧🌽: ' + sys[5]
    tweet += '\n🖥️👢⏳: ' + sys[0]
    # print(tweet) # always print tweet to console so we can see the output locally
    return tweet

A generated output can be seen here:

🚫🌐: 811,593
🈵⁉: 32,143
📢🚫: 18,527|57.64%
⁉⏭: 8,805
⁉💾: 4,811
🦄🙈: 5
πŸ”πŸŽš: 2
πŸš«πŸ“βŒ›: 2019-05-19 08:37
⚖️x̅: 0.0, 0.0, 0.0
🐏📈: 460M/1G|37.5%
🔗📡: ens4, tun0, tun1
💾📊: 8G/28G|28.57%
🐧🌽: Linux-5.0.0-1006-gcp-x86_64-with-Ubuntu-19.10-eoan
🖥️👢⏳: 2019-05-19 03:40

I counted 244 characters (all characters, spaces included), but calc_expected_status_length() reports 283. Why the difference?

I took all the emoji out of the string and measured them on their own: calc_expected_status_length() returns a length of 60, yet I only see 28 individual emoji.


🚫🌐🈵⁉📢
🚫⁉⏭⁉💾
πŸ¦„πŸ™ˆπŸ”πŸŽšπŸš«
πŸ“βŒ›οΈxΜ…πŸπŸ“ˆ
🔗📡💾📊🐧
🌽️👢⏳

There are 216 other characters (not counting emoji).
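One way to see where the extra length could come from is to list the code points, since a Python str holds code points rather than visible symbols. The sketch below assumes a per-code-point weighting that uses the ranges from twitter-text's configuration files (weight 1 inside them, 2 outside). It is not python-twitter's actual implementation, and `code_point_weight` and `explain` are hypothetical helpers. Under that assumption, an emoji followed by a variation selector (U+FE0F) weighs 4, and `x̅` is two code points.

```python
import unicodedata

# Assumption: code points in these ranges weigh 1, everything else weighs 2
# (values as in twitter-text's configuration, not taken from python-twitter).
LIGHT_RANGES = [(0, 4351), (8192, 8205), (8208, 8223), (8242, 8247)]


def code_point_weight(ch):
    """Weight of a single code point under the assumed ranges."""
    cp = ord(ch)
    return 1 if any(lo <= cp <= hi for lo, hi in LIGHT_RANGES) else 2


def explain(text):
    """Print every code point with its Unicode name and weight; return the sum."""
    total = 0
    for ch in text:
        weight = code_point_weight(ch)
        total += weight
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch, '<unnamed>'):<28} {weight}")
    return total


# "⚖️x̅" from the tweet: 2 visible symbols, 4 code points.
print(explain("\u2696\ufe0fx\u0305"))
# "🖥️👢⏳": 3 visible emoji, 4 code points.
print(explain("\U0001F5A5\ufe0f\U0001F462\u23f3"))
```

With these assumed weights the first call totals 6 and the second 8, while a count of visible symbols gives 2 and 3. A gap of this kind would explain how 28 visible emoji could be reported as 60.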

Working code can be seen here.

Thank you in advance, as this is the only way I've found to count characters even this accurately when verifying a tweet's length.

Please let me know if I can provide any more info on this issue.

Edit:

Also worth noting: Twitter shows this exact tweet as having 4 characters (I just copied and pasted it from here into Twitter).

Edit 2

It seems Twitter started counting each emoji as 2 characters, no matter which one it is.

https://twittercommunity.com/t/new-update-to-the-twitter-text-library-emoji-character-count/114607
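The rule described in Edit 2 can be sketched as a weighted count: each emoji sequence costs 2, and every other code point costs 1 or 2 depending on its range. The sketch assumes twitter-text's v3 configuration as I understand it: a 280-character maximum, weight 1 for code points in 0 to 4351, 8192 to 8205, 8208 to 8223 and 8242 to 8247, weight 2 otherwise, and emoji parsing enabled. It ignores t.co URL shortening. The emoji matcher is a deliberately rough, hypothetical approximation that covers a few common blocks rather than the full Unicode emoji grammar. `weighted_length` and `fits_in_tweet` are illustrative names, not python-twitter functions.

```python
import re

# Assumed twitter-text v3 settings: at most 280 weighted characters; code points
# in these ranges weigh 1, all other code points weigh 2, each emoji weighs 2.
MAX_WEIGHTED_LENGTH = 280
LIGHT_RANGES = [(0, 4351), (8192, 8205), (8208, 8223), (8242, 8247)]

# Hypothetical, rough emoji matcher: a base symbol from a few common emoji blocks,
# an optional VS16 and skin-tone modifier, then optional ZWJ-joined parts.
# It is not the complete Unicode emoji grammar that twitter-text uses.
_BASE = r"[\u203C\u2049\u2300-\u23FF\u2600-\u27BF\U0001F000-\U0001FAFF]"
_TAIL = r"\uFE0F?[\U0001F3FB-\U0001F3FF]?"
EMOJI_SEQ = re.compile(rf"{_BASE}{_TAIL}(?:\u200D{_BASE}{_TAIL})*")


def weighted_length(text):
    """Approximate weighted tweet length (URLs are NOT shortened to t.co here)."""
    length = 0
    i = 0
    while i < len(text):
        match = EMOJI_SEQ.match(text, i)
        if match:
            length += 2  # a whole emoji sequence counts as 2, however many code points
            i = match.end()
            continue
        cp = ord(text[i])
        length += 1 if any(lo <= cp <= hi for lo, hi in LIGHT_RANGES) else 2
        i += 1
    return length


def fits_in_tweet(text):
    """True if the approximate weighted length is within the assumed limit."""
    return weighted_length(text) <= MAX_WEIGHTED_LENGTH
```

A newline falls in the first weight-1 range, so under these assumptions it counts as 1, which agrees with the observation in Edit 3 that Twitter counts newlines.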

Edit 3

For now I have resorted to a modified version of the solution mentioned here. I removed the check for \n, since it seems Twitter does count newlines toward the total character count.

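    from emoji import UNICODE_EMOJI  # assumes emoji < 1.0, where this is a flat {emoji: name} dict

    # tweet is the string returned by construct_tweet()
    num_emoji = sum(tweet.count(emoji) for emoji in UNICODE_EMOJI)  # count and track emoji occurrences
    ignored_chars = UNICODE_EMOJI.copy()  # thanks to https://stackoverflow.com/q/56214183/11456464

    num_other = sum(0 if char in ignored_chars else 1 for char in tweet)  # every other character, newlines included
    print(num_emoji, num_other, str((num_emoji * 2) + num_other))  # emoji count as 2 each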

I'm not sure whether this is the proper way to go about it, but it seems to work fairly well for me right now.
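One caveat about the Edit 3 approach: `sum(tweet.count(emoji) for emoji in UNICODE_EMOJI)` counts overlapping table entries separately. If the table lists both a bare emoji and its VS16 variant, `⚖️` is counted twice and the leftover U+FE0F is added to `num_other`. Whether a particular emoji-package version contains such pairs is an assumption here. The sketch uses a tiny hypothetical `EMOJI_TABLE` in place of `UNICODE_EMOJI` and compares that approach with a left-to-right longest-match scan.

```python
# Hypothetical stand-in for an emoji table such as UNICODE_EMOJI: it lists a
# bare emoji and its emoji-presentation (VS16) variant as separate keys.
EMOJI_TABLE = {
    "\u2696": "balance scale",
    "\u2696\ufe0f": "balance scale",
    "\U0001F462": "boot",
}


def naive_count(text):
    """The Edit 3 approach: substring counts plus per-character leftovers."""
    num_emoji = sum(text.count(e) for e in EMOJI_TABLE)
    num_other = sum(0 if ch in EMOJI_TABLE else 1 for ch in text)
    return num_emoji, num_other


def longest_match_count(text):
    """Scan left to right, consuming the longest table entry at each position."""
    longest = max(map(len, EMOJI_TABLE))
    num_emoji = num_other = 0
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + size] in EMOJI_TABLE:
                num_emoji += 1
                i += size
                break
        else:  # no table entry starts here: an ordinary character
            num_other += 1
            i += 1
    return num_emoji, num_other


print(naive_count("\u2696\ufe0fx"))
print(longest_match_count("\u2696\ufe0fx"))
```

For `"⚖️x"` the substring approach reports 2 emoji and 2 other characters (6 once emoji are doubled), while the longest-match scan reports 1 and 1 (3).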

jeremylow commented 5 years ago

Huh, I didn't see that change to how they're counting emojis. I'll look into this. Thanks!