Mincka / DMArchiver

A tool to archive the direct messages, images and videos from your private conversations on Twitter
GNU General Public License v3.0

Retrieve the information about the sent / seen status #48

Closed mmosleh closed 6 years ago

mmosleh commented 6 years ago

I want to know how the status of a sent message ("sent" vs. "seen") can be captured by the script.

Thanks!

Mincka commented 6 years ago

Hi,

Interesting request. I think it could be possible. May I ask for which kind of purpose you would need to know the status of the last messages you sent?

The only use case I can think of is some sort of mass messaging where you would like to know which users have opened / seen the message.

mmosleh commented 6 years ago

Hi, Thank you for your reply.

One use case is mass messaging, another is automated messaging when you want to know if the message is seen and then take some action (e.g., follow up, send another one...), and another is to have a more complete archive of your personal messages with their seen status.

I already gave it a try using your well-written code, adding the following line after reading the timestamp of the message:

ReadReceipt = dm_footer[0].cssselect('span.DMReadReceipt-statusView')

Although this element shows the correct 'seen' or 'sent' status in the browser (using inspect), the script always returns 'sent', even when the message has actually been seen!

I was wondering if the content received by the script is slightly different from what the browser gets. I tried to open 'https://twitter.com/messages/with/conversation?id=....' in the browser to see if it returns the exact same content, but it didn't work.

Thanks!

Mincka commented 6 years ago

Indeed, it seems that this element only holds the associated text, which is "Sent" by default. If you click on the blue tick, the DOM is updated with the text "Seen by everyone" / "Seen" and the "is-expanded" CSS class is added so the text is displayed.

It looks like you have to test for the presence of the CSS classes "is-seen" or "is-seenAnimated" with something like dm_footer[0].cssselect('span.is-seen'). If the array is not empty, then the message is seen (or partially seen in a group?), otherwise it is sent (or partially seen in a group?).

Seen message:

  <span class="DMReadReceipt-check is-seen is-seenAnimated">
    <span class="Icon Icon--checkLight"></span>
  </span>

Sent message:

  <span class="DMReadReceipt-check">
    <span class="Icon Icon--checkLight"></span>
  </span>

I did not see other places that could give an indication about the message status. I did not check how the JS is able to update the status. Maybe it keeps a client-side status or maybe it also relies on the detection of the CSS class. 😀
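The class test described above can be sketched on the two fragments quoted in this comment. This is only an illustration: it uses Python's standard-library `html.parser` as a dependency-free stand-in for the lxml `cssselect` call, and `ReadReceiptParser` and `message_status` are hypothetical names, not part of DMArchiver. It also assumes the markup passed in already carries the "is-seen" class, as it does in the browser inspector.

```python
from html.parser import HTMLParser


class ReadReceiptParser(HTMLParser):
    """Collect the class tokens of every DMReadReceipt-check span (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.checks = []  # one list of class tokens per check-mark span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if "DMReadReceipt-check" in classes:
            self.checks.append(classes)


def message_status(footer_html):
    """Return 'seen', 'sent', or None when no check mark is found."""
    parser = ReadReceiptParser()
    parser.feed(footer_html)
    parser.close()
    if not parser.checks:
        return None
    # Compare whole class tokens so that "is-seenAnimated" alone does not match.
    return "seen" if any("is-seen" in c for c in parser.checks) else "sent"


# The two fragments quoted in the comment above.
SEEN = ('<span class="DMReadReceipt-check is-seen is-seenAnimated">'
        '<span class="Icon Icon--checkLight"></span></span>')
SENT = ('<span class="DMReadReceipt-check">'
        '<span class="Icon Icon--checkLight"></span></span>')

print(message_status(SEEN))  # seen
print(message_status(SENT))  # sent
```

Checking class tokens rather than the visible text avoids the "Sent" default label, which is the same for both states until the tick is clicked.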

mmosleh commented 6 years ago

Awesome! Thanks a lot. This is exactly what I wanted. But now I have run into another problem. The original script returns the following error:

Expecting value: line 1 column 1 (char 0)

I looked at "response.content" and it returns:

b'<!DOCTYPE html>\n<html lang="en">\n<head>\n <meta charset="utf-8" />\n <title>Twitter</title>\n <style>\n body {\n background-color: #ffffff;\n }\n .link {\n color: #1da1f2;\n font-family: sans-serif;\n }\n </style>\n</head>\n<body>\n <noscript>\n \n <center><a href="/messages" class="link">Continue</a></center>\n </noscript>\n <script nonce="lYnW6pOTpZX+BQjh4Sh1/Q==">\n \n document.cookie = "app_shell_visited=1;domain=.twitter.com;path=/;max-age=30";\n \n location.replace(location.href);\n </script>\n</body>\n</html>\n'

I used the exact script from GitHub, which was working fine before.

Thanks!

Mincka commented 6 years ago

Hmm, it was working previously? It looks like a recent update on Twitter's side to prevent parsing.

Anyway, I have a workaround. 😀

Add these few additional headers and it should do the trick. 😉 https://github.com/Mincka/DMArchiver/commit/ae151b0ce6ddfcd5dc95c96e5f6c9874c4b38a18#diff-ebdd7cd1a345766481630db935d19b6b
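For context, "Expecting value: line 1 column 1 (char 0)" is the message Python's `json` module gives when the body it is asked to decode is not JSON at all, which matches the HTML page quoted above. As a rough sketch (the function name `parse_conversation_response` is hypothetical and not part of DMArchiver), the failure can be turned into a more explicit error like this:

```python
import json


def parse_conversation_response(content):
    """Decode a response body as JSON, or raise a clearer error if it is HTML.

    `content` is assumed to be the raw bytes of the HTTP response body.
    """
    text = content.decode("utf-8", errors="replace")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # An HTML page instead of JSON suggests the server did not treat the
        # request as an AJAX call.
        if text.lstrip().lower().startswith("<!doctype html"):
            raise ValueError(
                "Expected JSON but received an HTML page; "
                "the request headers may need adjusting"
            ) from exc
        raise


print(parse_conversation_response(b'{"status": "ok"}'))
```

This does not replace the header fix from the linked commit; it only makes the symptom easier to recognise if the response format changes again.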

mmosleh commented 6 years ago

Thanks a lot for updating the script and being so supportive. But the script still cannot find anything related to the "seen" status. For example, the content printed by the following test lines does not contain the word "seen" for a seen message.

response = _session.get(_twitter_base_url + '/messages/with/conversation?id=xxxx', headers=_ajax_headers)
print(response.content)

In the browser, however, class="DMReadReceipt-check is-seen is-seenAnimated" is present for seen messages even without clicking on the message.

Thanks again!

Mincka commented 6 years ago

You are right. I tried to find how the value is retrieved, but I have not found it yet.

I'm not sure this is something available in the HTML code of the message, because even in a browser, the "is-seen" class is not yet present in the HTTP response. It looks like it relies on another request to get this information, and the DOM is updated afterward in JS.

mmosleh commented 6 years ago

Thanks a lot for looking into this. Unfortunately, this is not available through the Twitter API either. I hope I can find a solution soon.

Thanks again!

mmosleh commented 6 years ago

So I ended up resolving this using Selenium and it worked fine. Thanks for being so supportive.

Mincka commented 6 years ago

You're welcome. Thank you for the update. I also had to use Selenium for another parsing project. It's a bit slower but more reliable. Good luck with your project.
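The thread does not show the Selenium code that was used. As a hedged sketch of the idea only: once a browser driven by Selenium has rendered the conversation, the "is-seen" class discussed earlier should be readable from the live DOM. The function name `read_receipt_statuses` is hypothetical, and `driver` is assumed to be a Selenium WebDriver that is already logged in and showing the conversation. The locator string "css selector" is the value of Selenium's `By.CSS_SELECTOR`.

```python
# Selenium's locator strategy string for CSS selectors (the value of By.CSS_SELECTOR).
CSS_SELECTOR = "css selector"


def read_receipt_statuses(driver):
    """Return 'seen' or 'sent' for each read-receipt check mark on the page.

    `driver` is assumed to be a Selenium WebDriver whose current page is a
    rendered conversation; login and navigation are outside this sketch.
    """
    statuses = []
    for check in driver.find_elements(CSS_SELECTOR, "span.DMReadReceipt-check"):
        # Read the class attribute from the live DOM, after the page's
        # JavaScript has had a chance to update it.
        classes = (check.get_attribute("class") or "").split()
        statuses.append("seen" if "is-seen" in classes else "sent")
    return statuses
```

With a real session, this would be called as `read_receipt_statuses(driver)` after the page has finished loading. The class names come from the markup quoted in this thread and may no longer match the current site.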