Mincka / DMArchiver

A tool to archive the direct messages, images and videos from your private conversations on Twitter
GNU General Public License v3.0
223 stars, 25 forks

The tool is broken after it was working well. #58

Closed rosemona247 closed 5 years ago

rosemona247 commented 6 years ago

Dear

I used this tool for a while and it's amazing.

I think it's not working any more. It's not downloading media, only emojis. I hope you find the issue soon and fix it. We depend on you. You are our hero.

Thanks, it's one of a kind.

LaurentLC commented 6 years ago

I happened to notice the same problem (but it's a pity the title of the issue isn't more specific…)

I'm using the Windows exe version. Can't tell for the videos, but apparently the path to images (as we can read it in the txt file), for instance https://ton.twitter.com/1.1/ton/data/dm/xxxxx/xxxxx/filename.jpg, is somehow redirected by the Twitter server to https://ton.twitter.com/i/ton/data/dm/xxxxx/xxxxx/filename.jpg, which seems to return a 404.

The "https://ton.twitter.com/i/ton/data/dm" part doesn't seem to be the problem. However, when I open this image in my web browser, the exact call is https://ton.twitter.com/i/ton/data/dm/xxxxx/xxxxx/filename.jpg:large

and the ":large" seems to change everything…
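To make the two URL shapes above concrete, here is a minimal Python sketch. It only rewrites strings: it mirrors the /1.1/ to /i/ redirect I observed and appends the ":large" suffix the browser uses. The function names are hypothetical and not taken from DMArchiver, and I have not established that requesting the resulting URL actually succeeds.

```python
from urllib.parse import urlsplit, urlunsplit


def redirected_form(url: str) -> str:
    """Mimic the observed redirect: /1.1/ton/... becomes /i/ton/... (hypothetical helper)."""
    parts = urlsplit(url)
    path = parts.path
    prefix = "/1.1/ton/"
    if path.startswith(prefix):
        path = "/i/ton/" + path[len(prefix):]
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def with_size_suffix(url: str, size: str = "large") -> str:
    """Append a ':large'-style suffix unless the last path segment already has one."""
    last_segment = url.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return url
    return f"{url}:{size}"


# URL as it appears in the txt file (placeholders kept as-is).
txt_url = "https://ton.twitter.com/1.1/ton/data/dm/xxxxx/xxxxx/filename.jpg"
print(with_size_suffix(redirected_form(txt_url)))
```

Nothing here touches the network, so it is safe to run as-is when comparing against what the browser's inspect tool shows.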

Hope it'll help.

Thx again for the app and everything.

rosemona247 commented 6 years ago

I tried to open the image links in a web browser and it didn't work for me. I even tried your trick of adding ":large" and it still returns a 404 for me. This tool was a great help, but now something seems to be broken. I'm trying to fix the code myself, but I'm not a Python developer, so it will take me a bit of time to figure it out.

I'm also using the Windows version, the latest release. Videos don't seem to work either. The only things the tool can fetch now are the conversation and emojis.

Regards

cajuncook commented 5 years ago

I don't claim to understand exactly what's happening, but this is what I've figured out:

Using any browser's inspect tool and manually going into your DMs, you can pull a link to an image by scrolling back and finding the .jpg href. This link, via copy & paste, works once to load the image (the /1.1/ in the ton.twitter.com URL redirects to /i/). If you reload that link you'll get a 404, and the /1.1/ link will thereafter deliver a 404 until you refresh Twitter itself and reopen/scroll back through your DMs until it loads again.

I feel pretty certain there's some JavaScript-driven retrieval, delivery/caching, and deletion that's hidden from the front-end user. If we could reverse-engineer the request that's forcing Twitter to load that image onto the Internet-facing ton.twitter.com, we could retrieve the images. Otherwise I'm kind of stumped.

Note that this treatment isn't being done to gifs/mp4s, which is why the scraping of those hasn't been affected by this issue.

cajuncook commented 5 years ago

Taking better note of @LaurentLC's comment, wondering if just appending ":large" to the media url is enough for images -- gonna try a dirty fix and see what happens.

Mincka commented 5 years ago

Thanks all for the report and the troubleshooting. It was related to a referer header issue. That's why you weren't able to load the image from the browser. It's now fixed in release 0.2.5.
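For anyone curious what "sending a Referer header" looks like in practice, here is a minimal sketch using only the Python standard library. It is not the code from release 0.2.5: the helper name and the referer value "https://twitter.com/messages" are assumptions for illustration, and a real download would also need the cookies of a logged-in session.

```python
import urllib.request


def build_image_request(image_url: str,
                        referer: str = "https://twitter.com/messages") -> urllib.request.Request:
    """Build (but do not send) a request carrying a Referer header.

    The default referer value is an assumption, not the one DMArchiver uses.
    """
    return urllib.request.Request(image_url, headers={"Referer": referer})


# URL shape from the txt file, with the placeholders kept as-is.
req = build_image_request(
    "https://ton.twitter.com/1.1/ton/data/dm/xxxxx/xxxxx/filename.jpg"
)
print(req.get_header("Referer"))
```

A plain copy and paste of the link into the address bar sends no such header, which fits the 404s reported in the browser.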