RipMeApp / ripme

Downloads albums in bulk
MIT License
3.72k stars 631 forks

instagram website changed- "Non-retriable status code 403 while downloading" #484

Closed Ectomind1990 closed 6 years ago

Ectomind1990 commented 6 years ago

This also happens on other Instagram accounts. It's related to #384, which I raised in January.

Expected Behavior

Download photos and videos from all 537 posts on that account

Actual Behavior

Downloading https://www.instagram.com/kutovakika/
Downloading https://scontent-lht6-1.cdninstagram.com/5B4467C7/t51.2885-15/sh0.08/e35/29093749_367590587052768_3182474189701382144_n.jpg
https://scontent-lht6-1.cdninstagram.com/5B4467C7/t51.2885-15/sh0.08/e35/29093749_367590587052768_3182474189701382144_n.jpg : Non-retriable status code 403 while downloading https://scontent-lht6-1.cdninstagram.com/5B4467C7/t51.2885-15/sh0.08/e35/29093749_367590587052768_3182474189701382144_n.jpg
Downloading https://scontent-lht6-1.cdninstagram.com/5B3456D8/t51.2885-15/sh0.08/e35/28754405_2041963605835948_1551946411904335872_n.jpg
https://scontent-lht6-1.cdninstagram.com/5B3456D8/t51.2885-15/sh0.08/e35/28754405_2041963605835948_1551946411904335872_n.jpg : Non-retriable status code 403 while downloading https://scontent-lht6-1.cdninstagram.com/5B3456D8/t51.2885-15/sh0.08/e35/28754405_2041963605835948_1551946411904335872_n.jpg

This happens because these are not the official URLs coming from Instagram. RipMe tries to get the full-resolution photo by removing substrings from the official URL, but that trick doesn't always work any more: Instagram started changing things this week.
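To make the substring trick concrete, here is a minimal Python sketch of that kind of rewrite. The two patterns are assumptions inferred from comparing the failing URLs above with the signed URL quoted later in this thread; the exact substrings RipMe removes are not listed here, and `strip_to_full_size` is a hypothetical name.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed patterns, inferred from the URLs in this thread (not taken from RipMe's source).
SIGNATURE_SEGMENT = re.compile(r"/vp/[0-9a-f]{32}(?=/)")  # e.g. /vp/<32 hex chars>
SIZE_SEGMENT = re.compile(r"/s\d+x\d+(?=/)")              # e.g. /s640x640


def strip_to_full_size(url: str) -> str:
    """Remove the signature and size segments from the URL path (hypothetical helper)."""
    parts = urlsplit(url)
    path = SIGNATURE_SEGMENT.sub("", parts.path)
    path = SIZE_SEGMENT.sub("", path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


official = (
    "https://scontent-yyz1-1.cdninstagram.com/vp/6900a5948df03361950a2164a4ba560d/"
    "5B2E5099/t51.2885-15/s640x640/sh0.08/e35/"
    "28766860_558178931211208_8285564106707042304_n.jpg"
)
print(strip_to_full_size(official))
```

The rewritten URL has the same shape as the ones in the log above, which is why the CDN now answers with a 403: the path no longer matches what was signed.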

thinkpad4 commented 6 years ago

I noticed that too, was wondering what was going on

cyian-1756 commented 6 years ago

> I noticed that too, was wondering what was going on

It looks like all URLs have to be signed now.

thinkpad4 commented 6 years ago

@cyian-1756 I noticed that /t51.2885-15/e35/ appears to be consistent in the few different links I've tried

Also, this extension is able to get the images: https://chrome.google.com/webstore/detail/instagram-high-resolution/jegjlojkkmlmfnhnogmmfbfamjdabgom I don't know if that helps, but maybe you can see how this extension is doing it.

cyian-1756 commented 6 years ago

I've managed to kinda fix this by downloading the display_url instead of getting the full-sized image. It's not perfect (the image that is downloaded isn't the largest possible), but until someone manages to crack how IG is generating the signature, it's the best that can be done.
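A rough Python sketch of that workaround: take the signed URL exactly as the site provides it and download it unmodified. Only the `display_url` key comes from this thread; the surrounding node layout and the `pick_download_url` helper are hypothetical, and the signed URL quoted elsewhere in this thread stands in as sample data.

```python
from typing import Optional

SIGNED_URL = (
    "https://scontent-yyz1-1.cdninstagram.com/vp/6900a5948df03361950a2164a4ba560d/"
    "5B2E5099/t51.2885-15/s640x640/sh0.08/e35/"
    "28766860_558178931211208_8285564106707042304_n.jpg"
)


def pick_download_url(media_node: dict) -> Optional[str]:
    """Return display_url untouched, or None if it is missing or malformed."""
    url = media_node.get("display_url")
    if isinstance(url, str) and url.startswith(("http://", "https://")):
        return url  # no substring removal, so the signature stays valid
    return None


# Hypothetical, trimmed-down media node.
node = {"display_url": SIGNED_URL}
print(pick_download_url(node))
```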

thinkpad4 commented 6 years ago

@cyian-1756 It appears to always look like this: /5B719B26/t51.2885-15/e35/. The /5B719B26/ seems to be a random hex number, but the /t51.2885-15/e35/ is always the same, at least for now. Can't you make it like /*/t51.2885-15/e35/, replacing the hex number with a * or something?

cyian-1756 commented 6 years ago

> Can't you make it like /*/t51.2885-15/e35/, replacing the hex number with a * or something?

I can, but that still causes a 404. As far as I can tell, the vp/6900a5948df03361950a2164a4ba560d part of the URL is an MD5 hash of the URL plus a salt. If you change the URL at all, you need to generate a new hash, but I don't have the salt, so I can't generate one.
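To illustrate why a missing salt blocks this, here is a small Python sketch of a salted MD5 signature. The scheme (MD5 over salt plus path) and the salt value are made up for illustration; how IG actually builds the signature is unknown.

```python
import hashlib


def sign_path(path: str, salt: bytes) -> str:
    """Hypothetical scheme: hex MD5 digest of salt + path."""
    return hashlib.md5(salt + path.encode("utf-8")).hexdigest()


salt = b"server-side-secret"  # made-up value; the real salt is not known
original = (
    "/5B2E5099/t51.2885-15/s640x640/sh0.08/e35/"
    "28766860_558178931211208_8285564106707042304_n.jpg"
)
modified = original.replace("/s640x640", "")

print(sign_path(original, salt))  # 32 hex characters
print(sign_path(modified, salt))  # a different 32 hex characters
```

Any change to the path yields a different digest, and without the salt a client cannot compute the digest the server expects for the modified path.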

cyian-1756 commented 6 years ago

The image url for the image at https://www.instagram.com/p/BgrT-Gzh3F7/?taken-by=kutovakika is https://scontent-yyz1-1.cdninstagram.com/vp/6900a5948df03361950a2164a4ba560d/5B2E5099/t51.2885-15/s640x640/sh0.08/e35/28766860_558178931211208_8285564106707042304_n.jpg

Changing the image URL to http doesn't cause the "Invalid URL signature" error, so it's safe to say that the "http" isn't part of the URL that gets hashed.

Making the sig longer or shorter returns a "Bad URL signature param" error, so we can assume the sig is always 32 characters long.
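These observations can be captured in a short Python check. This is only a sketch: it assumes the signature always sits right after `/vp/` in the path and is lowercase hex, which matches the one URL quoted above but is not confirmed anywhere. Both helper names are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumed layout: /vp/<signature>/<rest of path>
SIGNED_PATH = re.compile(r"^/vp/(?P<sig>[^/]+)/(?P<rest>.+)$")


def extract_signature(url: str) -> Optional[str]:
    """Return the path segment after /vp/, or None if the URL is unsigned."""
    match = SIGNED_PATH.match(urlsplit(url).path)
    return match.group("sig") if match else None


def looks_like_valid_signature(sig: str) -> bool:
    """True for exactly 32 lowercase hex characters."""
    return re.fullmatch(r"[0-9a-f]{32}", sig) is not None


url = (
    "https://scontent-yyz1-1.cdninstagram.com/vp/6900a5948df03361950a2164a4ba560d/"
    "5B2E5099/t51.2885-15/s640x640/sh0.08/e35/"
    "28766860_558178931211208_8285564106707042304_n.jpg"
)
sig = extract_signature(url)
print(sig, looks_like_valid_signature(sig))
```

Because the signature is read from the path, switching the scheme from https to http leaves it unchanged, which is consistent with the http test described above.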

cyian-1756 commented 6 years ago

This is kinda fixed in 1.7.28, but I'm leaving this open in case someone figures out how to get full-size images again.

thinkpad4 commented 6 years ago

@cyian-1756 Don't know if this would help but I was looking at the git of what appears to be the only working IG image downloader extension and they said "The right approach here seems to be fetching the image data and converting that to a blob which can be downloaded." I don't know if you can do that in RipMe but I hope so

cyian-1756 commented 6 years ago

> looking at the git of what appears to be the only working IG image downloader

Can you link me to that?

thinkpad4 commented 6 years ago

@cyian-1756 https://github.com/ehmorris/Instagram-High-Resolution-Downloader

Hrxn commented 6 years ago

But this is for the profile picture only?

cyian-1756 commented 6 years ago

@thinkpad4

Am I misreading something or does the reddit method only work for profile images?

Edit: And it looks like https://github.com/ehmorris/Instagram-High-Resolution-Downloader doesn't get the full sized images

thinkpad4 commented 6 years ago

OMG, I am SUCH a derp. I didn't even see it saying Instagram Profile Pic in the title of that reddit post. I'll delete my post linking to that reddit post. I'm sorry @cyian-1756

cyian-1756 commented 6 years ago

@thinkpad4 It's fine

kevin51jiang commented 6 years ago

Going back to the IG topic, would using the actual Facebook/Instagram API work?

EDIT: Also thinking of embedding Selenium or something. Instagram on the web seems to be built on React.

cyian-1756 commented 6 years ago

> Going back to the IG topic, would using the actual Facebook/Instagram API work?

Maybe, but AFAIK the Facebook API is locked down right now because of that data scraping scandal from a while back, and IG keeps changing their API.

> EDIT: Also thinking of embedding Selenium or something

We could, and it's been done before, but it's a lot of extra weight on the end JAR for not that much benefit (even most JS-heavy sites can be scraped just using jsoup).
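As a rough illustration of scraping a JS-heavy page without a browser, the Python sketch below pulls a JSON blob out of an inline script tag using only the standard library. It assumes the page embeds its data as `window._sharedData = {...};`, which is not stated in this thread; the sample HTML and payload are made up, and `SharedDataParser` is a hypothetical name.

```python
import json
from html.parser import HTMLParser


class SharedDataParser(HTMLParser):
    """Collect JSON assigned to window._sharedData in an inline script (assumed layout)."""

    PREFIX = "window._sharedData = "

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.data = None

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_script and self.data is None and text.startswith(self.PREFIX):
            # Drop the assignment prefix and the trailing semicolon, then parse the JSON.
            self.data = json.loads(text[len(self.PREFIX):].rstrip(";"))


# Made-up sample page; a real page would carry a much larger payload.
html = (
    '<html><body><script type="text/javascript">'
    'window._sharedData = {"display_url": "https://example.com/a.jpg"};'
    "</script></body></html>"
)
parser = SharedDataParser()
parser.feed(html)
parser.close()
print(parser.data)
```

No JavaScript is executed here; the data is simply read out of the page source, which is the same idea as parsing the page with jsoup instead of driving a browser.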

Ectomind1990 commented 6 years ago

> This is kinda fixed in 1.7.28, but I'm leaving this open in case someone figures out how to get full-size images again.

Thanks for fixing this and keeping us in the loop about what the deal is, and thanks in general to the people making this tool.

Ectomind1990 commented 6 years ago

On gallery pages I'm still getting the full-size images. Ripping https://www.instagram.com/karencantuq/ gets these files:

2018_03_13_04_57_BgRVsdZgFB528764494_200825714018087_2503767454539317248_n.jpg
2018_03_13_04_57_BgRVsdZgFB528752003_195423554566125_7057308480591364096_n.jpg

They are both over 2 MB.

Does this help find a way to get full-size images from non-gallery pages as well?

Hrxn commented 6 years ago

What do you mean by non-gallery pages?

cyian-1756 commented 6 years ago

This should be fixed in 1.7.36

Ectomind1990 commented 6 years ago

A page with only one image, like https://www.instagram.com/p/BgXj_xKA67E/. But I just tested it, and 85.2 KB is actually the full-size image, which is what RipMe gets, I think. So this is actually fixed?

Ectomind1990 commented 6 years ago

> This should be fixed in 1.7.36

thanks :+1: