4pr0n / ripme

Downloads albums in bulk
MIT License
918 stars, 203 forks

500px Integration #396

Closed by ghost 7 years ago

ghost commented 7 years ago

I don't believe 500px is currently supported by RipMe. I just tried it: the rip appears to run, but it fails to download any content. It creates the folder and logs downloads, but nothing is saved and no failure message appears in the log. I tried this user: https://500px.com/luizlaercio

It showed this in the log:

Downloading https://500px.com/photo/187646793/the-corner-by-luiz-laercio
Downloading https://500px.com/photo/187262199/moulin-rouge-by-luiz-laercio

Now, although 500px won't let you right-click and save images from the website (because of copyright), you can still download the full-size images by looking at the Media tab in Page Info (I use Firefox).

Example page: https://500px.com/photo/188772419/il-pane-di-natale-by-luiz-laercio?ctx_page=1&from=user&user_id=238340
Image URL: https://drscdn.500px.org/photo/188772419/q%3D80_m%3D2000_k%3D1/9026c5d135f99dd3233c7f6c1e30086f

The link does not have a .jpg extension, but if you paste it into the browser and then right-click and save, you can save the image as a JPG.
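To make the relationship between the two URLs above concrete, here is a rough Python sketch that pulls the photo ID out of a page URL and percent-decodes the middle segment of the image URL. The function names are made up for illustration, and the meaning of `q`, `m` and `k` is not known from this thread, so they are only split apart, not interpreted. Only the URL shapes come from the example.

```python
from urllib.parse import urlparse, unquote


def photo_id_from_page_url(page_url):
    """Return the numeric photo ID from a 500px photo page URL, or None."""
    parts = urlparse(page_url).path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "photo" and parts[1].isdigit():
        return parts[1]
    return None


def split_image_url(image_url):
    """Split an image URL of the shape /photo/<id>/<params>/<key>.

    The shape is inferred from the single example in this thread and may
    not hold for other images.
    """
    parts = urlparse(image_url).path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "photo":
        return None
    # urlparse leaves percent-encoding intact, so decode %3D to "=" here.
    decoded = unquote(parts[2])
    params = dict(item.split("=", 1) for item in decoded.split("_") if "=" in item)
    return {"photo_id": parts[1], "params": params, "key": parts[3]}


if __name__ == "__main__":
    page = ("https://500px.com/photo/188772419/il-pane-di-natale-by-luiz-laercio"
            "?ctx_page=1&from=user&user_id=238340")
    image = ("https://drscdn.500px.org/photo/188772419/"
             "q%3D80_m%3D2000_k%3D1/9026c5d135f99dd3233c7f6c1e30086f")
    print(photo_id_from_page_url(page))
    print(split_image_url(image))
```

The photo ID is shared between the two URLs, but the final segment is not derivable from the page URL alone, so a ripper would still have to read the image URL from the page or from whatever data the page loads.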

Edit: Adult content may be different, I don't know. You don't have to click "Show me the content" to see it; it still works the same if you view the Media tab for the link.

rautamiekka commented 7 years ago

EDIT: Added ssdeep (algo version 1.1 used by program version 2.13) and Whirlpool (program name hashdeep), both run on Ubuntu Linux 16.04 LTS x64.

I got the idea the last string in the DL link would be a hash, but I ran the following hashes on it (RapidCRC Unicode Portable) and nothing matches at all, perfect 0%:

So if it's a hash, it's not one I know, but it has the same length as an MD5 hash.
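For anyone who wants to repeat this kind of check, a minimal sketch with Python's standard `hashlib` follows. It assumes the thing being hashed is the raw bytes of the downloaded image, which is only a guess. The helper names and the algorithm list are illustrative and are not the set of hashes tried above.

```python
import hashlib

# Last path segment of the image URL quoted earlier in this thread.
CANDIDATE = "9026c5d135f99dd3233c7f6c1e30086f"

# A handful of algorithms that hashlib always provides.
ALGORITHMS = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def has_md5_length(candidate):
    """True if the string is as long as a hex-encoded MD5 digest (32 chars)."""
    return len(candidate) == hashlib.md5().digest_size * 2


def matching_algorithms(data, candidate):
    """Return the names of the algorithms whose hex digest of data equals candidate."""
    candidate = candidate.lower()
    return [name for name in ALGORITHMS
            if hashlib.new(name, data).hexdigest() == candidate]


if __name__ == "__main__":
    print(has_md5_length(CANDIDATE))
    # Replace the sample bytes with the contents of the downloaded file,
    # e.g. open(path, "rb").read(), to test a real image.
    print(matching_algorithms(b"sample bytes", CANDIDATE))
```

An empty result only rules out these algorithms applied to exactly these bytes. A 32-character hex string is 128 bits, which fits MD5 but equally fits a random identifier.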

metaprime commented 7 years ago

If you search in all issues and pull requests you'll see a lot of work previously went into making 500px work, but it seems they keep changing the website (or maybe even coming up with clever ways to block downloaders like ours from downloading the content).

It can probably be fixed but it will take some looking into.

ghost commented 7 years ago

Oh, I just noticed in the title bar for the link, it has the actual filename.

[screenshot]

metaprime commented 7 years ago

Side note: there are extensions like "right-to-click" which block the mechanism websites use to disable right-click.

metaprime commented 7 years ago

In Chrome it seems to show just the last segment of the URL in the title, but right-click and save gets the right filename. Not sure what's going on there...

[screenshot]

ghost commented 7 years ago

It does for me in Firefox. Get the link, copy and paste it into the search bar and hit enter, and it shows for me.

rautamiekka commented 7 years ago

@tehloxely, that's exactly what he has in the screenshot. Firefox indeed shows the actual filename in the tab itself, but Chrome doesn't.

metaprime commented 7 years ago

@tehloxely yeah, I get that, I'm just not sure by what mechanism it is figuring out the title of the image. We'd need to know how to get that information from RipMe to take advantage of the information.
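One common mechanism for this, not confirmed for 500px anywhere in this thread, is the HTTP `Content-Disposition` response header, which lets a server suggest a filename that has nothing to do with the URL. A small Python sketch of reading such a header value with the standard library is below. The header values in it are made up, and the function name is illustrative.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() reads the "filename" parameter of Content-Disposition.
    return msg.get_filename()


if __name__ == "__main__":
    # Hypothetical header values, not captured from 500px.
    print(filename_from_content_disposition('inline; filename="example-photo.jpg"'))
    print(filename_from_content_disposition("inline"))
```

If the image response does carry such a header, a ripper could use it to name the saved file. If it does not, the browser is getting the name some other way and this sketch does not apply.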

rautamiekka commented 7 years ago

I noticed that although the last segment of the address looks like a hash and has the same length as an MD5 hash, none of the 12 hashes I tried match it at all.

metaprime commented 7 years ago

@rautamiekka If it is a hash as opposed to a randomly generated ID, you'd need to know exactly what blob they are hashing. Might be salted. I don't think we're going to get anywhere that way.

Also even if it is a hash, hashes are not reversible, so we can't use it to get the original blob that was hashed anyway.

ghost commented 7 years ago

Do you think /r/DataHoarder/ might be able to shed some light on it?

rautamiekka commented 7 years ago

@metaprime Well, it was worth a try. If it actually were MD5 (or another hash with the same output length), it would help verify that the same file was downloaded.

I know hashes are one way, but I figured something has to be tried in case it works.

metaprime commented 7 years ago

@rautamiekka that's a good point -- I guess validation is nice but not something we especially need to care about, I think. Anyway I think we have bigger fish to fry than reverse engineering their image key.

metaprime commented 7 years ago

@tehloxely worth a try

ghost commented 7 years ago

I will ask, but an answer is not always guaranteed. Also ripping 500px isn't crucial, it would just be a nice addition to the app.

metaprime commented 7 years ago

@tehloxely considering it was supported in the past, I consider it a bug that needs fixing that it doesn't work right now. The only question is how much time I'll have to devote to it. As of now I'm not a user of 500px so I don't understand the site very well, but I should be able to figure it out regardless.

Could someone provide an example of a profile that contains some NSFW images? Apparently that was problematic as a special case before, and I want to make sure it remains supported.

cyian-1756 commented 7 years ago

At this point 500px works (tested using 1.4.6), and my pull request downloads images without the watermark on them (including adult images), so I think it's safe to close this.

metaprime commented 7 years ago

@cyian-1756 agreed