500px Integration - Githubissues

ghost commented 7 years ago

I don't believe 500px is currently supported by RipMe. I just tried it: RipMe appears to work, creates the folder, and logs downloads, but no content is actually saved, and the log shows no failure message. I tried this user: https://500px.com/luizlaercio

The log showed:

Downloading https://500px.com/photo/187646793/the-corner-by-luiz-laercio
Downloading https://500px.com/photo/187262199/moulin-rouge-by-luiz-laercio

Although 500px won't let you right-click and save images from the website because of copyright, you can still download the full-size images by looking at the Media tab in Page Info (I use Firefox).

Example: the page
https://500px.com/photo/188772419/il-pane-di-natale-by-luiz-laercio?ctx_page=1&from=user&user_id=238340
serves its image from:
https://drscdn.500px.org/photo/188772419/q%3D80_m%3D2000_k%3D1/9026c5d135f99dd3233c7f6c1e30086f
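To make the structure of that image link easier to see, here is a minimal Python sketch that splits it into its visible parts. The layout `photo/<id>/<options>/<key>` is inferred from this single example, the function name `parse_cdn_url` is made up for illustration, and nothing in this thread says what `q`, `m`, and `k` mean.

```python
from urllib.parse import urlparse, unquote

def parse_cdn_url(url):
    """Split a drscdn.500px.org image URL into its visible parts."""
    segments = urlparse(url).path.strip("/").split("/")
    # Assumed layout (inferred from one example only): photo/<id>/<options>/<key>
    if len(segments) != 4 or segments[0] != "photo":
        raise ValueError("unexpected URL layout: " + url)
    _, photo_id, raw_options, key = segments
    # "%3D" is a percent-encoded "=", so the options decode to "q=80_m=2000_k=1".
    decoded = unquote(raw_options)
    options = dict(pair.split("=", 1) for pair in decoded.split("_"))
    return {"photo_id": photo_id, "options": options, "key": key}

info = parse_cdn_url(
    "https://drscdn.500px.org/photo/188772419/"
    "q%3D80_m%3D2000_k%3D1/9026c5d135f99dd3233c7f6c1e30086f"
)
print(info)
```

The photo ID in the image link matches the one in the page URL, so the only part that cannot be derived from the page URL is the final 32-character key.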

It does not have a .jpg extension, but if you paste that link into the browser and right-click to save, you can save it as a JPG.

Edit: Adult content may be different; I don't know. You don't have to click "Show me the content" to see it. It still works the same if you view the Media for the link.

rautamiekka commented 7 years ago

EDIT: Added ssdeep (algo version 1.1 used by program version 2.13) and Whirlpool (program name hashdeep), both run on Ubuntu Linux 16.04 LTS x64.

I got the idea the last string in the DL link would be a hash, but I ran the following hashes on it (RapidCRC Unicode Portable) and nothing matches at all, perfect 0%:

CRC=DD1EF307
MD5=8ce49ee775d78c0c1692461ae37a309b
SHA1=faafd69862d47b1f427df171c0058d4547685a6a
SHA256=f2977b19685bb05c2c0094e6ee1cae6699f0745f09aae4b3bbf7ae378289b21c
SHA512=1d9b7cd83701feee038a484ec661a8c9cd9f066848a42b59dac74a145ef1a35cbcd6f7a6752b2354a546eb2b9f8ab3f9e15e9d0d23f33047e56b9d52214df9e2
SHA3-224=1c8b865a8ade244da1874d7510bed1a0e5745cfd7a9c2fb3854eea68
SHA3-256=5e36edfa92b7e2fa07253caa7a21b766b8754fe6f8de963ba64459a019c94b2c
SHA3-512=c4187692bdffa63808e4bfcb6f1ee500e132a4621d750b0883bdebe5565a848e2d7595b425f3f4161734350a6e8036e208e090c322305b7bdce6ecbda0c7be58
CRC32C=8CD624B8
ED2k=af89f51b6f491c97c9c6f50f387fe294
ssdeep=6144:x626LZc863GfvMFvL+bf9ZED0e1jPwcnoaBU4Z51PzRtZL+FuDCTguifA7oi3:QlL2863GfvIybf9ZEz1jPwwfBNZlmuDK
Whirlpool=340201,cde2a4932358d8d5a0366bbebc797c7049f662198252a385c70944ab2e9c9e688a69cd572669614622dd537de463d6369e89670d5e239279018426974aa742a3

So if it's a hash, it's not one I know, although it has the same length as an MD5 hash.
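The same kind of check can be scripted. The sketch below is only an illustration, not the tool used above: it runs every algorithm that Python's `hashlib` guarantees over some bytes and reports which digests equal a candidate string. The helper name `matching_algorithms` and the sample bytes are made up; for a real check you would read the downloaded image with `open(path, "rb").read()`.

```python
import hashlib

def matching_algorithms(data, candidate):
    """Return the hashlib algorithm names whose hex digest of data equals candidate."""
    candidate = candidate.lower()
    matches = []
    for name in sorted(hashlib.algorithms_guaranteed):
        h = hashlib.new(name)
        h.update(data)
        if name.startswith("shake_"):
            # SHAKE digests have no fixed size, so request the candidate's length.
            digest = h.hexdigest(len(candidate) // 2)
        else:
            digest = h.hexdigest()
        if digest == candidate:
            matches.append(name)
    return matches

# Stand-in bytes; a real test would use the contents of the downloaded file.
sample = b"example image bytes"
url_key = "9026c5d135f99dd3233c7f6c1e30086f"

print(len(url_key))                          # 32 hex characters, same as an MD5 digest
print(matching_algorithms(sample, url_key))  # empty unless some digest matches
```

An empty result only rules out a plain digest of the file contents under these algorithms. It says nothing about hashes of other inputs or about randomly generated IDs.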

metaprime commented 7 years ago

If you search in all issues and pull requests you'll see a lot of work previously went into making 500px work, but it seems they keep changing the website (or maybe even coming up with clever ways to block downloaders like ours from downloading the content).

It can probably be fixed but it will take some looking into.

ghost commented 7 years ago

Oh, I just noticed that the title bar for the link shows the actual filename. [screenshot]

metaprime commented 7 years ago

Side note: there are extensions like "right-to-click" which block the mechanism websites use to disable right-click.

metaprime commented 7 years ago

In Chrome it seems to show just the last segment of the URL in the title, but right-click and save gets the right filename. Not sure what's going on there...

ghost commented 7 years ago

It does for me in Firefox. Get the link, copy and paste it into the search bar and hit enter, and it shows for me.

rautamiekka commented 7 years ago

@tehloxely, that's exactly what he has in the screenshot. Firefox indeed shows the actual filename in the tab itself, but Chrome doesn't.

metaprime commented 7 years ago

@tehloxely yeah, I get that, I'm just not sure by what mechanism it is figuring out the title of the image. We'd need to know how to get that information from RipMe to take advantage of the information.
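One mechanism worth checking is the `Content-Disposition` response header, which servers commonly use to suggest a filename for a download. Whether the 500px CDN actually sends it is an assumption that nobody in this thread has verified, and the header value below is a made-up example. A minimal Python sketch of pulling a filename out of such a header:

```python
from email.message import Message

def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() reads the "filename" parameter of Content-Disposition.
    return msg.get_filename()

# Hypothetical header value, for illustration only.
example = 'inline; filename="the-corner.jpg"'
print(filename_from_content_disposition(example))
print(filename_from_content_disposition("inline"))
```

If the header turns out to be present on the image responses, a ripper could read it from the response and use it as the saved filename. If it is absent, the browsers must be getting the name some other way.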

rautamiekka commented 7 years ago

I noticed that although the last segment of the address looks like a hash and has the same length as an MD5 hash, none of the 12 hashes I tried match it at all.

metaprime commented 7 years ago

@rautamiekka If it is a hash as opposed to a randomly generated ID, you'd need to know exactly what blob they are hashing. Might be salted. I don't think we're going to get anywhere that way.

Also even if it is a hash, hashes are not reversible, so we can't use it to get the original blob that was hashed anyway.
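To illustrate the salting point, here is a small Python sketch. The salt and the bytes are invented placeholders, and MD5 is used only because the key has the same length as an MD5 digest, not because the site is known to use it.

```python
import hashlib

# Both values are hypothetical stand-ins, not anything taken from 500px.
image_bytes = b"example image bytes"
salt = b"hypothetical-secret"

plain = hashlib.md5(image_bytes).hexdigest()
salted = hashlib.md5(salt + image_bytes).hexdigest()

print(plain)
print(salted)
print(plain == salted)
```

Both digests are 32 hex characters, yet they differ, so hashing the file alone can never reproduce a salted value without knowing the salt and exactly how it is combined with the data.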

ghost commented 7 years ago

Do you think /r/DataHoarder/ might be able to shed some light on it?

rautamiekka commented 7 years ago

@metaprime Well, it was worth a try. If it actually were MD5 (or another hash with the same output length), it would help verify that the same file was downloaded.

I know hashes are one way, but I figured something has to be tried in case it works.

metaprime commented 7 years ago

@rautamiekka that's a good point -- I guess validation is nice but not something we especially need to care about, I think. Anyway I think we have bigger fish to fry than reverse engineering their image key.

metaprime commented 7 years ago

@tehloxely worth a try

ghost commented 7 years ago

I will ask, but an answer is not always guaranteed. Also ripping 500px isn't crucial, it would just be a nice addition to the app.

metaprime commented 7 years ago

@tehloxely considering it was supported in the past, I consider it a bug that needs fixing that it doesn't work right now. The only question is how much time I'll have to devote to it. As of now I'm not a user of 500px so I don't understand the site very well, but I should be able to figure it out regardless.

Could someone provide an example of a profile that contains some NSFW images? Apparently that was problematic as a special case before, and I want to make sure it remains supported.

cyian-1756 commented 7 years ago

At this point 500px works (tested using 1.4.6), and my pull request downloads images without the watermark on them (including adult images), so I think it's safe to close this.

metaprime commented 7 years ago

@cyian-1756 agreed

4pr0n / ripme

500px Integration #396