4pr0n / ripme

Downloads albums in bulk
MIT License

Chevereto Image Hosting Script Support #433

Open ghost opened 7 years ago

ghost commented 7 years ago

Hi Guys,

It would be awesome if RipMe could fully support image hosting websites which utilize Chevereto (www.chevereto.com) for their hosting software/script.

Chevereto currently supports JPG, PNG, BMP & GIF

Chevereto does provide a list of all direct image links contained in an album via the "embed codes" tab when viewing an album; however, viewing the embed codes is typically limited to logged-in users.

As such, it would likely require RipMe to support the ability to authenticate with a website powered by Chevereto in order to grab the links via that method.

An alternative method would be to view the page with all the images listed and then alter each URL to grab the original media as opposed to the thumbnail.

Example:

Image: https://is02.ezphotoshare.com/2017/01/14/KXUlyU.png
Thumbnail: https://is02.ezphotoshare.com/2017/01/14/KXUlyU.th.png
Medium Thumbnail: https://is02.ezphotoshare.com/2017/01/14/KXUlyU.md.png

When viewing the content you would typically see the medium size by default; however, by changing the URL slightly you should be able to grab the actual full size media.
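The suffix swap described above can be sketched as a plain string transform. The class and method names are illustrative only (not part of RipMe), and the regex is anchored to the extensions Chevereto supports per this thread:

```java
public class CheveretoUrls {
    // Chevereto thumbnails insert ".th" (small) or ".md" (medium) before the
    // file extension; stripping that infix yields the full-size image URL.
    public static String fullSizeUrl(String thumbnailUrl) {
        return thumbnailUrl.replaceFirst("\\.(th|md)\\.(jpe?g|png|gif|bmp)$", ".$2");
    }
}
```

For example, the medium thumbnail above maps back to https://is02.ezphotoshare.com/2017/01/14/KXUlyU.png.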

Desired Support:

Example:

Album Page: https://www.ezphotoshare.com/album/TDta
User's Albums Page: https://www.ezphotoshare.com/doreentokamumzeo/albums

I do own Chevereto and have several sites running; happy to help any way I am able.

Thanks!

cyian-1756 commented 7 years ago

Taking a quick look at the links you provided, ripping the sites should be easy (all the images have their own class and link), and detecting that a site is a Chevereto site should be easy too (the line appears in them)

However, atm ripme uses the domain name to tell if it can rip a site. I could bypass this by making a request to the URL when the user enters it, but I feel that this is poor behavior (no data should be sent or received until the user clicks rip)

A workaround might be to use a list of site names known to use Chevereto (like ChanRipper.java does), and I feel that this is the best way to do it (for now)
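A minimal sketch of that host-list approach (class and method names are illustrative; the two hosts are the ones mentioned in this thread):

```java
import java.util.Arrays;
import java.util.List;

public class CheveretoHosts {
    // Sites known to run Chevereto; extend this list as new ones are reported.
    private static final List<String> HOSTS =
            Arrays.asList("hushpix.com", "www.ezphotoshare.com");

    // Pass in url.getHost(); no network request is needed, so nothing
    // is sent or received until the user actually clicks rip.
    public static boolean isCheveretoHost(String host) {
        return HOSTS.contains(host);
    }
}
```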

ghost commented 7 years ago

Sounds good!

It would be great if we can start by adding hushpix.com (This would assist in my archiving project allowing users to grab content before I have time to create archives).

cyian-1756 commented 7 years ago

It seems that Chevereto is forcing redirects on its image URLs (aside from thumbnails), meaning that when ripme downloads a picture named whatever.png, it really downloads an HTML page named whatever.png

I'm looking for a workaround
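One way to confirm what the server is actually serving is to inspect the Content-Type of the response after redirects. This is a diagnostic sketch, not part of RipMe; `servesImage` issues a real HTTP request, while `looksLikeImage` is pure classification:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ResponseCheck {
    // An image URL that redirects to an HTML page reports a Content-Type
    // like "text/html; charset=utf-8" rather than an image MIME type.
    public static boolean looksLikeImage(String contentType) {
        return contentType != null
                && contentType.trim().toLowerCase().startsWith("image/");
    }

    // Follow redirects and classify what the server actually returned.
    public static boolean servesImage(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setInstanceFollowRedirects(true);
        try {
            return looksLikeImage(conn.getContentType());
        } finally {
            conn.disconnect();
        }
    }
}
```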

cyian-1756 commented 7 years ago

It seems that the site is doing some JavaScript stuff to prevent downloading the pictures automatically and has IP banned me, so I doubt this is getting added to ripme anytime soon. But I'll get back to working on it once the IP ban has ended

ghost commented 7 years ago

Thank you for taking the time to investigate!

IP Bans - I am not seeing any showing up; are you able to access the site at this time?
JavaScript - I am "guessing" that maybe the "Consent Screen" was causing the issue; it has been disabled.

cyian-1756 commented 7 years ago

IP Bans - I am not seeing any showing up, are you able to access the site at this time?

I can access hushpix.com no problem, but I can only access ezphotoshare with JavaScript on from Chrome, not with Firefox or ripme

cyian-1756 commented 7 years ago

The ripper seems to work on hushpix; could you link me to/make an album on there for me to test?

ghost commented 7 years ago

Certainly!

User Album Page (lists all the user's public albums) - http://hushpix.com/RedditGWGirls/albums
Album Page - http://hushpix.com/album/7BTZ

On the EzPhotoShare front, I am able to access the website without issue using FF, Chrome & Microsoft Edge.

cyian-1756 commented 7 years ago

On the EzPhotoShare front, I am able to access the website without issue using FF, Chrome & Microsoft Edge.

That's odd, I still can't access it with ripme for some reason

However, I have managed to rip from hushpix.com! EzPhotoShare still won't let me download the images, though

Give me a bit to polish up my code (it's hacky as all hell atm) and I'll push it to my fork

ghost commented 7 years ago

Excellent to hear!

Once you have it good to go with HushPix I can begin troubleshooting further on EzPhotoShare to see what is causing the failures.

With your implementation, will it be possible to point RipMe to the user's albums page and have it download all the albums to separate directories (This would be amazing!)?

Thank you for all of your hard work!

cyian-1756 commented 7 years ago

With your implementation will it be possible to point RipMe to the users album page and have it download all the albums to separate directories

Yea it should be possible but it will take a bit of work (I won't get it done today)

Thank you for all of your hard work!

No problem!

cyian-1756 commented 7 years ago

I've more or less gotten it working (the album naming is way too long still, and you still have to download user albums one at a time), but besides that it now works with hushpix

However ezphotoshare still just downloads the image page

ghost commented 7 years ago

Awesome!

I have tracked down the issue with EzPhotoShare; unfortunately it's not something that can be easily resolved at this moment, however it shouldn't affect any other Chevereto powered site.

cyian-1756 commented 7 years ago

@ihadp Great! I'll keep working on the ripper and it should end up in the main repo around February

cyian-1756 commented 7 years ago

@ihadp Sorry it took so long (work and my other projects got in the way), but with my latest commit you can now rip userpages (all the images go into one folder atm). I'll work on getting each album into its own folder in the coming days; I'll read the wiki and see if it has any examples on how to do it. It should be possible, just a bit of a pain

ghost commented 7 years ago

Thank you very much!

metaprime commented 7 years ago

@ihadp Thanks for reporting and @cyian-1756 thanks for the work implementing.

ghost commented 7 years ago

Hi Guys,

Any updates on this front?

Thanks!

cyian-1756 commented 7 years ago

@ihadp I tried to implement a one-album-per-folder ripper and couldn't. I don't know Java or ripme's API well enough to do it.

That being said, everything else is implemented, and you could get one album per folder by using a simple wrapper script (if you want I can write one)
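Such a wrapper could be as simple as invoking ripme once per album URL: since each album rip goes into its own GID-named folder, that yields one directory per album. A sketch in Java, where the `-u` flag and the jar path are assumptions about the local setup:

```java
import java.util.Arrays;
import java.util.List;

public class RipWrapper {
    // Build the command line for ripping a single album.
    public static List<String> command(String jarPath, String albumUrl) {
        return Arrays.asList("java", "-jar", jarPath, "-u", albumUrl);
    }

    // Rip each album sequentially; inheritIO() streams ripme's console
    // output to this process so progress stays visible.
    public static void ripAll(String jarPath, List<String> albumUrls) throws Exception {
        for (String album : albumUrls) {
            new ProcessBuilder(command(jarPath, album)).inheritIO().start().waitFor();
        }
    }
}
```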

cyian-1756 commented 7 years ago

This is as close as I managed to get:

package com.rarchives.ripme.ripper.rippers;

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import com.rarchives.ripme.ripper.AbstractHTMLRipper;
import com.rarchives.ripme.utils.Http;

public class CheveretoRipper extends AbstractHTMLRipper {

    public static Map<String, List<String>> albumNameAndUrls = new HashMap<String, List<String>>();
    public static List<String> urlList = new ArrayList<String>();

    // Known Chevereto-powered hosts
    public static List<String> explicit_domains_1 = Arrays.asList("www.ezphotoshare.com", "hushpix.com");

    public CheveretoRipper(URL url) throws IOException {
        super(url);
    }

    @Override
    public String getHost() {
        return url.toExternalForm();
    }

    @Override
    public String getDomain() {
        return url.toExternalForm();
    }

    @Override
    public boolean canRip(URL url) {
        // Match against the host list instead of a single hardcoded domain
        return explicit_domains_1.contains(url.getHost());
    }

    @Override
    public String getGID(URL url) throws MalformedURLException {
        // site.domain/album/albumName
        Pattern p = Pattern.compile("(?:https?://)?(?:www\\.)?[a-z1-9]*\\.[a-z1-9]*/album/([a-zA-Z1-9]*)/?$");
        Matcher m = p.matcher(url.toExternalForm());
        if (m.matches()) {
            return m.group(1);
        }
        // site.domain/username/albums
        Pattern pa = Pattern.compile("(?:https?://)?(?:www\\.)?[a-z1-9]*\\.[a-z1-9]*/([a-zA-Z1-9_-]*)/albums/?$");
        Matcher ma = pa.matcher(url.toExternalForm());
        if (ma.matches()) {
            return ma.group(1);
        }
        throw new MalformedURLException("Expected chevereto URL format: "
                + "site.domain/album/albumName or site.domain/username/albums - got " + url + " instead");
    }

    @Override
    public Document getFirstPage() throws IOException {
        // "url" is an instance field of the superclass
        return Http.url(url).get();
    }

    @Override
    public Document getNextPage(Document doc) throws IOException {
        // Find the next page link
        Element elem = doc.select("li.pagination-next > a").first();
        if (elem == null) {
            throw new IOException("No more pages");
        }
        String nextUrl = elem.attr("href");
        if (nextUrl.isEmpty()) {
            throw new IOException("No more pages");
        }
        // Sleep for half a sec to avoid getting IP banned
        sleep(500);
        return Http.url(nextUrl).get();
    }

    @Override
    public List<String> getURLsFromPage(Document doc) {
        List<String> result = new ArrayList<String>();
        // We check for the following string to see if this is a user page or not
        if (doc.toString().contains("content=\"gallery\"")) {
            for (Element elem : doc.select("a.image-container")) {
                String link = elem.attr("href");
                logger.info("Grabbing album " + link);
                Document userpage_doc;
                try {
                    userpage_doc = Http.url(link).get();
                } catch (IOException e) {
                    logger.warn("Failed to load album page " + link);
                    e.printStackTrace();
                    continue;
                }
                for (Element element : userpage_doc.select("a.image-container > img")) {
                    String imageSource = element.attr("src");
                    logger.info("Found image " + imageSource);
                    // We remove the .md from images so we download the full size image
                    // not the medium ones
                    imageSource = imageSource.replace(".md", "");
                    result.add(imageSource);
                    urlList.add(imageSource);
                }
                for (Element albumNameDoc : userpage_doc.select("meta[property=og:url]")) {
                    String albumName = albumNameDoc.attr("content");
                    albumName = albumName.split("/")[4];
                    albumNameAndUrls.put(albumName, urlList);
                }
            }
        } else {
            for (Element el : doc.select("a.image-container > img")) {
                String imageSource = el.attr("src");
                // We remove the .md from images so we download the full size image
                // not the medium ones
                imageSource = imageSource.replace(".md", "");
                result.add(imageSource);
            }
        }
        return result;
    }

    public URL convertUrl(String url) throws MalformedURLException {
        try {
            return new URL(url);
        } catch (MalformedURLException e) {
            logger.warn("Failed to convert url " + url);
            throw e;
        }
    }

    @Override
    public void downloadURL(URL url, int index) {
        logger.info(url);
        // Note: this walks the entire static album map on every call,
        // even though downloadURL is invoked once per discovered URL
        for (Map.Entry<String, List<String>> entry : albumNameAndUrls.entrySet()) {
            List<String> values = entry.getValue();
            for (String urlToConvert : values) {
                try {
                    logger.info("Downloading " + urlToConvert);
                    addURLToDownload(convertUrl(urlToConvert));
                } catch (MalformedURLException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}

It almost works, but for some reason it fails to download anything.

The only output from the script is

Downloading https://i.hushpix.com/j4P1N.jpg
https://i.hushpix.com/j4P1N.jpg
url: https://i.hushpix.com/j4P1N.jpg, prefix: , subdirectory, referrer: null, cookies: null
Downloading https://i.hushpix.com/j4P1N.jpg to /home/USER/ripme/rips/https__hushpix.com_RedditGWGirls_albums_RedditGWGirls/j4P1N.jpg
[!] Skipping https://i.hushpix.com/j4P1N.jpg -- already attempted: ./rips/https__hushpix.com_RedditGWGirls_albums_RedditGWGirls/j4P1N.jpg

But confusingly it does sometimes download the image.
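A plausible explanation for the skips in that log: downloadURL is invoked once per discovered URL, but its body re-walks the entire static albumNameAndUrls map on every call, so after the first pass every entry looks like a repeat to the already-attempted check. A self-contained simulation of that pattern (plain Java, no ripme types):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateSkipDemo {
    // Simulate a downloader that records each attempted URL and skips repeats.
    public static List<String> run(List<String> discovered) {
        Set<String> attempted = new HashSet<String>();
        List<String> log = new ArrayList<String>();
        // The framework invokes the per-URL callback once per discovered URL...
        for (int i = 0; i < discovered.size(); i++) {
            // ...but the callback re-walks the WHOLE list each time,
            // so every URL after the first pass is flagged as a duplicate.
            for (String url : discovered) {
                if (attempted.add(url)) {
                    log.add("Downloading " + url);
                } else {
                    log.add("Skipping " + url + " -- already attempted");
                }
            }
        }
        return log;
    }
}
```

With two URLs, the first pass downloads both and every later pass only skips, matching the mix of downloads and "already attempted" lines seen above.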

If anyone else wants to take a crack at it that would be great

metaprime commented 7 years ago

@cyian-1756 This was part of your PR before, right? In that case, are you planning to hold off on submitting the PR for this ripper now?

cyian-1756 commented 7 years ago

@metaprime

This was part of your PR before, right?

Kinda. This is a heavily modified/rewritten version of it

In that case, are you planning to hold off on submitting the PR for this ripper now?

I'm planning on making a few changes to the original ripper (naming output etc.), removing the broken user page downloading, and making a pull request for it

ghost commented 6 years ago

Hi Everyone,

Following up on this issue to see where we stand.

I launched a new Chevereto based website a few months back (https://gwarchives.com) and would love it if users could easily download the albums of their choice (or entire account) using RipMe.

Let me know if I can be of any assistance.

cyian-1756 commented 6 years ago

@ihadp

I launched a new Chevereto based website a few months back (https://gwarchives.com)

I'll add it to the CheveretoRipper

and would love it users could easily download the albums of their choice (or entire account) using RipMe.

The CheveretoRipper has album support at this point, I still have to add account support

Also this is the old and unmaintained repo, the new repo is https://github.com/RipMeApp/ripme