4pr0n / ripme

Downloads albums in bulk
MIT License
915 stars 204 forks

Have Volafile Ripping implemented, question about dependencies... #448

Open masteraaran opened 7 years ago

masteraaran commented 7 years ago

Alright, so for the past two days I've basically been teaching myself Java, specifically to add Volafile.io ripping capability to Ripme.

The problem with enabling Volafile ripping is that the site uses JavaScript to dynamically populate the data in the file column. This means that Ripme's built-in HTTP grabber doesn't work for Volafile, because it can't see the JavaScript-generated content.

So I added Selenium and PhantomJS to open the page, wait until the JavaScript has loaded the content, then store the page source in a variable and parse it as normal.
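For anyone following along, the "wait until the JavaScript has loaded the content" step can be sketched without tying it to a specific driver. This is only an illustration, not the code from my fork: `RenderedPageWaiter` and `waitForSource` are made-up names, and the readiness check (for example, looking for a marker string in the HTML) is an assumption you would adapt to the real page.

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical helper: polls a page-source supplier until the dynamic content shows up. */
final class RenderedPageWaiter {

    /**
     * Returns the first page source that satisfies {@code ready}.
     * Throws IllegalStateException if that does not happen before the timeout.
     */
    static String waitForSource(Supplier<String> pageSource, Predicate<String> ready,
                                Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            String html = pageSource.get();
            if (html != null && ready.test(html)) {
                return html;
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("Content did not appear within " + timeout);
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag so callers can still react to it.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for content", e);
            }
        }
    }
}
```

With Selenium the supplier would be `driver::getPageSource`, and the returned string can then go to `Jsoup.parse(html)` so the rest of the ripper stays unchanged.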

I now have it up and working, grabbing all the links and downloading them successfully. Here's the rub: because this method requires PhantomJS, every user will need PhantomJS on their computer in the 'standard' location (usually on the C drive, in Program Files).

I'm ready to put this up on my fork and submit a pull request, but how can I package PhantomJS with the project so that people don't have to follow a tutorial to get PhantomJS 'installed' and available for use? Is there a way to have Java download and unpack the PhantomJS executable in a location relative to the ripme.jar?
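To make the question concrete, here is a rough sketch of what I have in mind, using only the standard library: find the directory the jar runs from, then unpack a zip stream into a folder next to it. `BundledToolInstaller` and its method names are made up for illustration, and where the zip comes from (a download URL or a resource inside the jar) is left open.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.security.CodeSource;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper for unpacking a bundled tool next to the running jar. */
final class BundledToolInstaller {

    /** Directory containing the running jar (or the classes directory when run from an IDE). */
    static Path jarDirectory() {
        CodeSource source = BundledToolInstaller.class.getProtectionDomain().getCodeSource();
        if (source == null) {
            throw new IllegalStateException("Code source location is not available");
        }
        try {
            Path location = Paths.get(source.getLocation().toURI());
            return Files.isDirectory(location) ? location : location.getParent();
        } catch (URISyntaxException e) {
            throw new IllegalStateException("Unexpected code source location", e);
        }
    }

    /** Unpacks a zip stream into targetDir, rejecting entries that would land outside it. */
    static void unzip(InputStream zipStream, Path targetDir) {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zip = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the current entry; the zip stream stays open for the next one.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller could then do something like `unzip(stream, jarDirectory().resolve("phantomjs"))`. On Linux and macOS the extracted binary would probably also need `File.setExecutable(true)`, and as far as I know the PhantomJS driver can be pointed at a custom location through the `phantomjs.binary.path` system property, but I have not verified that part.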

rautamiekka commented 7 years ago

A much better option would be to integrate JavaScript into RipMe.

masteraaran commented 7 years ago

Perhaps so. Right now Ripme uses Jsoup for page parsing, but Jsoup does not support JavaScript. Selenium's 'HtmlUnitDriver' does support JavaScript, but it was throwing errors that I couldn't figure out. I'm doing the best I can with what I've got...

rautamiekka commented 7 years ago

Maybe they could be fixed together.

metaprime commented 7 years ago

Is there any way to get the list of files directly from the JSON (or other) endpoints that the page calls to populate itself dynamically?

ghost commented 7 years ago

I agree that adding Selenium and PhantomJS to ripme is overkill. Volafile uses WebSockets for data exchange. WS will be a challenge in itself to implement, but this approach would be lightweight in comparison.

A connection to the WS address below should be established:

wss://volafile.io/api/?rn=xxxxxxxxxx&EIO=3&transport=websocket&t=xxxxxxx

There are four query parameters in the exchange to establish a connection to the WS:
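Reading them off the address above, the four parameters are `rn`, `EIO`, `transport` and `t`. A minimal sketch of assembling that address in Java follows. `VolafileSocketAddress` is a made-up name, the `EIO=3` and `transport=websocket` values are simply copied from the address, and the `rn` and `t` values are masked above, so how to generate them is an open question that the sketch leaves to the caller.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that rebuilds the WS address from its four query parameters. */
final class VolafileSocketAddress {

    private static final String BASE = "wss://volafile.io/api/";

    /** rn and t are passed in because their values are masked in the captured address. */
    static URI build(String rn, String t) {
        Map<String, String> query = new LinkedHashMap<>();
        query.put("rn", rn);
        query.put("EIO", "3");                // copied from the captured address
        query.put("transport", "websocket");  // copied from the captured address
        query.put("t", t);

        StringBuilder address = new StringBuilder(BASE).append('?');
        boolean first = true;
        for (Map.Entry<String, String> entry : query.entrySet()) {
            if (!first) {
                address.append('&');
            }
            address.append(entry.getKey()).append('=').append(encode(entry.getValue()));
            first = false;
        }
        return URI.create(address.toString());
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

The resulting `URI` could be handed to whatever WebSocket client ends up being used. On Java 11 or newer, `HttpClient.newHttpClient().newWebSocketBuilder().buildAsync(uri, listener)` is one option that needs no extra dependency.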

Once the WS session is established, we need to make a request to the WS, which will require the following (again, as far as I can tell from a quick look):

[A]

window.config={
    file_max_size: 21474840000,
    max_room_name_length: 27,
    chat_max_alias_length: 12,
    chat_max_message_length: 300,
    chat_max_history: 300,
    file_time_to_live: 172800,
    session_lifetime: 604800,
    download_cookie_lifetime: 259200,
    round_up_threshold: 0.2,
    max_concurrent_uploads: 1,
    ui_tooltip_show_delay: 200,
    ui_enable_gallery: true,
    disabled: false,
    private: true,
    name: "xxxxxxxx",
    owner: "xxxxxxxx",
    motd: "xxxxxxxx",
    adult: false,
    checksum2: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    room_id: "xxxxxx",
    title_append: " - Volafile.io ",
    domain: "volafile.io",
    cdn_domain: "volafile.net"
}
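Assuming this `window.config` block sits in the page's static HTML exactly as shown (which is my assumption, not something I have confirmed), its values could be pulled out with a plain regex, without running any JavaScript. A rough sketch follows. `WindowConfigParser` is a made-up name, and the parser is deliberately naive: it expects flat `key: value` entries and does not handle escaped quotes or a `}` inside a string.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, naive extractor for the flat window.config object shown above. */
final class WindowConfigParser {

    // Matches entries such as  room_id: "xxxxxx"  or  adult: false  or  round_up_threshold: 0.2
    private static final Pattern ENTRY =
            Pattern.compile("(\\w+)\\s*:\\s*(?:\"([^\"]*)\"|([^,\\s}]+))");

    /** Returns key/value pairs as strings, or an empty map if no config block is found. */
    static Map<String, String> parse(String pageSource) {
        Map<String, String> config = new LinkedHashMap<>();
        int start = pageSource.indexOf("window.config");
        if (start < 0) {
            return config;
        }
        int open = pageSource.indexOf('{', start);
        int close = open < 0 ? -1 : pageSource.indexOf('}', open);
        if (close < 0) {
            return config;
        }
        Matcher matcher = ENTRY.matcher(pageSource.substring(open + 1, close));
        while (matcher.find()) {
            // Group 2 holds quoted values, group 3 holds bare numbers and booleans.
            String value = matcher.group(2) != null ? matcher.group(2) : matcher.group(3);
            config.put(matcher.group(1), value);
        }
        return config;
    }
}
```

Something like `WindowConfigParser.parse(html).get("room_id")` would then return the room id as a string, where `html` is the page source already fetched by the existing HTTP code.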