Open masteraaran opened 7 years ago
A much better option would be to integrate JavaScript into RipMe.
Perhaps so. Right now RipMe uses Jsoup for page parsing, but Jsoup does not support JavaScript. Selenium's 'HtmlUnitDriver' does support JavaScript, but it was throwing errors and I couldn't figure them out. I'm doing the best I can with what I've got...
Maybe they could be fixed together.
Is there any way to get the list of files directly from the JSON (or other) endpoints that the page calls to populate itself dynamically?
I agree that adding Selenium and PhantomJS to RipMe is overkill. Volafile uses WebSockets for data exchange. WS will be a challenge to implement in itself, but this approach would be lightweight in comparison.
A connection to the WS address below should be established:
wss://volafile.io/api/?rn=xxxxxxxxxx&EIO=3&transport=websocket&t=xxxxxxx
There are four query parameters in the exchange to establish a connection to the WS (rn, EIO, transport and t in the address above). At least one of the values is a string matching /[A-Za-z0-9]/, probably to prevent caching or bots.
Once the WS session is established, we need to make a request to the WS, which will require the following (again, as far as I can tell from a quick look):
https://volafile.io/static/js/main.js?c=<CHECKSUM #2>
In the JS file there is a {config.checksum="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"} value that will need to be parsed as well.[A]
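To make the connection step concrete, here is a rough Java sketch that assembles an address of the shape shown above. It is only an illustration: the class and method names are made up, the lengths (10 for rn, 7 for t) are simply copied from the placeholders, and treating both values as random alphanumeric strings is an assumption, since how the site really generates them has not been confirmed here.

```java
import java.security.SecureRandom;

// Hypothetical helper, not part of RipMe.
class VolafileWsUrl {
    private static final String ALPHANUMERIC =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns a random string whose characters all match /[A-Za-z0-9]/.
    static String randomAlphanumeric(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHANUMERIC.charAt(RANDOM.nextInt(ALPHANUMERIC.length())));
        }
        return sb.toString();
    }

    // Fills the two variable parameters into the address from the thread.
    // EIO=3 and transport=websocket are taken verbatim from that address.
    static String buildUrl(String rn, String t) {
        return "wss://volafile.io/api/?rn=" + rn
                + "&EIO=3&transport=websocket&t=" + t;
    }

    public static void main(String[] args) {
        // ASSUMPTION: lengths mirror the "xxxxxxxxxx" and "xxxxxxx" placeholders.
        String rn = randomAlphanumeric(10);
        String t = randomAlphanumeric(7);
        System.out.println(buildUrl(rn, t));
    }
}
```

Opening the socket itself would still need a WebSocket client on top of this; the sketch only covers building the address.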
window.config={
    file_max_size: 21474840000,
    max_room_name_length: 27,
    chat_max_alias_length: 12,
    chat_max_message_length: 300,
    chat_max_history: 300,
    file_time_to_live: 172800,
    session_lifetime: 604800,
    download_cookie_lifetime: 259200,
    round_up_threshold: 0.2,
    max_concurrent_uploads: 1,
    ui_tooltip_show_delay: 200,
    ui_enable_gallery: true,
    disabled: false,
    private: true,
    name: "xxxxxxxx",
    owner: "xxxxxxxx",
    motd: "xxxxxxxx",
    adult: false,
    checksum2: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    room_id: "xxxxxx",
    title_append: " - Volafile.io ",
    domain: "volafile.io",
    cdn_domain: "volafile.net"
}
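Pulling values such as checksum2 and room_id out of a config object like the one above could look roughly like this in Java. This is a minimal sketch that uses a plain regular expression instead of a real JavaScript parser; the class and method names are hypothetical, and it only handles double-quoted string values written as key: "value" or key="value".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of RipMe.
class VolafileConfigParser {

    // Finds the first double-quoted string assigned to `key`, accepting both
    // the `key: "value"` and the `key="value"` spellings.
    static Optional<String> stringValue(String source, String key) {
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(key) + "\\s*[:=]\\s*\"([^\"]*)\"");
        Matcher matcher = pattern.matcher(source);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample input with made-up values, shaped like the listing above.
        String page = "window.config={\n"
                + "checksum2: \"abc123\",\n"
                + "room_id: \"r00m\",\n"
                + "domain: \"volafile.io\"\n"
                + "}";
        System.out.println(stringValue(page, "checksum2").orElse("(missing)"));
        System.out.println(stringValue(page, "room_id").orElse("(missing)"));
    }
}
```

Looking up checksum does not accidentally match checksum2, because the key must be followed directly by a colon or an equals sign. If the page layout changes, the lookup returns an empty Optional rather than a wrong value.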
Alright, so for the past two days I've basically been teaching myself Java specifically for the purpose of adding Volafile.io ripping capability to RipMe.
The problem with enabling Volafile ripping is that the site uses JavaScript to dynamically populate the data in the file column. This means that RipMe's built-in HTTP grabber doesn't work for Volafile, because it can't see the JavaScript-generated content.
So I integrated Selenium and PhantomJS to open the page, wait until the JavaScript has loaded the content, then store the page source in a variable and parse it as normal.
I now have it up and working, grabbing all the links and downloading them successfully. Here's the rub: because this method requires PhantomJS, every user will need PhantomJS on their computer in the 'standard' location (usually on the C drive, in Program Files).
I'm ready to put this up on my fork and submit a pull request, but how can I package PhantomJS with the project so that people don't have to follow a tutorial to get PhantomJS 'installed' and available to be used? Is there a way to have Java download and unpack the PhantomJS executable in a location relative to the ripme.jar?
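On that last question, one possible approach using only the JDK is sketched below: find the directory the running jar lives in, then stream a zip archive into a folder next to it. The class name, the "tools" folder name and the command-line argument are all hypothetical, and the archive URL is deliberately left to the caller rather than guessed here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.security.CodeSource;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of RipMe.
class BundledToolInstaller {

    // Directory holding the running jar (or the classes dir when run unpacked).
    static Path applicationDir() throws URISyntaxException {
        CodeSource source =
                BundledToolInstaller.class.getProtectionDomain().getCodeSource();
        if (source == null) {
            throw new IllegalStateException("Cannot determine code location");
        }
        Path location = Paths.get(source.getLocation().toURI());
        return Files.isDirectory(location) ? location : location.getParent();
    }

    // Unpacks every entry of the zip stream below targetDir.
    static void unzip(InputStream zipStream, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zip = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would land outside root.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: BundledToolInstaller <zip-archive-url>");
            return;
        }
        // ASSUMPTION: "tools" is just an example folder name next to the jar.
        Path toolsDir = applicationDir().resolve("tools");
        try (InputStream in = URI.create(args[0]).toURL().openStream()) {
            unzip(in, toolsDir);
        }
        System.out.println("Unpacked into " + toolsDir);
    }
}
```

On Linux and macOS the unpacked binary would probably also need its executable bit set (for example with File.setExecutable), and a real implementation would have to pick the right archive per operating system.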