Local file with html links. Links not recognized

brttd / Multi-file-downloader

A Chrome extension which finds, and downloads all files linked on a page.


Local file with html links. Links not recognized #1

Open cponder opened 4 years ago

cponder commented 4 years ago

I have a local file with the contents (sorry about the formatting, the page is trying to interpret the HTML annotations):

https://www.yahoo.com/
https://www.google.com/

I can open the file in Chrome, and the links all work if I click on them. The downloader, however, tells me there are no links on the page. I believe that the downloader would work correctly if it were reading the same HTML from a remote server. Is there some special setting I need to make in Chrome?

cponder commented 4 years ago

There have been discussions posted about remote HTML pages containing links to local files. Note that this is the opposite problem, and doesn't create the same security issues.

brttd commented 4 years ago

Hi, the downloader only looks for links that have some indication that they point to a file. Normal links without a file extension or a download attribute will not be picked up. This is deliberate, as the extension is meant for finding & downloading individual files, not for downloading full web pages. The extension should behave the same whether the page is a local file or comes from a remote server (although extensions by default don't have access to local user files).
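To illustrate the rule described above, here is a rough sketch in Python. It is not the extension's actual code: the class and function names are made up for illustration, and the check (a `download` attribute, or a file extension in the URL path) is only an assumption about how such a filter might work.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import posixpath


class FileLinkParser(HTMLParser):
    """Collect <a> hrefs that look like they point to a file (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.file_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # Assumed rule: a "download" attribute, or a file extension
        # in the URL path, marks the link as a file link.
        has_download_attr = "download" in attr_map
        extension = posixpath.splitext(urlparse(href).path)[1]
        if has_download_attr or extension:
            self.file_links.append(href)


def find_file_links(html):
    """Return the hrefs in `html` that pass the assumed file-link rule."""
    parser = FileLinkParser()
    parser.feed(html)
    return parser.file_links


page = """
<a href="https://www.yahoo.com/">Yahoo</a>
<a href="https://www.google.com/">Google</a>
<a href="https://example.com/report.pdf">Report</a>
<a href="https://example.com/export" download>Export</a>
"""
print(find_file_links(page))
# → ['https://example.com/report.pdf', 'https://example.com/export']
```

Under this rule the Yahoo and Google links are skipped because their paths have no extension and the anchors carry no `download` attribute, which would match the "no links on the page" result reported above.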

cponder commented 4 years ago

These days I'm finding more & more content to be buried under JavaScript code, which makes it invisible to these kinds of download tools. If I download the web pages, sometimes I can re-process them to figure out what's being pointed to. Unfortunately I'm stuck manually downloading these kinds of pages one at a time because the downloader won't fetch them in bulk from the parent page. So in my mind the page is the content that I'm trying to fetch. Could you provide an option for this?

cponder commented 4 years ago

Also, is this JavaScript always going to be a barrier, then? Or is there going to be some way to partially execute the code to figure out what's behind it? I remember one of the downloaders that would recognize when a file was actually a download-script, and fetch the file it referenced instead. Unfortunately I lost track of which extension it was when I set up my new system, and can't find anything that looks like it in the Chrome Extensions list.

joaociocca commented 4 years ago

@cponder I think what you need is wget - https://www.gnu.org/software/wget/

cponder commented 4 years ago

The "wget" won't work because the site expects some kind of authentication token. I can hold the top-level page open in one tab, and make a list of extracted links in another tab, and they work when I click on them. But that means I have to manually click on all of them to download the pages, since the downloader is disabled for non-file links.
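One possible workaround for the situation described above, sketched in Python and resting on an unconfirmed assumption: that the site's authentication token is a session cookie that can be copied from the browser. The function names and the cookie value are hypothetical placeholders, and a site that keeps its token somewhere else would need a different header.

```python
import urllib.request


def build_requests(urls, cookie_header):
    """Create one request per URL, each carrying the same Cookie header."""
    return [
        urllib.request.Request(url, headers={"Cookie": cookie_header})
        for url in urls
    ]


def download_all(urls, cookie_header):
    """Fetch every URL and return a mapping of URL to response body (bytes)."""
    pages = {}
    for request in build_requests(urls, cookie_header):
        with urllib.request.urlopen(request) as response:
            pages[request.full_url] = response.read()
    return pages
```

Usage would be something like `download_all(list_of_extracted_links, "session=...")`, with the cookie string copied from the browser's developer tools while the top-level page is open. The same idea should also work with wget through its `--header` or `--load-cookies` options.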