david-littlefield closed this issue 5 years ago
I'm not sure Swift is the best choice. If I were you, I would write a Puppeteer or WebDriver/Selenium script.
@captaindavepdx Any comment? :p
Hi @gildas-lormeau!
Sorry for the late reply, I've been immersed in JavaScript Core, and I lost track of the time, haha.
Comments: Thanks for the feedback, but I'm kind of stuck with Swift. Luckily, JavaScript Core can run pure JavaScript within Swift. It has limitations, but I've been working through them one at a time. If JavaScript Core didn't work, then I would've gone the Puppeteer route. =p
Update: Made some progress since then, but I'm still trying to resolve the broken layout on websites like airbnb.com - not sure what the actual problem is...
If you have additional suggestions, feedback, or direction, it'd be gladly welcomed! =]
Progress:
@captaindavepdx, did you try to inject this list of scripts into JavaScript Core? https://github.com/gildas-lormeau/SingleFile/blob/862d72073d6f35f8ffb60ed8ccd815cf12f0384e/cli/back-ends/puppeteer.js#L31-L51
And then execute this code? https://github.com/gildas-lormeau/SingleFile/blob/862d72073d6f35f8ffb60ed8ccd815cf12f0384e/cli/back-ends/puppeteer.js#L104-L122
Hi @gildas-lormeau!
Thanks for the suggestion, it helped me realize I needed to add a base URL to the links in the CSS files.
I feel pretty good using JavaScript with Swift in a single js file, but I'm still figuring out how to do it with multiple files. I think I have to specify each function I want to use with a special Swift export function. There's not much documentation available on it, so I've been building the website downloader from scratch within a single js file.
Adding a base URL to the href and src links in the CSS file fixed a lot of missing font and icon issues. While it works great on websites like Instagram and Twitter, it doesn't do well on airbnb.com. It seems like there are many height and width values that are percentages, which cause oversized images.
I wrote a script that replaced percentages with "auto," but then new issues appeared. And the layout is still broken on the airbnb website.
Is this something similar to anything you had to work through?
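For readers following along, the base-URL fix described above can be sketched roughly as follows. This is only an illustration, not the poster's actual code: the function name absolutizeCssUrls and the example.com URLs are hypothetical, and the regular expression handles only simple url(...) references.

```javascript
// Hypothetical helper: resolve every url(...) reference in a stylesheet
// against the URL the stylesheet was downloaded from.
function absolutizeCssUrls(cssText, stylesheetUrl) {
  return cssText.replace(/url\(\s*(['"]?)([^'")]+)\1\s*\)/g, (match, quote, ref) => {
    // data: URIs are already self-contained, so leave them untouched.
    if (ref.trim().startsWith("data:")) {
      return match;
    }
    // new URL(relative, base) performs standard URL resolution.
    const absolute = new URL(ref.trim(), stylesheetUrl).href;
    return `url(${quote}${absolute}${quote})`;
  });
}

// Made-up stylesheet text and base URL, for illustration only.
const css = '@font-face { src: url("../fonts/icons.woff2"); } .logo { background: url(img/logo.png); }';
console.log(absolutizeCssUrls(css, "https://example.com/css/site.css"));
```

Note that relative references are resolved against the stylesheet's own URL rather than the page URL, which is how browsers treat url(...) inside an external CSS file.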
If I understand you correctly, you are re-implementing SingleFile from scratch. As you can imagine, this is a lot of work (approx. 10,000 lines of code) and I'm not sure this is the simplest thing to do. Unfortunately, I won't have the time to fix issues in your implementation.
Hi @gildas-lormeau!
Wow, I didn't realize there was that much code. I started building that project because it didn't look like SingleFile could be used in Swift. But, that process has helped me learn a lot about JavaScript. And, based on your latest suggestion, combined with my improved understanding of JavaScript, integrating SingleFile into Swift seems promising!
I've been immersed in another project for the past several days, but that should finish very soon. I'll be jumping right back into SingleFile and JavaScript Core afterward!
Thanks for the feedback @gildas-lormeau!
Hi @gildas-lormeau!
I just finished the project I was working on - it took a lot longer than I anticipated. But, I'm back on the SingleFile and JavaScript Core quest! I'm going to try your recent suggestion, and I'll let you know how it goes. Thanks again!
You're welcome, @captaindavepdx. Feel free to ask questions here if necessary ;)
Awesome! Ok, it seems like Swift loads the js files as if it pasted the contents of each file into the browser using developer tools console.
Currently, I'm getting the following error:
Promise {status: "rejected", result: ReferenceError: Can't find variable: singlefile} = $1
From exploring potential causes, I'm wondering:
Right now, I'm researching how to do that.
Does that sound about right, or would you suggest something else?
I think you may need to replace all this.singlefile... occurrences with window.singlefile... This is what I need to do when I run the code via WebDriver in Gecko-based browsers. Here is the code that does the replace: https://github.com/gildas-lormeau/SingleFile/blob/862d72073d6f35f8ffb60ed8ccd815cf12f0384e/cli/back-ends/webdriver-gecko.js#L100.
This is the interesting part: .replace(/\n(this)\.([^ ]+) = (this)\.([^ ]+) \|\|/g, "\nwindow.$2 = window.$4 ||")
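To make the effect of that regular expression concrete, here is a small sketch that applies it to a made-up input. The helper name rewriteThisToWindow and the sample text are hypothetical. Only the pattern and the replacement string come from the comment above.

```javascript
// The pattern quoted above. It rewrites assignments of the form
//   this.X = this.Y ||
// that start right after a newline into
//   window.X = window.Y ||
const pattern = /\n(this)\.([^ ]+) = (this)\.([^ ]+) \|\|/g;

function rewriteThisToWindow(scriptSource) {
  return scriptSource.replace(pattern, "\nwindow.$2 = window.$4 ||");
}

// Made-up input that mimics the shape of the SingleFile sources.
const sample = [
  "/* licence header */",
  "this.singlefile = this.singlefile || {",
  "  lib: {}",
  "};",
  "this.singlefile.lib.helper = this.singlefile.lib.helper || (() => ({}))();"
].join("\n");

console.log(rewriteThisToWindow(sample));
```

Two details follow from the pattern itself: it only matches an assignment that comes directly after a newline, so a statement on the very first line of a file is left alone, and it rewrites these top-level declarations only, not every use of this.singlefile elsewhere in the code.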
Woah, that's cool! Ok, so I replaced all occurrences of "this.singlefile" with "window.singlefile"
It still displays an error:
Promise {status: "rejected", result: ReferenceError: Can't find variable: singlefile} = $2
I was browsing through the js files (I'm pretty new to JavaScript), and I couldn't tell where "singlefile" was initially declared.
The singlefile object is created in the index.js file in the root folder.
You can also try replacing this.singlefile = this.singlefile || { with var singlefile = { in this file.
Ok, did that. It makes sense that it should work now, but it still has the same error. For debugging, I started incrementally loading each script into the browser. That's when I noticed there were local folder references in the js files.
scriptElement.src = browser.runtime.getURL("/lib/hooks/content/content-hooks-frames-web.js");
The Swift app loads the contents of the js files, but it doesn't reference the actual js files. Do I need to include the full local folder references in the js files?
scriptElement.src = browser.runtime.getURL("/Users/davidlittlefield/Desktop/SingleFile/lib/hooks/content/content-hooks-frames-web.js");
Or load those js files into the Swift app as well?
If possible, it'd be awesome to save all the needed js files in the app, so the app can run SingleFile without needing to download anything from npm.
Indeed, I forgot to mention that you have to implement the singlefile.lib.getFileContent method to work around your issue. This method is called twice and should return the content of the /lib/hooks/content/content-hooks-web.js and /lib/hooks/content/content-hooks-frames-web.js files. This means that if you run the following code after injecting the SingleFile code, it should work (dump the code of the 2 files in the returned string).
singlefile.lib.getFileContent = function() {
    return `
// dump here the content of /lib/hooks/content/content-hooks-web.js
// dump here the content of /lib/hooks/content/content-hooks-frames-web.js
`;
}
Edit: Maybe this procedure is not the simplest one. Does browser.runtime.getURL exist in your environment?
FYI, here is how I've implemented this function to run SingleFile with Puppeteer
I'm using the latest version of Chrome. I didn't see that as an option in the console. The Chrome docs reference it as chrome.runtime.getURL(), but it doesn't work when I type it into the console.
https://developer.chrome.com/extensions/runtime#method-getURL
In your puppeteer file, is that a dictionary where the keys are the file path, and the values are the contents of the js file?
I ended up manually putting all the js files into a single js file, typing the whole path for both of those functions:
if (this.browser && browser.runtime && browser.runtime.getURL) {
    scriptElement.src = browser.runtime.getURL("/Users/davidlittlefield/Desktop/SingleFile" + "/lib/hooks/content/content-hooks-frames-web.js");
    scriptElement.async = false;
} else if (this.singlefile.lib.getFileContent) {
    scriptElement.textContent = this.singlefile.lib.getFileContent("/Users/davidlittlefield/Desktop/SingleFile" + "/lib/hooks/content/content-hooks-frames-web.js");
}
Now, it can find the singlefile variable. And a new error appears:
Unhandled Promise Rejection: ReferenceError: Can't find variable: options
Am I doing those functions wrong? Some of your code is more advanced than what I've worked with, so I only kind of understand what's happening.
You're making progress, but the code you posted won't work as you expect. The problem is that, I suppose, the condition this.browser && browser.runtime && browser.runtime.getURL is never true in your environment, so the code in the first block of the if will never be executed.
singlefile.lib.getFileContent is called here in the code of SingleFile:
singlefile.lib.getFileContent should return the code (as a string) that corresponds to the path given as parameter. That's why I use a dictionary in the Puppeteer implementation. However, you can also simply concatenate the 2 scripts (/lib/hooks/content/content-hooks-web.js and /lib/hooks/content/content-hooks-frames-web.js) and return the whole string without taking the path parameter into account, as I suggested. This should also work because there won't be any clash between the 2 scripts.
Regarding the options error, you do indeed have to declare it and assign it to an object (i.e. const options = {}) before running the code I pasted here https://github.com/gildas-lormeau/SingleFile/issues/231#issuecomment-506988406.
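The dictionary approach mentioned above could look roughly like the sketch below. It is only an illustration: the singlefile object here is a stand-in for the one created by index.js, the variable name fileContents is hypothetical, and the values are placeholders for the real file contents.

```javascript
// Stand-in for the object created by index.js, so the sketch runs on its own.
const singlefile = { lib: {} };

// Placeholder strings. In practice each value would be the full text of the file.
const fileContents = {
  "/lib/hooks/content/content-hooks-web.js": "/* content of content-hooks-web.js */",
  "/lib/hooks/content/content-hooks-frames-web.js": "/* content of content-hooks-frames-web.js */"
};

// Return the source code registered for the requested path.
singlefile.lib.getFileContent = path => {
  if (!Object.prototype.hasOwnProperty.call(fileContents, path)) {
    throw new Error("No content registered for " + path);
  }
  return fileContents[path];
};

console.log(singlefile.lib.getFileContent("/lib/hooks/content/content-hooks-web.js"));
```

With a lookup like this, the key has to be exactly the string the caller passes, so a path with an extra prefix in front of /lib/... would not be found.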
Ok, I tried to use singlefile.lib.getFileContent but it doesn't appear to be an option? Maybe I forgot to load one of the js files? Which js file was getFileContent from?
The method singlefile.lib.getFileContent does not exist by default; you have to define it and inject it.
Ok, cool.
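I have never written Swift code in my life. I guess it would look like this.
import Foundation

// Absolute paths to the two hook scripts; adjust them to your checkout.
let hooksFrameURL = URL(fileURLWithPath: "/Users/davidlittlefield/Desktop/SingleFile/lib/hooks/content/content-hooks-frames-web.js")
let hooksURL = URL(fileURLWithPath: "/Users/davidlittlefield/Desktop/SingleFile/lib/hooks/content/content-hooks-web.js")
var textHooksFrame = ""
var textHooks = ""
do {
    textHooksFrame = try String(contentsOf: hooksFrameURL, encoding: .utf8)
    textHooks = try String(contentsOf: hooksURL, encoding: .utf8)
} catch { /* error handling here */ }
// Escape characters that would end or interpolate a JavaScript template literal.
let escaped = (textHooksFrame + "\n" + textHooks)
    .replacingOccurrences(of: "\\", with: "\\\\")
    .replacingOccurrences(of: "`", with: "\\`")
    .replacingOccurrences(of: "${", with: "\\${")
// Define getFileContent so that it returns both scripts as one string.
let script = "singlefile.lib.getFileContent = () => `" + escaped + "`;"
// inject the script into JavaScript core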
I used this code as an example: https://stackoverflow.com/questions/24097826/read-and-write-a-string-from-text-file.
Haha, that's pretty good Swift! I was looking into how to load the contents of js files in JavaScript, which several Stack Overflow posts said wasn't possible in the browser for security reasons.
All of the SingleFile js files are in one js file that is stored in my app. That js file is separate from the Swift files. So, I can load the entire contents of that file into the webView at launch, and then inject js into the webView from Swift afterward.
But, the challenge seems to be injecting the contents of those files from Swift at launch, assuming that SingleFile needs it at launch. Which I'm pretty sure is doable. I'd need to store the contents of that js file in the swift file as a string. Then, I could inject what we need using string interpolation.
When does SingleFile need the contents of those files?
Let me suggest an acceptable alternative to injecting the singlefile.lib.getFileContent method.
In content-hooks-frames.js, replace the lines that set scriptElement.src or scriptElement.textContent with:
scriptElement.textContent = `/* dump the content of content-hooks-frames-web.js here (multi-line is OK) */`;
and replace https://github.com/gildas-lormeau/SingleFile/blob/862d72073d6f35f8ffb60ed8ccd815cf12f0384e/lib/hooks/content/content-hooks.js#L31-L36 with:
scriptElement.textContent = `/* dump the content of content-hooks-web.js here */`;
I think this is the simplest way to solve this issue.
Awesome! I didn't realize I could literally paste the contents of the file as a multiline string in the js file, haha. That approach no longer needs the file path concatenated to it, right?
This is the main feature of the ` (backquote) delimiter and it's quite useful indeed.
With this approach, the issues related to singlefile.lib.getFileContent will be fixed because SingleFile won't call it anymore. You will just have to define the options object (i.e. const options = {};) before launching SingleFile.
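One caveat when pasting a whole file between backquotes: any backslash, backquote, or ${ sequence inside the pasted script is interpreted by the template literal itself. The sketch below shows one way to escape a script before embedding it. The helper name escapeForTemplateLiteral and the sample script text are hypothetical.

```javascript
// Hypothetical helper: make arbitrary script text safe to paste between backquotes.
function escapeForTemplateLiteral(source) {
  return source
    .replace(/\\/g, "\\\\")    // backslashes first, so later escapes are not doubled
    .replace(/`/g, "\\`")      // a bare backquote would end the literal early
    .replace(/\$\{/g, "\\${"); // ${ would start an interpolation
}

// Made-up script text containing all three problem sequences.
const hookSource = 'const label = `id-${n}`;\nconst pattern = /\\d+/;';

// Build the assignment statement that would go into the hooks file.
const statement = "scriptElement.textContent = `" + escapeForTemplateLiteral(hookSource) + "`;";
console.log(statement);
```

If the pasted files contain none of these sequences, pasting them verbatim works as described above. Escaping mainly matters when the embedding is automated.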
I don't know if you're using the following API, but I recommend injecting scripts as in the example below, so that the SingleFile code is injected into all frames as soon as possible.
...
let scriptToInject = "..."
let contentController = WKUserContentController()
let userScript = WKUserScript(source: scriptToInject, injectionTime: WKUserScriptInjectionTime.atDocumentStart, forMainFrameOnly: false)
contentController.addUserScript(userScript)
...
I used this post as an example: https://medium.com/@DrawandCode/how-to-communicate-with-iframes-inside-webview-2c9c86436edb
Cool, I've been trying both ways. Right now, I'm still trying to make the changes from your last post. I know we added content-hooks-frames.js and hard-coded content-hooks-frames-web.js, but I don't remember loading content-hooks.js. And I didn't see it referenced in the other js files when I searched for it.
Do I need to add that with the list of js files? Also, does it matter if the script is injected at the document start or end?
Let me recap.
Declare singlefile as a variable in the index.js file, cf. https://github.com/gildas-lormeau/SingleFile/issues/231#issuecomment-515151250
Apply the changes I explained here https://github.com/gildas-lormeau/SingleFile/issues/231#issuecomment-515235314
Inject these files in the main frame and all inner frames (forMainFrameOnly: false) as soon as possible (injectionTime: WKUserScriptInjectionTime.atDocumentStart):
index.js
lib/hooks/content/content-hooks.js
lib/hooks/content/content-hooks-frames.js
lib/frame-tree/content/content-frame-tree.js
lib/lazy/content/content-lazy-loader.js
lib/single-file/single-file-util.js
lib/single-file/single-file-helper.js
lib/single-file/vendor/css-tree.js
lib/single-file/vendor/html-srcset-parser.js
lib/single-file/vendor/css-minifier.js
lib/single-file/vendor/css-font-property-parser.js
lib/single-file/vendor/css-media-query-parser.js
lib/single-file/modules/html-minifier.js
lib/single-file/modules/css-fonts-minifier.js
lib/single-file/modules/css-fonts-alt-minifier.js
lib/single-file/modules/css-matched-rules.js
lib/single-file/modules/css-medias-alt-minifier.js
lib/single-file/modules/css-rules-minifier.js
lib/single-file/modules/html-images-alt-minifier.js
lib/single-file/modules/html-serializer.js
lib/single-file/single-file-core.js
lib/single-file/single-file.js
Inject this content in the main frame only (forMainFrameOnly: true) when the main frame is loaded (injectionTime: WKUserScriptInjectionTime.atDocumentEnd). You must not inject it into inner frames.
const options = {};
singlefile.lib.helper.initDoc(document);
options.insertSingleFileComment = true;
options.insertFaviconLink = true;
const preInitializationPromises = [];
if (!options.saveRawPage) {
    if (!options.removeFrames) {
        preInitializationPromises.push(singlefile.lib.frameTree.content.frames.getAsync(options));
    }
    if (options.loadDeferredImages) {
        preInitializationPromises.push(singlefile.lib.lazy.content.loader.process(options));
    }
}
[options.frames] = await Promise.all(preInitializationPromises);
options.doc = document;
options.win = window;
const SingleFile = singlefile.lib.SingleFile.getClass();
const singleFile = new SingleFile(options);
await singleFile.run();
return await singleFile.getPageData();
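Because the snippet above uses await and return, it only runs as written inside an async function. The sketch below wraps it in one. The function name runSingleFile is hypothetical, and the three parameters are passed in explicitly only to make the dependencies visible. In the page they would be the globals singlefile, document and window.

```javascript
// Sketch: the launch snippet wrapped in an async function so that `await`
// and `return` are valid. The body is unchanged from the snippet above.
async function runSingleFile(singlefile, document, window) {
  const options = {};
  singlefile.lib.helper.initDoc(document);
  options.insertSingleFileComment = true;
  options.insertFaviconLink = true;
  const preInitializationPromises = [];
  if (!options.saveRawPage) {
    if (!options.removeFrames) {
      preInitializationPromises.push(singlefile.lib.frameTree.content.frames.getAsync(options));
    }
    if (options.loadDeferredImages) {
      preInitializationPromises.push(singlefile.lib.lazy.content.loader.process(options));
    }
  }
  [options.frames] = await Promise.all(preInitializationPromises);
  options.doc = document;
  options.win = window;
  const SingleFile = singlefile.lib.SingleFile.getClass();
  const singleFile = new SingleFile(options);
  await singleFile.run();
  // Resolves with whatever getPageData() returns.
  return await singleFile.getPageData();
}
```

Calling runSingleFile(singlefile, document, window) returns a promise, so the caller has to wait for it to settle before reading the result, and a rejection should be caught and logged to avoid an "Unhandled Promise Rejection".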
I think I've done everything except the second part of #231 (Comment).
I didn't see any reference to content-hooks.js in the concatenated js files, so I'm not sure how to replace the getFileContent call in the content-hooks.js portion of the concatenated file.
Sorry, if I'm missing something obvious.
Okay, I had not realized that /lib/hooks/content/content-hooks.js was missing from the list... You should inject it, it's a bug in my implementation(s). I'll fix that (https://github.com/gildas-lormeau/SingleFile/issues/247). I updated the previous post accordingly.
I updated the "Recap" post again https://github.com/gildas-lormeau/SingleFile/issues/231#issuecomment-515243538 to take the frames of the page into account (cf. forMainFrameOnly).
Cool, doing that now. The Swift function requires an injection time, atDocumentStart or atDocumentEnd. Does that matter?
Yep, to get fonts on https://www.theverge.com/ for example.
Use atDocumentStart for the list of files and atDocumentEnd for the SingleFile script. I updated the post.
I didn't realize I needed to split those into two separate scripts - list of files and singlefile. Doing that now.
It's my fault, I was not clear enough. I highly recommend that you automate all these steps with a Swift program. That way, you'll be able to update the code easily in the future.
Good idea! Ok, so it started doing stuff, it went from 0 to 871, but crashed with a type error:
Unhandled Promise Rejection: TypeError: Type error
What is this "871"? I would also need more details about the type error. Can you get a stack trace or a line number, or maybe attach a debugger to the webview?
"871" seemed like it was making progress. It started at 0 then increased incrementally to 465, and then incrementally to 871, then it stopped, and it displayed the error message.
It had a line number, but it pointed to that line in the SingleFileScript I just made.
I'm looking into debug tools, but I'm not as familiar with them because it's JavaScript running out of the Safari app, instead of the normal Swift Xcode editor.
Okay, please let me know when you have more info. I updated the procedure again https://github.com/gildas-lormeau/SingleFile/issues/231#issuecomment-515235314. There are now blocks of lines to replace instead of single lines. Maybe this will help to fix your issue. Make sure you use backquotes to delimit the dumped scripts.
Will do, thanks a lot @gildas-lormeau! You've been so incredibly helpful!
You're welcome. If it works and if you can automate this with a Swift program, please consider open-sourcing your code. It will be very helpful for everyone too! :)
Absolutely!
I forgot to ask, the browser.runtime.getURL didn't matter anymore, right? Because it returns false, so the else statement gets called, which is where we added the contents of the js file as a multiline string, right?
Doesn't matter? scriptElement.src = browser.runtime.getURL("/lib/hooks/content/content-hooks-web.js");
Because we added? scriptElement.textContent = "contents of the js file..."
window.singlefile.lib.hooks.content.main = window.singlefile.lib.hooks.content.main || (() => {
    if (document instanceof HTMLDocument) {
        const scriptElement = document.createElement("script");
        scriptElement.async = false;
        if (this.browser && browser.runtime && browser.runtime.getURL) {
            scriptElement.src = browser.runtime.getURL("/lib/hooks/content/content-hooks-web.js");
            scriptElement.async = false;
        } else if (this.singlefile.lib.getFileContent) {
            scriptElement.textContent = contentHooksWeb;
        }
        (document.documentElement || document).appendChild(scriptElement);
        scriptElement.remove();
    }
    return {};
})();
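The branch logic in the code above can be isolated in a small sketch to see which path is taken in a given environment. The function chooseScriptSource and its env parameter are hypothetical. They stand in for the globals so that the logic can run outside a browser.

```javascript
// Mirrors the if / else-if of the hook loader quoted above, with the
// environment passed in explicitly instead of read from globals.
function chooseScriptSource(env, inlinedSource) {
  if (env.browser && env.browser.runtime && env.browser.runtime.getURL) {
    // Extension context: load the file through the extension URL.
    return { src: env.browser.runtime.getURL("/lib/hooks/content/content-hooks-web.js") };
  } else if (env.singlefile && env.singlefile.lib && env.singlefile.lib.getFileContent) {
    // getFileContent is defined: use the inlined script text.
    return { textContent: inlinedSource };
  }
  // Neither condition holds: the script element would stay empty.
  return {};
}

console.log(chooseScriptSource({}, "/* hook code */"));
console.log(chooseScriptSource({ singlefile: { lib: { getFileContent: () => "" } } }, "/* hook code */"));
```

As the first call shows, when browser.runtime.getURL is absent the else-if branch still requires getFileContent to be defined. Assigning scriptElement.textContent unconditionally, as suggested earlier in the thread, avoids depending on either condition.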
Ok, I redid the scripts from scratch using your instructions. The same error occurs, but I might have found the stack trace:
stack: "[native code]↵parseURL@user-script:1:895:20↵user-script:1:10936:29↵asyncFunctionResume@[native code]↵user-script:1:10913:6…"
What is the URL of the page you're using to test your program? You also have to make sure you execute SingleFile (last step) after navigating to this page.
"www.apple.com". I have it set to run when I click a button after the page has loaded.
I also added console.log for the resourceURL and baseURI.
Here's the console.log for just the resourceURL
That's weird... Could you please add console.log(document) and console.log(document.baseURI), for example at the top of the script you're running when you click on the button?
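The reason baseURI is worth logging can be illustrated with a sketch. It assumes, and this is not confirmed anywhere in the thread, that parseURL resolves resourceURL against baseURI with the standard URL constructor. Under that assumption, a base that cannot be used for resolution makes the constructor throw a TypeError. The helper tryResolve and the sample path are hypothetical.

```javascript
// Assumption: resolution is done with the standard URL constructor.
// This helper reports failures instead of throwing, to make them visible.
function tryResolve(resourceURL, baseURI) {
  try {
    return { ok: true, href: new URL(resourceURL, baseURI).href };
  } catch (error) {
    return { ok: false, isTypeError: error instanceof TypeError };
  }
}

// Made-up resource path, resolved against three different bases.
console.log(tryResolve("/styles/main.css", "https://www.apple.com/"));
console.log(tryResolve("/styles/main.css", "about:blank"));
console.log(tryResolve("/styles/main.css", undefined));
```

If document.baseURI turned out to be something like about:blank at the time the script runs, for example because the script ran before navigation to the target page, a relative resource URL could not be resolved, which would be consistent with a type error raised inside parseURL.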
Hi @gildas-lormeau ,
Update: I forked SingleFile, but I couldn't figure out where to modify it. So, I built an HTML file downloader with user login based on SingleFile using Apple's WebKit framework. It worked great for Twitter and Facebook. But, there's a layout issue on sites like airbnb.com. The CSS seems to have loaded, but the layout is broken: repeated images, large height and width gaps, and missing characters (square boxes).
Also, I recently found Apple's JavaScript Core framework, which can load and use JavaScript libraries in Swift. So, I'm exploring if there's a way to use SingleFile with JavaScript Core.
Question: SingleFile downloads airbnb.com perfectly, so I was wondering if you knew offhand what could be causing the problem?
Any help, guidance, or suggestions would be very much appreciated. =]