Closed david-littlefield closed 5 years ago
Thank you, I think this may be related to frames... To confirm this hypothesis, could you please add options.removeFrames = true just after creating the options object?
By the way, does the web inspector allow you to put breakpoints in the code?
Same issue with removeFrames set to true. I added console.log to options.
I'm not sure, but I don't think so. Swift has breakpoints built into the Xcode editor. I set one up in the JS file, but it didn't respond.
Does it work with https://example.com instead of https://www.apple.com?
Same issue on "example.com"
I just thought of something. I'm using WKWebView, which is built on WebKit, the same engine as Safari. And if Puppeteer uses Chromium, could that be causing the problem?
If so, I'd need to have Swift run SingleFile using terminal to launch Chromium, right?
I don't think WebKit is the cause of the issue. If that was the case, I'm not sure you would be able to control Chrome with swift.
Can you please log resourceURL and baseURI like you did previously and post the logs here when testing example.com?
Honestly, I never used a Mac in my life so I cannot tell you if that's possible or not. All I can say is that in NodeJS, Puppeteer (or Webdriver) is required in order to control Chrome from the NodeJS program.
I only see one line of log in the screenshot. I was expecting two lines, one log for the resourceURL and one log for the baseURI. Did you log the 2 values?
Maybe you should write console.log("resourceURL", resourceURL); and console.log("baseURI", baseURI); to make logs easier to read.
Also, in order to help me understand what's wrong, could you set DEBUG to true in single-file-core.js at line 27, do the same test (with example.com) and show the logs?
The baseURL was the second log. But you're right, it's difficult to read. My bad. I'll add the labels to help with readability. Making the debug changes now.
Finally, if you insert a debugger; in the script of the last step, what happens?
I just changed DEBUG to true in single-file-core.js. Do I insert debugger into the separate main script, where we use console.log to print document and baseURI?
You can insert debugger; at any place as long as you know the code where you insert it is executed.
Woah, that's cool. Looks like it's doing a breakpoint in Safari?
Yes, that's good news! So instead of logging values, insert this code if (!baseURI) debugger; and show me the call stack when the breakpoint is reached.
Do I need to set baseURI first, as in const baseURI = document.baseURI? Or should it be if (!document.baseURI) debugger; instead? And where do you want it inserted?
I would like you to insert this code in the parseURL(resourceURL, baseURI) function, i.e. just before it crashes. I thought your previous logs were added there, weren't they?
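A minimal sketch of where that guard could sit, assuming parseURL is written as a standalone function with the two parameters named above (in the project it may be declared differently). The debugger statement only pauses execution when a debugger such as the Safari Web Inspector is attached; otherwise it does nothing.

```javascript
// Sketch only: a standalone stand-in for the parseURL function discussed above.
function parseURL(resourceURL, baseURI) {
  if (!baseURI) {
    // Pauses here only when an inspector is attached; otherwise this is a no-op.
    debugger;
  }
  return new URL(resourceURL, baseURI);
}
```

When the breakpoint is hit, the call stack panel should show which caller passed the empty baseURI.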
Ahhhh, sorry. I got mixed up after we changed options. My bad, correcting it now.
I also realized a mistake I made yesterday and fixed it. It didn't make a difference. Same outcome.
Okay, if you have the "call stack" and "variables" panels in the debugger, please post a screenshot of them when you reach the breakpoint. Unfold scopes in the "variables" panel if possible.
Call Stack:
Is this the "variables" panel you're looking for? If so, it scrolls down quite a bit, and there are many scopes that can be unfolded.
K, doing that now.
I was wrong, please wait ;)
Put a debugger at line #1309 of single-file-core.js and log the options object in the console please.
Ok, reversing the change? Adding debugger and a console.log of options at #1309 instead.
Please forget (again) these instructions. Instead, replace in single-file-util.js:

    parseURL(resourceURL, baseURI) {
        return new URL(resourceURL, baseURI);
    },

with

    parseURL(resourceURL, baseURI) {
        if (baseURI) {
            return new URL(resourceURL, baseURI);
        } else {
            return new URL(resourceURL);
        }
    },
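To see what the patched function does in each case, here is a self-contained sketch. It lifts parseURL out of its object into a standalone function (an assumption made for the example) and uses placeholder URLs that are not taken from the project.

```javascript
// Standalone copy of the patched parseURL, for illustration only.
function parseURL(resourceURL, baseURI) {
  if (baseURI) {
    return new URL(resourceURL, baseURI);
  } else {
    return new URL(resourceURL);
  }
}

// With a base, a relative resource resolves against it.
const resolved = parseURL("img/logo.png", "https://example.com/page/").href;

// Without a base, an absolute resource is still parsed on its own.
const absolute = parseURL("https://example.com/style.css", undefined).href;

// A relative resource with no base cannot be resolved and throws a TypeError.
let relativeError = null;
try {
  parseURL("img/logo.png", undefined);
} catch (error) {
  relativeError = error;
}

console.log(resolved, absolute, relativeError instanceof TypeError);
```

The fallback only helps when resourceURL is already absolute; a relative URL with a missing baseURI still fails, which is probably the right outcome since there is nothing to resolve it against.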
K, reversing, and will then replace as instructed.
Thank you.
Happy to help!
Do I need to add another return or something at the end of parseURL? The editor is showing an error.
Do not forget the trailing comma. I edited the post accordingly.
Had the comma, I'll upload what it looks like, one sec.
You've forgotten the } after the 2nd return.
Hahah, whoops! Thanks
I think it worked! No crash! =] Now, what do I do? It said it's resolved, but how do I access the html?
I guess it's returned by the function used to inject the script.
But isn't it in a promise format, or something like that? How do I get the data inside?
Isn't it resolved? It's a JS object once resolved.
This code at the end of your script should ensure the promise is resolved (via the await).
return await singleFile.getPageData();
Ok, how do I turn that js object into a string? Then I can export it to swift as a string, and then save that string as an html file in Swift.
It depends, you can use JSON.stringify() to get all the data as a JSON string or get the HTML only via the content property directly.
return await JSON.stringify(singleFile.getPageData());
or
return (await singleFile.getPageData()).content;
K, trying that now. I was going to do a straight convert to string, then save to HTML. Is there a benefit to using JSON instead?
With JSON, you'll get the title of the page and the filename generated by the template (right now, you did not define any template).
FYI, I pushed the fix https://github.com/gildas-lormeau/SingleFile/commit/fe8f38a16920d4e6b691d0886643bbe033e2a0d9
Nice! Trying JSON.stringify.
FYI to pass a filename template to SingleFile, you have to define the property filenameTemplate in the options object. You can set it for example to "{page-title} ({date-iso} {time-locale}).html".
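As a rough sketch, an options object carrying the two properties mentioned in this thread could look like the following. Any other options your setup needs are omitted here, and how the object is handed to SingleFile depends on your own script.

```javascript
// Sketch only: just the two option properties discussed in this thread.
const options = {
  removeFrames: true,
  filenameTemplate: "{page-title} ({date-iso} {time-locale}).html"
};
```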
The return await JSON.stringify(singleFile.getPageData()); version returned empty brackets. The return (await singleFile.getPageData()).content; version was able to console.log the HTML in Safari, but I couldn't pass it through to Swift: it said it was an unsupported type. I think Swift can only receive a string.
But it's indeed a string...
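On the empty brackets from the JSON.stringify variant: a likely explanation is that JSON.stringify received the pending promise rather than the resolved page data. A promise has no enumerable own properties, so it serializes as {}. Moving the await inside the call serializes the resolved object instead. The sketch below uses a hypothetical stub in place of singleFile.getPageData(); the content property comes from this thread, while the exact names title and filename and all values are assumptions for illustration.

```javascript
// Hypothetical stand-in for singleFile.getPageData(); the values are made up.
const singleFileStub = {
  async getPageData() {
    return { content: "<html></html>", title: "Example", filename: "Example.html" };
  }
};

// Stringifying the promise itself yields "{}".
const fromPromise = JSON.stringify(singleFileStub.getPageData());

// Awaiting first, then stringifying, yields the resolved data as a JSON string.
async function getPageDataAsJSON() {
  return JSON.stringify(await singleFileStub.getPageData());
}

getPageDataAsJSON().then(json => console.log(fromPromise, json));
```

In the injected script, the equivalent change would be return JSON.stringify(await singleFile.getPageData()); so that the value handed back is a plain string.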
Haha, weird. I'll work on that. Hey working on this with you was pretty awesome! I'm surprised because I thought we needed puppeteer, which I thought required Chrome? So, did we not use puppeteer?
Also, once it's working, I'm pretty confident I can automate it for the community. What would you like it to do?
The API you're using to control WebKit (https://developer.apple.com/documentation/webkit) replaces Puppeteer in your case. Puppeteer is a library developed by Google for controlling Chrome. It has also been compatible (not 100%) with Firefox for some months. It's not compatible with Safari though.
Ideally, I would like you to produce a Swift script that would be compatible with the CLI tool I provide. Otherwise, open-source what you want ;) (as long as you don't make money with this since it's under AGPL).
Hi @gildas-lormeau ,
Update: I forked SingleFile, but I couldn't figure out where to modify it. So, I built an HTML file downloader with user login based on SingleFile using Apple's WebKit framework. It worked great for Twitter and Facebook. But there's a layout issue on sites like airbnb.com. The CSS seems to have loaded, but the layout is broken: repeated images, large height and width gaps, and missing characters (square box).
Also, I recently found Apple's JavaScriptCore framework, which can load and use JavaScript libraries in Swift. So, I'm exploring if there's a way to use SingleFile with JavaScriptCore.
Question: SingleFile downloads airbnb.com perfectly, so I was wondering if you knew offhand what could be causing the problem?
Any help, guidance, suggestions would be very much appreciated. =]