gildas-lormeau / SingleFile

Web Extension for saving a faithful copy of a complete web page in a single HTML file
GNU Affero General Public License v3.0

Seeking help, guidance, suggestions from the wise, experienced, and powerful. =] #231

Closed david-littlefield closed 5 years ago

david-littlefield commented 5 years ago

Hi @gildas-lormeau ,

Update: I forked SingleFile, but I couldn't figure out where to modify it. So, I built an html file downloader with user login based on SingleFile using Apple's WebKit framework. It worked great for Twitter and Facebook. But, there's a layout issue on sites like airbnb.com. The css seems to have loaded, but the layout is broken - repeated images, large height and width gaps, and missing characters (square box).

Also, I recently found Apple's JavaScriptCore framework, which can load and use JavaScript libraries in Swift. So, I'm exploring whether there's a way to use SingleFile with JavaScriptCore.

Question: SingleFile downloads airbnb.com perfectly, so I was wondering if you knew offhand what could be causing the problem?

Any help, guidance, suggestions would be very much appreciated. =]

gildas-lormeau commented 5 years ago

Thank you, I think this may be related to frames... To confirm this hypothesis, could you please add options.removeFrames = true just after creating the options object?

gildas-lormeau commented 5 years ago

By the way, does the web inspector allow you to put breakpoints in the code?

david-littlefield commented 5 years ago

Same issue with removeFrames set to true. I added console.log to options.

Screen Shot 2019-07-26 at 11 00 51 AM

I'm not sure, but I don't think so. Swift has breakpoints built into the xcode editor. I set one up in the js file, but it didn't respond.

Screen Shot 2019-07-26 at 10 56 05 AM
gildas-lormeau commented 5 years ago

Does it work with https://example.com instead of https://www.apple.com?

david-littlefield commented 5 years ago

Same issue on "example.com"

I just thought of something. I'm using WKWebView, which uses WebKit (the Safari engine). And if Puppeteer uses Chromium, could that be causing the problem?

If so, I'd need to have Swift run SingleFile using terminal to launch Chromium, right?

gildas-lormeau commented 5 years ago

I don't think WebKit is the cause of the issue. If that was the case, I'm not sure you would be able to control Chrome with swift.

gildas-lormeau commented 5 years ago

Can you please log resourceURL and baseURI like you did previously and post the logs here when testing example.com?

david-littlefield commented 5 years ago
Screen Shot 2019-07-26 at 11 18 01 AM
gildas-lormeau commented 5 years ago

Honestly, I never used a Mac in my life so I cannot tell you if that's possible or not. All I can say is that in NodeJS, Puppeteer (or Webdriver) is required in order to control Chrome from the NodeJS program.

I only see one line of log in the screenshot. I was expecting two lines, one log for the resourceURL and one log for the baseURI. Did you log the 2 values?

Maybe you should write console.log("resourceURL", resourceURL); and console.log("baseURI", baseURI); to make logs easier to read.

gildas-lormeau commented 5 years ago

Also, to help me understand what's wrong, could you set DEBUG to true at line 27 of single-file-core.js, run the same test (with example.com), and show the logs?

david-littlefield commented 5 years ago

The baseURI was the second log. But you're right, it's difficult to read. My bad. I'll add the labels to help with readability. Making the debug changes now.

gildas-lormeau commented 5 years ago

Finally, if you insert a debugger; statement in the script of the last step, what happens?

david-littlefield commented 5 years ago

I just changed debug to true in single-file-core.js. Do I insert debugger into the separate main script - where we use console log to print document and baseURI?

gildas-lormeau commented 5 years ago

You can insert debugger; at any place as long as you know the code where you insert it is executed.

david-littlefield commented 5 years ago

Woah, that's cool. Looks like it's doing a breakpoint in Safari?

gildas-lormeau commented 5 years ago

Yes, that's good news! So instead of logging values, insert this code if (!baseURI) debugger; and show me the call stack when the breakpoint is reached.

david-littlefield commented 5 years ago

Do I need to set baseURI? const baseURI = document.baseURI? Or if (!document.baseURI) debugger;? And where do you want it inserted?

Screen Shot 2019-07-26 at 11 41 17 AM
gildas-lormeau commented 5 years ago

I would like you to insert this code in the parseURL(resourceURL, baseURI) function, i.e. just before it crashes. I thought your previous logs were added there, weren't they?
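
For readers following along, here is a minimal sketch of what the instrumented function might look like. It assumes parseURL is a plain function with the one-line body shown later in this thread; the sample URLs are only illustrative. The debugger; statement only pauses when a debugger (such as the Safari web inspector) is attached.

```javascript
// Sketch: parseURL with labelled logs and a conditional breakpoint.
// The sample URLs below are illustrative, not taken from the thread.
function parseURL(resourceURL, baseURI) {
    // Labels make the two values easy to tell apart in the console.
    console.log("resourceURL", resourceURL);
    console.log("baseURI", baseURI);
    // Pause only in the suspicious case, when no base URI was provided.
    if (!baseURI) debugger;
    return new URL(resourceURL, baseURI);
}

const resolved = parseURL("a.css", "https://example.com/dir/");
console.log(resolved.href);
```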

david-littlefield commented 5 years ago

Ahhhh, sorry. I got mixed up after we changed options. My bad, correcting it now.

david-littlefield commented 5 years ago
Screen Shot 2019-07-26 at 11 54 37 AM Screen Shot 2019-07-26 at 11 56 53 AM
david-littlefield commented 5 years ago

I also realized a mistake I made yesterday. Fixed it. It didn't make a difference. Same outcome.

gildas-lormeau commented 5 years ago

Okay, if you have the "call stack" and "variables" panels in the debugger, please post a screenshot of them when you reach the breakpoint. Unfold scopes in the "variables" panel if possible.

david-littlefield commented 5 years ago

Call Stack:

Screen Shot 2019-07-26 at 12 04 09 PM

Is this the "variables" panel you're looking for? If so, it scrolls down quite a bit, and there are many scopes that can be unfolded.

Screen Shot 2019-07-26 at 12 05 19 PM
david-littlefield commented 5 years ago

K, doing that now.

gildas-lormeau commented 5 years ago

I was wrong, please wait ;)

gildas-lormeau commented 5 years ago

Put a debugger at line #1309 of single-file-core.js and log the options object in the console please.

david-littlefield commented 5 years ago

Ok, reversing change? Adding debugger and console.log to options at #1309, instead.

gildas-lormeau commented 5 years ago

Please forget (again) these instructions. Instead, replace in single-file-util.js:

            parseURL(resourceURL, baseURI) {
                return new URL(resourceURL, baseURI);
            },

with

            parseURL(resourceURL, baseURI) {
                if (baseURI) {
                    return new URL(resourceURL, baseURI);
                } else {
                    return new URL(resourceURL);
                }
            },
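
A standalone sketch of the guarded version, written as a plain function so it can be run outside SingleFile. The assumption here (not confirmed in the thread) is that the WebKit build in use rejects an undefined second argument to new URL, which the guard avoids by calling the one-argument form. The sample URLs are illustrative.

```javascript
// Standalone copy of the guarded helper, for experimentation.
function parseURL(resourceURL, baseURI) {
    if (baseURI) {
        // Resolve resourceURL against the base when one is available.
        return new URL(resourceURL, baseURI);
    } else {
        // No base: resourceURL must already be absolute.
        return new URL(resourceURL);
    }
}

const absolute = parseURL("https://example.com/style.css");
const relative = parseURL("img/logo.png", "https://example.com/page/");
console.log(absolute.href);
console.log(relative.href);
```

Note that with no base, a relative resourceURL still throws a TypeError, so the guard only helps when the resource URL is already absolute.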
david-littlefield commented 5 years ago

K, reversing, and will then replace as instructed.

gildas-lormeau commented 5 years ago

Thank you.

david-littlefield commented 5 years ago

Happy to help!

david-littlefield commented 5 years ago

Do I need to add another return or something at the end of parseURL? The editor is showing an error.

gildas-lormeau commented 5 years ago

Do not forget the trailing comma. I edited the post accordingly.

david-littlefield commented 5 years ago

Had the comma, I'll upload what it looks like, one sec.

Screen Shot 2019-07-26 at 12 27 29 PM
gildas-lormeau commented 5 years ago

You've forgotten the } after the 2nd return.

david-littlefield commented 5 years ago

Hahah, whoops! Thanks

david-littlefield commented 5 years ago

I think it worked! No crash! =] Now, what do I do? It said it's resolved, but how do I access the html?

gildas-lormeau commented 5 years ago

I guess it's returned by the function used to inject the script.

david-littlefield commented 5 years ago

But isn't it in a promise format, or something like that? How do I get the data inside?

gildas-lormeau commented 5 years ago

Isn't it resolved? It's a JS object once resolved.

This code at the end of your script should ensure the promise is resolved (via the await).

return await singleFile.getPageData();

david-littlefield commented 5 years ago

Ok, how do I turn that js object into a string? Then I can export it to swift as a string, and then save that string as an html file in Swift.

gildas-lormeau commented 5 years ago

It depends, you can use JSON.stringify() to get all the data as a JSON string or get the HTML only via the content property directly.

return await JSON.stringify(singleFile.getPageData()); or return (await singleFile.getPageData()).content;

david-littlefield commented 5 years ago

K, trying that now. I was going to do a straight convert to string, then save to html. Is there a benefit to using JSON instead?

gildas-lormeau commented 5 years ago

With JSON, you'll get the title of the page and the filename generated by the template (right now, you did not define any template).

FYI, I pushed the fix https://github.com/gildas-lormeau/SingleFile/commit/fe8f38a16920d4e6b691d0886643bbe033e2a0d9

david-littlefield commented 5 years ago

Nice! Trying JSON stringify.

gildas-lormeau commented 5 years ago

FYI to pass a filename template to SingleFile, you have to define the property filenameTemplate in the options object. You can set it for example to "{page-title} ({date-iso} {time-locale}).html".
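
As a rough sketch, an options object carrying that template could look like the following. Only the two property names mentioned in this thread are used; removeFrames was suggested earlier purely as a diagnostic, and any other options your script sets are omitted here.

```javascript
// Sketch of an options object; only properties named in this thread are set.
const options = {
    // Suggested earlier in the thread as a diagnostic; optional.
    removeFrames: true,
    // Template used to generate the filename returned with the page data.
    filenameTemplate: "{page-title} ({date-iso} {time-locale}).html"
};

console.log(options.filenameTemplate);
```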

david-littlefield commented 5 years ago

The return await JSON.stringify(singleFile.getPageData()); returned empty brackets.

The return (await singleFile.getPageData()).content; was able to console.log the html in Safari, but I couldn't pass it through to Swift - it said it was an unsupported type. Swift can only receive string I think.
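
The empty brackets are what JSON.stringify produces when it is handed a promise instead of the resolved object: in the first snippet the await applies to the result of JSON.stringify, not to getPageData(). The sketch below illustrates this with a stand-in for singleFile.getPageData(); the stub and its sample values are hypothetical, and the only property taken from the thread is content.

```javascript
// Hypothetical stand-in for singleFile.getPageData(); the real call
// resolves to an object that has (at least) a `content` property.
const singleFile = {
    getPageData: async () => ({ title: "Example Domain", content: "<html></html>" })
};

// Serialising the promise itself: it has no enumerable own properties,
// so the result is an empty JSON object.
const wrong = JSON.stringify(singleFile.getPageData());
console.log(wrong);

// Awaiting first, then serialising the resolved object.
async function pageDataAsJSON() {
    const pageData = await singleFile.getPageData();
    return JSON.stringify(pageData);
}

pageDataAsJSON().then(json => console.log(json));
```

Whether the "unsupported type" error on the Swift side has a related cause (a promise rather than a string being handed back) is not confirmed in this thread.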

gildas-lormeau commented 5 years ago

But it's indeed a string...

david-littlefield commented 5 years ago

Haha, weird. I'll work on that. Hey working on this with you was pretty awesome! I'm surprised because I thought we needed puppeteer, which I thought required Chrome? So, did we not use puppeteer?

Also, once it's working, I'm pretty confident I can automate it for the community. What would you like it to do?

gildas-lormeau commented 5 years ago

The API you're using to control WebKit (https://developer.apple.com/documentation/webkit) replaces Puppeteer in your case. Puppeteer is a library developed by Google for controlling Chrome. It has also been compatible (though not 100%) with Firefox for some months. It's not compatible with Safari though.

Ideally, I would like you to produce a Swift script that is compatible with the CLI tool I provide. Otherwise, open-source what you want ;) (as long as you don't make money with this since it's under AGPL).