azaslavsky / domJSON

Convert DOM trees into compact JSON objects, and vice versa, as fast as possible.
http://azaslavsky.github.io/domJSON/

Usage w/ Jsdom: TypeError: Cannot read property 'href' of undefined #26

Closed batjko closed 6 years ago

batjko commented 6 years ago

e.g.:

const dom = await JSDOM.fromURL('http://...')
const parsed = toJSON(dom.window.document) // domjson.toJSON()

Error:

TypeError: Cannot read property 'href' of undefined
    at D:\git\food-search\node_modules\domjson\dist\domJSON.js:19:28
    at domJSON (D:\path\to\my\project\node_modules\domjson\dist\domJSON.js:7:23)
    at Object.<anonymous> (D:\path\to\my\project\node_modules\domjson\dist\domJSON.js:15:3)
    at Module._compile (module.js:641:30)
    at Module._extensions..js (module.js:652:10)
    at Object.require.extensions.(anonymous function) [as .js]

azaslavsky commented 6 years ago

Can you provide the smallest subset of the HTML code that causes this error? Also, information on the environment in which this occurred (browser, os, incognito or not, etc) would be very helpful.

batjko commented 6 years ago

I'm afraid I can't, as the page I am scraping via JSDOM.fromURL is part of a proprietary project I have no direct access to, and I do not know which part of that HTML is causing the error.

But it has a bunch of JavaScript in it, which I have to evaluate using jsdom's runScripts: 'dangerously' option in the fromURL() call.

It's also got a few hidden iframes, some jQuery usage, and some shadow DOM stuff, as far as I can see. Not sure if any of this could be an issue?

azaslavsky commented 6 years ago

Hmmm, it will be really hard to help without more information. Can you at least tell me what type of node (tagName, attribute, etc) is the last to be visited before the error is thrown?

batjko commented 6 years ago

The error occurs during the parsing, i.e. at the snippet I shared above. I'm not sure how to find out which node the parser complains about?

The error points to line 19 of the domJSON file, so for some reason the win that gets passed in during the run of domJSON doesn't have a location property.
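
A minimal sketch of how this kind of failure can arise, assuming the library is bootstrapped by a UMD-style wrapper that hands a root object to a factory as win. The names bootstrap and factory are hypothetical and this is not domJSON's actual source; it only illustrates that reading win.location.href throws a TypeError when the root has no location property.

```javascript
// Hypothetical UMD-style bootstrap (illustrative only, not domJSON's real code).
function bootstrap(root, factory) {
  return factory(root);
}

// Illustrative factory that reads the page URL from the root object it receives.
function factory(win) {
  return { baseHref: win.location.href };
}

// A root that has a location behaves as expected.
const browserLike = { location: { href: 'http://localhost/' } };
const ok = bootstrap(browserLike, factory);

// A root without a location (here a plain object) throws while reading href.
let caught = null;
try {
  bootstrap({}, factory);
} catch (err) {
  caught = err;
}

console.log(ok.baseHref);
console.log(caught instanceof TypeError);
```

The exact wording of the TypeError message depends on the Node version, so the sketch checks only the error type.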

azaslavsky commented 6 years ago

Ok, that's a much more solvable issue. What browser/OS/environment is this occurring in?

batjko commented 6 years ago

Windows 10, 64bit. Browser is jsdom, as mentioned.

azaslavsky commented 6 years ago

Ah, you're using jsdom, I missed that. I'm not very familiar with that library, but here is my suspicion: almost all browsers (at least all of the IE8+ ones) set this to equal window in the global scope, which is what the line that bootstraps domJSON relies on. The existence of the jsdom-global library implies that jsdom does not do that. I will make a small change to that setup (this => this || window) which will hopefully fix things for you, but you may need to use jsdom-global as well to get things to work properly.
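
One caveat with a this || window fallback, sketched below under stated assumptions: at the top level of a CommonJS module in Node, this is module.exports (an empty object), which is truthy, so the fallback would never be reached. The helper resolveWindow is hypothetical and not part of domJSON; moduleThis and fakeWindow are stand-ins so the sketch runs without jsdom.

```javascript
// Hypothetical helper: return the first candidate that looks like a window,
// meaning it exposes a location with a string href.
function resolveWindow(candidates) {
  for (const candidate of candidates) {
    if (candidate && candidate.location && typeof candidate.location.href === 'string') {
      return candidate;
    }
  }
  return null;
}

// Stand-in for top-level `this` in a CommonJS module (an empty exports object).
const moduleThis = {};
// Stand-in for a real or jsdom-provided window.
const fakeWindow = { location: { href: 'http://localhost/' } };

// `this || window` keeps the empty object, because any object is truthy.
const naive = moduleThis || fakeWindow;

// Checking for the property that is actually needed skips the empty object.
const resolved = resolveWindow([moduleThis, fakeWindow]);

console.log(naive === moduleThis);
console.log(resolved === fakeWindow);
```

Testing for the property the code needs, rather than for truthiness, is one way to make such a bootstrap tolerate environments where this is not the window.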

azaslavsky commented 6 years ago

Ok, attempted a fix with this commit. Can you please try it and let me know if it works?

batjko commented 6 years ago

I'm afraid I'm still getting the same error with that commit.

azaslavsky commented 6 years ago

Just to be clear, you are getting that error even when using jsdom-global?

azaslavsky commented 6 years ago

Also worth reading: https://github.com/tmpvar/jsdom/issues/1388 and https://github.com/tmpvar/jsdom#simple-options.

My feeling is that this is an issue with jsdom not implementing a (very) standard portion of the DOM spec more than it is a bug with this library per se. Also, it's worth noting that jsdom is not one of the (pseudo-)browsers listed as being supported in the README...

That being said, I do recognize that this library should be usable in browsers that almost but not quite simulate the real thing, so if there's an easy fix, I'm happy to put it in. I just want to make sure we exhaust all of the options that involve configuring or extending jsdom itself first.

batjko commented 6 years ago

> My feeling is that this is an issue with jsDOM

I guess that's quite possible. I'll give the options a few more tries. Based on other issues and Stack Overflow, I had tried these, which were supposed to be the common way of getting around similar problems and worked for others:

JSDOM.reconfigureWindow(window, { url: 'http://localhost' })
JSDOM.reconfigureWindow(window, { location: 'http://localhost' })
JSDOM.reconfigureWindow(global.window, { url: 'http://localhost' })
window.location.href = 'http://localhost'
location.href = 'http://localhost'

...but strangely, setting href via the reconfigureWindow option was never among them. I'll try that as well and see what sticks.

Since the issue does seem to originate with jsdom's output, I'll close this off. Thanks for looking into it, though! 👍