Closed skrat closed 1 year ago
@jeluard I made some comments on this after looking at the code. Any thoughts?
The wholeText
change looks a little suspicious. wholeText
seems more adapted in this case than nodeValue
and actually reflects the original clojure code. I would be interested in seeing the inconsistencies due to its usage.
Maybe nodeValue
could only be used for the nodejs implementation?
Also it looks like there are still some wholeText
usages after the change. Is that intended?
Overall the updated code looks a little harder to follow than before.
Maybe a different approach could be to polyfill the DOM API currently used on top of jsdom
? This way the implementation would be more agnostic (as opposed to jsdom
specific), assuming we are currently relying on the standard w3c DOM API.
Hm, yeah, I understand very little of what you just said. Can you shoot me an email at [my github username] at gmail?
Relevant question about text nodes http://stackoverflow.com/questions/12286975/when-working-with-text-nodes-should-i-use-the-data-nodevalue-textcontent
I haven't seen wholeText
used before, it looks like DOM Level 3 property. As said, it's usage didn't yield expected results, and I don't see this property to be working in both node.js and browser environment. It might work in clojure, but in clojurescript it doesn't.
As of line 62, I simply wanted the node-type
to be the same on both platforms. Alternatively you can provide note-type
definition in bot branches of the big (if
.
@jeluard I don't get the last comment either. More agnostic, but how exactly? By polyfilling, do you mean injecting jsdom constructors to the globals? That doesn't sound good to me.
The DOMParser-fromString
change IMO is fully backwards compatible, I've only seen it ((DOMParser.)
) used in public parse
function where I replaced it with single DOMParser-fromString
call, that either uses browsers (.parseFromString (js/DOMParser.) s)
or node/jsdom's function call. I agree that the big if
with def
s is not so nice, but works pretty well without any further ceremonies. Let me stress again that my point is not to solely make it work on node.js, I need it to work equally well in browser too, and I don't see any breaking changes, although I haven't used hickory
before, so it's hard for me to judge the wholeText
change. I just don't see this property working.
wholeText
definitively works with ClojureScript both under Chrome / Firefox and phantomJS / slimerJS. I use hickory
to parse HTML every day in browsers and never got an issue with wholeText
.
Can you detail what kind of issue you are talking about? Do you get unexpected parse results or do you see environment no implementing wholeText
? In the former case it would be great to have new tests highlighting the wrong behavior.
@jeluard I see it now, wholeText
seems to be present in the output of DOMParser
, but it isn't available on the DOM found directly in the browser. Eg. when you open any page in Chrome/FF, jump into console, and run document.querySelector('a').childNodes[0].wholeText
(assuming there's text node in that random anchor), you'll get undefined
.
Anyway, the point is that jsdom
doesn't have it, and for Text nodes, nodeValue
from DOM Level 1 should give you what you exepct: the contents of a text node.
Could you explain why you chose to use wholeText
instead of nodeValue
for Text nodes?
Regarding polyfilling
: current ClojureScript implementation follows the official w3c API as far as I can tell.
Usually in JS world (maybe more specifically browsers, I am not familiar with nodejs) when features are missing from a browser (pretty common) you polyfill
them. The idea is to implement the API as it should be using JavaScript.
In our case this would mean rely on jsdom
to implement what hickory currently use so that no change is needed to the ClojureScript implementation. This can be done directly in an extra JS file that would be provided to nodejs
or directly in core.cljs
.
I believe this approach could be superior because:
nodejs
ever get DOM support@skrat what do you think? Is it something you usually see in nodejs world?
Maybe nodeValue
could be used as a fallback when wholeText
does not exist?
Unfortunately jsdom
is the only viable option so far, the only one exposing (exporting) constructors for NodeList
, Node
and NamedNodeMap
. Another thing is that is uses AttributeList
for node.attributes
. We'd still have to extend that one, and only when using jsdom
.
So while the idea of polyfilling, and keeping the original code intact sounds (of course) very good, in practice, it's not so easy. It also is not a common pattern in node.js.
It is true that the parsing bit can be done by the client of hickory
, but making people extend types and all that is not so nice. Additionally I would also have to patch jsdom
to support wholeText
. IMO it's easier to stick with particular implementation.
For the rest, I would be really interested in the explanation of why you use wholeText
? To be honest I don't even get its existence. It says it represents the contents of this text node, and all adjacent ones. But in practice you don't have parsers that produce adjacent text nodes, or do you? Even if that'd be the case, then you would end up with that text occuring multiple times in the parsed structure, because for each text node, you would end up with text from all neighboring nodes.
IMO what you want is the text content of the node currently being parsed and nothing else, ie. what nodeValue
has.
wholeText
was part of the original clojure implementation and has been used in the ClojureScript one to mimic the original behavior. Probably we should stick with its usage for platform that provide it and have a fallback for those which don't. An option is to simply check for its existence and fallback to nodeValue
when needed.
If you have problematic tests with the current implementation it would be a great addition.
As for the other comment it's more about style, I personally don't care that much as long as hickory still works as it used to with browsers. No much style difference in this regard between Clojure / ClojureScript.
In any case it's a good thing to have hickory support on more platforms.
Ok, so is (or (aget this "wholeText") (aget this "nodeValue"))
acceptable in both as-hickory
and as-hiccup
? About the jsdom
/ polyfill
issue, is the current solution in this pull request acceptable or shall I change it somehow?
Yes (or (aget this "wholeText") (aget this "nodeValue"))
sounds better.
As for the change style, I am not quite sure how nested def
/ defn
are supposed to work. I remember some comments on Clojure / ClojureScript mailing list regarding that king of usage. It might work now but may break at some point.
Would there be a way to avoid those nested defs?
I think so, but that would mean doing the require
ing and if
s all at the runtime, every time you call to parse
, instead of just once during the load time, which is better IMO.
Maybe it would be worth starting a thread on ClojureScript mailing list to check what is the idiomatic way to handle this scenario?
I did so over here. Additionally I have to use this line to have it working with master
(js/Object.defineProperty (.-prototype Text) "wholeText"
#js {:get (fn [] (js* "this.nodeValue"))})
wholeText
may lead to duplicate text whenen used in clojurescript. Issue #34 provides a replicable case. Fixed in pull request #35.
Have you considered using nodeValue
as suggested in this thread? Also it would be great to align both clojure and clojurescript code. Do you mind testing the same change for clojure?
Using DOM's nodeValue
is a reasonable replacement for the incorrect wholeText
property. Another option (suggested in PR #35) is goog.dom/getRawTextContent
, taking advantage of the google closure dom library which is available for clojurescript both for browsers and nodejs. Google's implementation actually calls nodeValue
but using the abstraction could have forward compatibility benefits. Neither are available for jsoup used in the clojure implementation.
Closing, as #17 has been fixed by #33 and I see no value added by this change, not to speak of the massive merge conflicts that would need to be resolved to make this compatible with current hickory
Partially fixing #17 , I could only observe
parse-fragment
to not work as expected. Additionally replacingwholeText
acessor fornodeValue
as it's use didn't yield the correct text in both node.js and browser.