b-fuze / deno-dom

Browser DOM & HTML parser in Deno
https://jsr.io/@b-fuze/deno-dom
MIT License
423 stars 48 forks

bug: querySelectorAll returns NodeListOf<Node> instead of NodeListOf<Element> #4

Open sno2 opened 4 years ago

sno2 commented 4 years ago

Would it not be better for the querySelectorAll method to return an ElementList instead of a NodeList? I suggest this because the querySelector method already returns a single Element. This works on the web, so I think it should be supported in this module as well. It would also make the following possible:

const anchorTags = parseHTML!.querySelectorAll("a");

for (const anchorTag of anchorTags) {
  console.log(anchorTag.getAttribute("href"));
}

That would make this module useful not only for parsing and working with HTML, but also for web scraping. Great job on the module!

0kku commented 4 years ago

Sounds like what you're asking for is for the return type to be NodeListOf<HTMLAnchorElement>, like it is in lib.dom.d.ts. The current behavior seems like a bug.
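For reference, TypeScript's lib.dom.d.ts gets this behavior from overloads keyed on a tag-name map (HTMLElementTagNameMap). Below is a minimal sketch of that technique only. Every name in it (SketchElement, SketchAnchor, SketchTagNameMap, SketchDocument, makeAnchor, makeElement) is a hypothetical stand-in, not deno-dom's or TypeScript's actual declaration, and it returns plain arrays rather than a NodeListOf.

```typescript
// Hypothetical stand-in types for illustration; not deno-dom's real classes.
interface SketchElement {
  tagName: string;
  getAttribute(name: string): string | null;
}

interface SketchAnchor extends SketchElement {
  href: string;
}

// Maps a tag-name selector to its element type, in the spirit of HTMLElementTagNameMap.
interface SketchTagNameMap {
  a: SketchAnchor;
}

class SketchDocument {
  private readonly elements: SketchElement[];

  constructor(elements: SketchElement[]) {
    this.elements = elements;
  }

  // Overload 1: a known tag name yields the specific element type.
  querySelectorAll<K extends keyof SketchTagNameMap>(selector: K): SketchTagNameMap[K][];
  // Overload 2: any other selector yields generic elements.
  querySelectorAll(selector: string): SketchElement[];
  // Implementation: only matches by tag name, which is enough for the sketch.
  querySelectorAll(selector: string): SketchElement[] {
    const wanted = selector.toUpperCase();
    return this.elements.filter((el) => el.tagName === wanted);
  }
}

// Hypothetical helpers that build test elements.
function makeAnchor(href: string): SketchAnchor {
  return {
    tagName: "A",
    href,
    getAttribute: (name: string) => (name === "href" ? href : null),
  };
}

function makeElement(tagName: string): SketchElement {
  return { tagName, getAttribute: () => null };
}

const doc = new SketchDocument([makeAnchor("https://example.com/"), makeElement("P")]);

// Because the selector is the literal "a", each item is typed as SketchAnchor,
// so .href is available without a cast.
const hrefs = doc.querySelectorAll("a").map((anchor) => anchor.href);
console.log(hrefs);
```

With a signature like this, the loop from the original report would type-check without any assertion, since the element type follows from the selector literal.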

sno2 commented 4 years ago

Yeah, sorry, I am trying to understand all of this ;)

0kku commented 4 years ago

This should be fairly easy to fix. I'll try to make the time for fixing this later today 👍

sno2 commented 4 years ago

Thank you very much! I just started working on a scraper with Deno and that really threw me off ;)

0kku commented 4 years ago

I guess I could give an update on this. This wasn't as easy as I initially thought. I've been working on it. The parser doesn't disambiguate between different element types, and so I must fix that first. Doing that is a much larger task, so you'll be waiting for my PR for a while longer.

jsejcksn commented 4 years ago

I think I have encountered this issue as well, while getting a list of Halloween emoji:

halloween-emoji.ts:

```ts
import {DOMParser, Element, HTMLDocument} from 'https://deno.land/x/deno_dom@v0.1.3-alpha2/deno-dom-wasm.ts';

const fetchDoc = async (url: string) => {
  const response = await fetch(url);
  if (!response.ok) throw new Error('Response not OK');
  const text = await response.text();
  const doc = new DOMParser().parseFromString(text, 'text/html');
  if (doc === null) throw new Error('Couldn\'t parse text as HTML');
  return doc;
};

const getEmojiList = (document: HTMLDocument) => {
  const origin = 'https://emojipedia.org';
  const elements = [...document.querySelectorAll('body > div.container > div.content > ul > li')] as Element[];

  const list = elements.map(element => {
    const anchor = element.querySelector(':scope > a'); // requires assertion on elements array
    // const url = anchor?.href; // unsupported?
    let url = anchor?.getAttribute('href') ?? undefined;
    if (url?.startsWith('/')) url = `${origin}${url}`;
    const [emoji, name] = [...(anchor?.childNodes ?? [])].map(node => node.textContent?.trim()) as (string | undefined)[];
    return {emoji, name, url};
  });

  return list;
};

const main = async () => {
  const url = new URL('https://emojipedia.org/halloween/');
  const halloweenEmoji = getEmojiList(await fetchDoc(url.href));
  console.log(halloweenEmoji);
};

if (import.meta.main) main();
```

Specifically:

// line 14
const elements = [...document.querySelectorAll('body > div.container > div.content > ul > li')] as Element[];

// line 17
const anchor = element.querySelector(':scope > a'); // requires assertion on elements array

If I don't include the assertion, I am presented with this error:

Property 'querySelector' does not exist on type 'Node'. ts(2339)
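One alternative to a blanket `as Element[]` assertion is a user-defined type guard that checks nodeType at runtime and narrows the type at the same time. The sketch below uses hypothetical stand-in types (SketchNode, SketchElementNode) and helper names (isElementNode, collectHrefs) rather than deno-dom's own classes, and it assumes the node list has already been spread into an array as in the script above. The value 1 for ELEMENT_NODE matches the DOM standard.

```typescript
// Hypothetical stand-ins for a DOM Node and Element; not deno-dom's real types.
interface SketchNode {
  readonly nodeType: number;
}

interface SketchElementNode extends SketchNode {
  getAttribute(name: string): string | null;
}

// Numeric node-type constants as defined by the DOM standard.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Type guard: a true result narrows the argument to SketchElementNode.
function isElementNode(node: SketchNode): node is SketchElementNode {
  return node.nodeType === ELEMENT_NODE;
}

// Collects href attributes from element nodes and skips everything else.
function collectHrefs(nodes: readonly SketchNode[]): string[] {
  const result: string[] = [];
  for (const node of nodes) {
    if (!isElementNode(node)) continue;
    // Inside this branch the compiler knows getAttribute exists.
    const href = node.getAttribute("href");
    if (href !== null) result.push(href);
  }
  return result;
}

const textNode: SketchNode = { nodeType: TEXT_NODE };
const linkNode: SketchElementNode = {
  nodeType: ELEMENT_NODE,
  getAttribute: (name: string) => (name === "href" ? "/halloween/" : null),
};

const collected = collectHrefs([textNode, linkNode]);
console.log(collected);
```

Unlike the assertion, the guard also filters out non-element nodes at runtime, so a stray text node cannot reach code that expects an element.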


Also, it seems that the Element type doesn't provide JS property access to attributes, for example HTMLAnchorElement.href:

const url = anchor?.href;

Property 'href' does not exist on type 'Element'. ts(2339)
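On the web, HTMLAnchorElement.href returns the attribute value resolved against the document's base URL. When only getAttribute is available, a similar result can be approximated with the standard URL constructor. This is a sketch under that assumption; resolveHref and the sample values are hypothetical names introduced here, not part of deno-dom.

```typescript
// Resolves a raw href attribute against a base URL, roughly what
// HTMLAnchorElement.href would return in a browser.
// Returns undefined when the attribute is missing or cannot be parsed.
function resolveHref(rawHref: string | null | undefined, baseUrl: string): string | undefined {
  if (rawHref == null) return undefined;
  try {
    return new URL(rawHref, baseUrl).href;
  } catch {
    return undefined;
  }
}

const siteBase = "https://emojipedia.org";

const resolvedRelative = resolveHref("/halloween/", siteBase);
const resolvedAbsolute = resolveHref("https://example.com/a", siteBase);
const resolvedMissing = resolveHref(null, siteBase);

console.log(resolvedRelative, resolvedAbsolute, resolvedMissing);
```

Compared with the startsWith('/') check in the script above, this also covers path-relative and protocol-relative hrefs, because the URL constructor applies the full resolution rules.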

Edit: Here's a workaround: https://github.com/b-fuze/deno-dom/issues/72#issuecomment-1250615385

RodrigoTomeES commented 11 months ago

Hi, I am having the same issue when trying to get all img tags and later read their properties.

oscarotero commented 11 months ago

I started using the DOM types that ship with TypeScript by default instead of the types provided by deno-dom, and I haven't had any issues so far: https://github.com/lumeland/lume/blob/main/core/utils/dom.ts#L28