IonicaBizau / scrape-it

🔮 A Node.js scraper for humans.
http://ionicabizau.net/blog/30-how-to-write-a-web-scraper-in-node-js
MIT License
4.02k stars, 220 forks

Type error with the how field of type ScrapeOptionElement #193

Closed: RepolloDev closed this 1 month ago

RepolloDev commented 4 months ago

A little context

I recently started using scrape-it to extract data from html pages, but there were cases where I needed to intervene in the extraction using the Cheerio API.

// test example
import scrapeIt from 'scrape-it'

const anyHTML = '<html>...</html>'
const { data } = scrapeIt.scrapeHTML<{ data: unknown }>(anyHTML, {
  data: {
    listItem: 'main',
    data: {
      items: {
        selector: 'article',
        // `element` is the Cheerio object matched by `selector`
        how: (element) => {
          const $items = element.find('p:nth-child(n+2)')
          // more cheerio methods
          return $items.text()
        }
      }
    }
  }
})

The problem

TypeScript reports a type error on this code. The code still runs without problems, but the error is a nuisance, and there is no autocompletion for the Cheerio object passed as the function parameter.

Solution

Looking at the scrape-it type definitions, the how field is typed as a function whose parameter is a cheerio.Selector, which may be the cause of the problem.

export interface ScrapeOptionElement {
    selector?: string;
    convert?: (value: any) => any;
    // Change cheerio.Selector to cheerio.Cheerio
    how?: string | ((element: cheerio.Selector) => any);
    attr?: string;
    trim?: boolean;
    closest?: string;
    eq?: number;
    texteq?: number;
}
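
To illustrate why the parameter type matters, here is a minimal, self-contained sketch. CheerioLike and SelectorLike are hypothetical stand-ins for cheerio.Cheerio and cheerio.Selector, reduced to the members used in this issue; they are not the real cheerio declarations. ElementOptionBefore, ElementOptionAfter and fakeElement are likewise made up for the example.

```typescript
// Hypothetical stand-in for cheerio.Cheerio: a matched set of elements.
interface CheerioLike {
  find(selector: string): CheerioLike;
  text(): string;
}

// Hypothetical stand-in for cheerio.Selector: a callable, with no
// .find() or .text() members of its own.
type SelectorLike = (selector: string) => CheerioLike;

// Shape comparable to the current typing: the callback receives a selector function.
interface ElementOptionBefore {
  selector?: string;
  how?: string | ((element: SelectorLike) => unknown);
}

// Shape comparable to the proposed typing: the callback receives the matched element.
interface ElementOptionAfter {
  selector?: string;
  how?: string | ((element: CheerioLike) => unknown);
}

// With the "before" shape, this does not type-check, because a function
// type has no `find` member:
// const bad: ElementOptionBefore = { how: (element) => element.find('p') };

// With the "after" shape, `element` is inferred as CheerioLike,
// so .find() and .text() type-check and autocomplete.
const itemsOption: ElementOptionAfter = {
  selector: 'article',
  how: (element) => element.find('p:nth-child(n+2)').text(),
};

// Tiny fake element, only so the callback can be run without cheerio.
function fakeElement(content: string): CheerioLike {
  return {
    find: () => fakeElement(content),
    text: () => content,
  };
}

const howFn = itemsOption.how;
const extracted =
  typeof howFn === 'function' ? howFn(fakeElement('hello')) : undefined;
```

In the real typings, the equivalent change would be the one-word swap marked in the interface above, assuming the installed cheerio types expose cheerio.Cheerio under that name.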

IonicaBizau commented 2 months ago

I am not good at handling TypeScript types, but contributions are welcome in that regard! Thank you very much!

fadingNA commented 2 months ago

@IonicaBizau can I work on this issue?

IonicaBizau commented 2 months ago

Yes, that would be great! :)