andykais / scrape-pages

generalized scraper using a single instruction set for any site that can be statically scraped
https://scrape-pages.js.org
MIT License

implement regex cleanup support #13

Closed andykais closed 5 years ago

andykais commented 5 years ago

Inputs, downloads, and parses all accept a regexCleanup option. This issue tracks implementing that feature.

type RegexRemove = string
type RegexReplace = {
  selector: string
  replacer: string
}
type RegexCleanup = RegexRemove | RegexReplace

define:
  api:
    download:
      urlTemplate: ...
      # find regex and replace with ''
      regexCleanup: "var json="

A plain string value is shorthand for the long version, with the string used as the selector: { selector: string, replacer: '' }
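
The shorthand expansion described above could be sketched roughly as follows. This is only an illustration, not the project's actual implementation: the helper names normalizeRegexCleanup and applyRegexCleanup are hypothetical, and it assumes the selector is treated as regular expression source and that every match is replaced (global flag).

```typescript
// Types copied from the issue description.
type RegexRemove = string
type RegexReplace = {
  selector: string
  replacer: string
}
type RegexCleanup = RegexRemove | RegexReplace

// Hypothetical helper: expand the string shorthand into the long version.
function normalizeRegexCleanup(cleanup: RegexCleanup): RegexReplace {
  return typeof cleanup === 'string'
    ? { selector: cleanup, replacer: '' }
    : cleanup
}

// Hypothetical helper: apply an optional cleanup to a value.
// Assumption: selector is regex source and all matches are replaced.
function applyRegexCleanup(value: string, cleanup?: RegexCleanup): string {
  if (cleanup === undefined) return value
  const { selector, replacer } = normalizeRegexCleanup(cleanup)
  return value.replace(new RegExp(selector, 'g'), replacer)
}

// Mirrors the config example: strip a leading "var json=" from a response.
const cleaned = applyRegexCleanup('var json={"a":1}', 'var json=')
console.log(cleaned)
```

Normalizing first means the rest of the code only ever has to handle the long { selector, replacer } shape, whichever form the config author wrote.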