alephdata / memorious

Lightweight web scraping toolkit for documents and structured data.
https://docs.alephdata.org/developers/memorious
MIT License

Extract: implement shell-style wildcards #198

Closed: simonwoerpel closed this 2 years ago

simonwoerpel commented 2 years ago

A small addition to the extract stage: it can now store and emit only those extracted files whose filenames match shell-style patterns.

Example config:

extract:
  method: extract
  params:
    wildcards: "*.json"
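The example config above can be read as "keep an extracted file only if its name matches at least one pattern." Below is a minimal sketch of that check using Python's standard `fnmatch` module. The helper name `matches_wildcards`, matching against the basename rather than the full path, and accepting either a single string or a list of patterns are assumptions for illustration. This is not memorious's actual implementation.

```python
import os
from fnmatch import fnmatch
from typing import Iterable, Union

Patterns = Union[str, Iterable[str]]


def matches_wildcards(path: str, wildcards: Patterns) -> bool:
    """Return True if the file name of `path` matches any shell-style pattern.

    Hypothetical helper: memorious may match differently (e.g. on the full
    path). It only illustrates fnmatch semantics.
    """
    # Assumption: a bare string such as "*.json" means a single pattern.
    if isinstance(wildcards, str):
        wildcards = [wildcards]
    name = os.path.basename(path)
    # fnmatch supports *, ?, [seq] and [!seq]. On case-insensitive systems
    # it also normalizes case via os.path.normcase.
    return any(fnmatch(name, pattern) for pattern in wildcards)


if __name__ == "__main__":
    extracted = ["data/records.json", "data/readme.txt", "meta.json"]
    kept = [p for p in extracted if matches_wildcards(p, "*.json")]
    print(kept)  # ['data/records.json', 'meta.json']
```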

sunu commented 2 years ago

Yes, it looks good to me! One tiny change to consider: when no wildcard is defined, we can skip the fnmatch call altogether instead of matching every file against *.
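The suggested refinement is a short-circuit. Matching against `*` accepts every filename, so when no wildcard is configured the filter can pass every extracted file through without calling `fnmatch` at all. Here is a sketch under stated assumptions: `select_files` is a hypothetical helper, `None` or an empty value is taken to mean "no wildcard defined", and matching is done on the basename.

```python
import os
from fnmatch import fnmatch
from typing import Iterable, Iterator, Optional, Union


def select_files(
    paths: Iterable[str], wildcards: Optional[Union[str, Iterable[str]]] = None
) -> Iterator[str]:
    """Yield the paths that should be stored and emitted.

    Hypothetical helper illustrating the review suggestion, not memorious code.
    """
    if not wildcards:
        # No wildcard defined: skip fnmatch entirely instead of matching "*".
        yield from paths
        return
    # Assumption: a bare string is a single pattern, otherwise a list of them.
    patterns = [wildcards] if isinstance(wildcards, str) else list(wildcards)
    for path in paths:
        name = os.path.basename(path)
        if any(fnmatch(name, pattern) for pattern in patterns):
            yield path


if __name__ == "__main__":
    files = ["a.json", "b.txt"]
    print(list(select_files(files)))            # ['a.json', 'b.txt']
    print(list(select_files(files, "*.json")))  # ['a.json']
```

The unfiltered branch returns the same files that filtering with `*` would, but it avoids one pattern match per extracted file when filtering is disabled.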

simonwoerpel commented 2 years ago

@sunu Makes total sense, will refine this.