Closed sunu closed 4 years ago
... to make sure metadata gets passed to the next stage along with the fetched content.
Here's a crawler config to test run the example in docs (https://memorious.readthedocs.io/en/latest/buildingcrawler.html#parse)
name: glitch_parse description: Parse metadata test pipeline: init: method: seed params: urls: - https://uncovered-calico-random.glitch.me/ handle: pass: fetch fetch: method: fetch handle: pass: parse parse: method: parse params: store: mime_group: documents include_paths: - './/article' meta: creator: './/article/p[@class="author"]' title: './/h1' meta_date: published_at: './/article/time' updated_at: './/article//span[@id="updated"]' handle: fetch: fetch store: store store: method: inspect
Side note: the test for parse doesn't really cover the case we're dealing with in this PR. Wonder if it's better to spin a local server to test against.
... to make sure metadata gets passed to the next stage along with the fetched content.
Here's a crawler config to test run the example in docs (https://memorious.readthedocs.io/en/latest/buildingcrawler.html#parse)