Need to be able to support Xpath scraping. beautiful soup can be used as the main dependency.
Following inputs to be expected from the config:
Address of the page
Xpath to find the article items
Xpath to find the title of the article (relative to the article item)
Xpath to find the content of the article (relative to the article item)
Xpath to find the URL of the article (relative to the article item)
Debug mode (if true, provide detailed outputs in the logs of all items scraped and what title, content, and url were found in them, otherwise, provide only basic logs. )
Need to be able to support Xpath scraping. beautiful soup can be used as the main dependency. Following inputs to be expected from the config:
Feed_data is outputted to pass validation.py