RamanMalykhin / pyrsspipe

Simple and extendable CLI utility for building RSS feeds

Apache License 2.0

Support for XPath input module #6

Closed RamanMalykhin closed 1 month ago

RamanMalykhin commented 1 month ago

We need to support XPath scraping. Beautiful Soup can be used as the main dependency. The following inputs are expected from the config:

Address of the page
XPath to find the article items
XPath to find the title of the article (relative to the article item)
XPath to find the content of the article (relative to the article item)
XPath to find the URL of the article (relative to the article item)
Debug mode (if true, log every scraped item in detail, including the title, content, and URL found in it; otherwise, write only basic logs)
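
The six inputs above could be grouped roughly as follows. This is only a sketch: the class name `XPathInputConfig` and all field names are hypothetical and are not taken from pyrsspipe. The check in `__post_init__` is an assumption about what "relative to the article item" should mean in practice, based on the XPath rule that an expression starting with `/` is evaluated from the document root rather than from the current node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XPathInputConfig:
    """Hypothetical container for the six config inputs (names are illustrative)."""

    page_url: str        # address of the page
    items_xpath: str     # XPath selecting the article items
    title_xpath: str     # relative to an article item
    content_xpath: str   # relative to an article item
    url_xpath: str       # relative to an article item
    debug: bool = False  # detailed per-item logging when True

    def __post_init__(self) -> None:
        # An XPath starting with "/" is absolute: it ignores the article item
        # and searches from the document root, so reject it for relative fields.
        for name in ("title_xpath", "content_xpath", "url_xpath"):
            value = getattr(self, name)
            if value.startswith("/"):
                raise ValueError(
                    f"{name} must be relative to the article item, got {value!r}"
                )
```

For example, `XPathInputConfig("https://example.com", "//article", "./h2/text()", "./p", "./a/@href")` is accepted, while passing `"//h2"` as the title XPath raises `ValueError`.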

The module must output feed_data that passes validation.py.

RamanMalykhin commented 1 month ago

Nope, not Beautiful Soup: it has no XPath support. Use lxml instead.
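
A minimal sketch of how the scraping step might look with lxml, assuming the page source has already been fetched. The function names, the logger name, and the shape of the returned dicts are all hypothetical; in particular, this is not the feed_data structure that validation.py checks, which is not described in this issue.

```python
import logging

from lxml import html

logger = logging.getLogger("xpath_input")  # hypothetical logger name


def _first_text(node, xpath):
    """Return the first XPath result as stripped text, or '' if nothing matched."""
    result = node.xpath(xpath)
    if isinstance(result, str):  # e.g. a string(...) expression
        return result.strip()
    if not isinstance(result, list) or not result:
        return ""
    first = result[0]
    # Node-set results hold either strings (text(), @attr) or elements.
    if isinstance(first, str):
        return first.strip()
    return first.text_content().strip()


def scrape_items(page_source, items_xpath, title_xpath, content_xpath,
                 url_xpath, debug=False):
    """Extract title, content, and URL from every article item on the page."""
    tree = html.fromstring(page_source)
    items = []
    for index, node in enumerate(tree.xpath(items_xpath)):
        item = {
            "title": _first_text(node, title_xpath),
            "content": _first_text(node, content_xpath),
            "url": _first_text(node, url_xpath),
        }
        if debug:
            # Detailed per-item output, only when debug mode is enabled.
            logger.debug("item %d: title=%r content=%r url=%r",
                         index, item["title"], item["content"], item["url"])
        items.append(item)
    logger.info("scraped %d items", len(items))
    return items
```

The relative XPaths are evaluated with `node.xpath(...)` on each article item, so expressions such as `./h2/text()` or `./a/@href` only see that item's subtree. The caller still has to configure the logging level for the debug messages to appear.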