crwlrsoft / crawler

Library for Rapid (Web) Crawler and Scraper Development
https://www.crwlr.software/packages/crawler
MIT License

New DomQuery::formattedText() method #130

Closed otsch closed 5 months ago

otsch commented 6 months ago

The DomQuery class, the parent of CssSelector (Dom::cssSelector) and XPathQuery (Dom::xPath), has a new method formattedText() that uses the new crwlr/html-2-text package to convert the selected HTML to formatted plain text. You can also pass a customized instance of the Html2Text class to formattedText().
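A minimal usage sketch of the two variants described above. The namespaces (`Crwlr\Crawler\Steps\Dom`, `Crwlr\Html2Text\Html2Text`), the Composer autoload path, and the selectors `#content` and `//article` are assumptions for illustration and are not taken from this PR. Check them against the installed package versions.

```php
<?php

// Assumes both crwlr/crawler and crwlr/html-2-text are installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Crwlr\Crawler\Steps\Dom;      // assumed namespace
use Crwlr\Html2Text\Html2Text;    // assumed namespace

// Variant 1: default conversion of the matched element's HTML to formatted text.
// '#content' is a hypothetical selector.
$defaultQuery = Dom::cssSelector('#content')->formattedText();

// Variant 2: pass your own Html2Text instance, configured as needed beforehand.
// '//article' is a hypothetical XPath expression.
$converter = new Html2Text();
$customQuery = Dom::xPath('//article')->formattedText($converter);
```

Either query object can then be used wherever the library accepts a DomQuery, for example as a value in an extraction mapping of an HTML step.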

Also added many missing @throws docblock tags throughout the codebase and slightly adjusted the PHP CS Fixer config.