Added the PlayWrightFetcher class, which allows browser-based requests with vanilla Playwright, Playwright with a stealth mode made by me, real browsers through CDP, and NSTBrowser's docker browserless!
Added the completely new find_all/find methods to find elements easily on the page with dark magic!
Added the methods filter and search to the Adaptors class for easier bulk operations on Adaptor object groups.
Added the css_first and xpath_first methods for easier usage.
Added the new class type TextHandlers, which is used for bulk operations on TextHandler objects, the same way the Adaptors class is used for Adaptor objects.
Added generate_full_css_selector and generate_full_xpath_selector methods.
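To make the bulk helpers above a bit more concrete, here is a minimal sketch that does not use Scrapling at all. `Element` and `Elements` are hypothetical stand-ins for `Adaptor` and `Adaptors`, and the semantics are an assumption: `filter` keeps every element a predicate accepts, `search` returns the first such element, and `first` mirrors the idea behind the `*_first` helpers (first result or nothing).

```python
from typing import Callable, Optional


class Element:
    """Hypothetical stand-in for one parsed element (not Scrapling's Adaptor)."""

    def __init__(self, tag: str, text: str) -> None:
        self.tag = tag
        self.text = text

    def __repr__(self) -> str:
        return f"Element({self.tag!r}, {self.text!r})"


class Elements(list):
    """Hypothetical stand-in for a group of elements (not Scrapling's Adaptors)."""

    def filter(self, func: Callable[[Element], bool]) -> "Elements":
        # Assumed semantics: keep every element for which func returns True.
        return Elements(el for el in self if func(el))

    def search(self, func: Callable[[Element], bool]) -> Optional[Element]:
        # Assumed semantics: return the first element for which func returns True.
        for el in self:
            if func(el):
                return el
        return None

    def first(self) -> Optional[Element]:
        # Same idea as a *_first helper: the first result, or None when empty.
        return self[0] if self else None


group = Elements([
    Element("a", "Home"),
    Element("a", "Docs"),
    Element("p", "Docs intro"),
])
links = group.filter(lambda el: el.tag == "a")      # still an Elements group
docs = group.search(lambda el: "Docs" in el.text)   # a single Element
missing = group.search(lambda el: el.tag == "div")  # nothing matched
```

Returning the same group type from `filter` is what keeps calls chainable in this sketch; whether the real classes behave exactly like this should be checked against the library itself.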
Bugs Squashed
The Adaptors class version of re_first now returns the first match found across all the Adaptor objects inside it, instead of the faulty logic of returning the re_first result of every Adaptor object.
Now if the user selects text-type content to be returned from the selected elements (like the CSS ::text function) with any method like .css or .xpath, the Adaptor object will return a TextHandlers object instead of a list of strings like before. So now you can do page.css('something::text').re_first(r'regex_pattern').json() instead of page.css('something::text')[0].re_first(r'regex_pattern').json()
The re/re_first arguments of Adaptor/Adaptors are now consistent with the TextHandler ones, so you now have the clean_match and case_sensitive arguments.
The auto_match argument is now enabled by default when initializing Adaptor, but you still have to enable it while selecting elements if you want to use it. (Not a bug but a design decision)
A lot of type-annotation corrections here and there for a better auto-completion experience while you are coding with Scrapling.
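The re_first fix and the TextHandlers change are easier to see side by side. The sketch below is only an illustration under stated assumptions, not Scrapling's implementation: `Text` and `Texts` are hypothetical stand-ins for `TextHandler` and `TextHandlers`, `re_first_per_item` is a made-up name for the old behavior, and the `case_sensitive` default is chosen for the sketch and may differ from the library's.

```python
import re
from typing import List, Optional


class Text(str):
    """Hypothetical stand-in for one text result (not Scrapling's TextHandler)."""

    def re_first(self, pattern: str, case_sensitive: bool = True) -> Optional[str]:
        # Default for case_sensitive is an assumption made for this sketch.
        flags = 0 if case_sensitive else re.IGNORECASE
        match = re.search(pattern, self, flags)
        return match.group(0) if match else None


class Texts(list):
    """Hypothetical stand-in for a group of text results (not Scrapling's TextHandlers)."""

    def re_first(self, pattern: str, case_sensitive: bool = True) -> Optional[str]:
        # Behavior described in the notes: the first match across all items.
        for text in self:
            result = text.re_first(pattern, case_sensitive)
            if result is not None:
                return result
        return None

    def re_first_per_item(self, pattern: str) -> List[Optional[str]]:
        # The old, faulty shape: one re_first result per item.
        return [text.re_first(pattern) for text in self]


texts = Texts([Text("no digits here"), Text("price: 42 USD"), Text("price: 7 USD")])
first = texts.re_first(r"\d+")               # "42"
per_item = texts.re_first_per_item(r"\d+")   # [None, "42", "7"]
```

Because the group itself exposes `re_first`, there is no need to index with `[0]` first, which is the convenience the `page.css('something::text').re_first(...)` example above is pointing at.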
Quality of life changes
Renamed the css_selector and xpath_selector methods to generate_css_selector and generate_xpath_selector, respectively, for clarity and so they don't interrupt auto-completion while coding.
Restructured most of the old code into a core subpackage, along with other design decisions, for cleaner and easier maintenance in the future.
Restructured the tests folder into a cleaner structure and added tests for the new features. Also now tox environments are cached on GitHub for faster automated tests with each commit.
What's changed
New features
Added the Fetchers feature with 3 new main types to make Scrapling fetch pages for you with a LOT of options!
Fetcher class for basic HTTP requests.
StealthyFetcher class is a completely stealthy fetcher that uses a stealthy modified version of Firefox.
PlayWrightFetcher class for browser-based requests.