Fix dataset.pushData() validation, which would not allow values other than plain objects.
Fix PuppeteerLaunchContext.stealth throwing when used in PuppeteerCrawler.
v1.0.0
After 3.5 years of rapid development, and a lot of breaking changes and deprecations, here comes the result: Apify SDK v1. This release had two goals: stability, and support for more browsers, namely Firefox and WebKit (Safari).
The SDK has grown quite popular over the years, powering thousands of web scraping and automation projects. We think our developers deserve a stable environment to work in, and by releasing SDK v1 we commit to making breaking changes only once a year, with a new major release.
We added support for more browsers by replacing PuppeteerPool with browser-pool, a new library that we created specifically for this purpose. It builds on the ideas behind PuppeteerPool and extends them to support Playwright. Playwright is a browser automation library similar to Puppeteer. It works with all well-known browsers and uses almost the same interface as Puppeteer, while adding useful features and simplifying common tasks. Don't worry, you can still use Puppeteer with the new BrowserPool.
A large breaking change is that neither puppeteer nor playwright is bundled with SDK v1. To make the choice of library easier and installs faster, users have to install the selected module and version themselves. This also allows us to add support for even more libraries in the future.
Thanks to the addition of Playwright, we now have a PlaywrightCrawler. It is very similar to PuppeteerCrawler, so you can pick whichever you prefer. This also meant making some interface changes: the launchPuppeteerFunction option of PuppeteerCrawler is gone, launchPuppeteerOptions was replaced by launchContext, and we moved things around in the handlePageFunction arguments. See the migration guide for a more detailed explanation and migration examples.
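As a rough sketch of the option change (shapes inferred from these notes; the real SDK v1 interface may differ in details):

```javascript
// Before (SDK 0.22.x): Puppeteer-specific options were passed directly.
const oldCrawlerOptions = {
    launchPuppeteerOptions: {
        headless: true,
        stealth: true,
    },
};

// After (SDK 1.x): launchContext wraps the library's own launchOptions,
// and you bring the launcher module yourself (it is no longer bundled).
const newCrawlerOptions = {
    launchContext: {
        // launcher: require('puppeteer'), // install it yourself since v1
        launchOptions: {
            headless: true,
        },
        stealth: true,
    },
};

console.log(Object.keys(newCrawlerOptions)); // [ 'launchContext' ]
```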
What's in store for SDK v2? We want to split the SDK into smaller libraries, so that everyone can install only the things they need. We plan a TypeScript migration to make crawler development faster and safer. Finally, we will take a good look at the interface of the whole SDK and update it to improve the developer experience. Bug fixes and scraping features will of course keep landing in versions 1.X as well.
Full list of changes:
BREAKING: Removed puppeteer from dependencies. If you want to use Puppeteer, you must install it yourself.
BREAKING: Removed PuppeteerPool. Use browser-pool.
BREAKING: Removed PuppeteerCrawlerOptions.launchPuppeteerOptions. Use launchContext.
BREAKING: Removed PuppeteerCrawlerOptions.launchPuppeteerFunction. Use PuppeteerCrawlerOptions.preLaunchHooks and postLaunchHooks.
BREAKING: Removed args.autoscaledPool and args.puppeteerPool from handle(Page/Request)Function arguments. Use args.crawler.autoscaledPool and args.crawler.browserPool.
BREAKING: The useSessionPool and persistCookiesPerSession options of crawlers are now true by default. Explicitly set them to false to override the behavior.
BREAKING: Apify.launchPuppeteer() no longer accepts LaunchPuppeteerOptions. It now accepts PuppeteerLaunchContext.
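A hedged sketch pulling several of the changes above together (the hook signatures and property names below are our reading of the new API, not verbatim from its docs):

```javascript
const crawlerOptions = {
    // launchPuppeteerFunction is gone; launch hooks take its place.
    preLaunchHooks: [
        async (pageId, launchContext) => {
            // Runs before every browser launch; tweak the context here.
            launchContext.launchOptions = { ...launchContext.launchOptions, headless: true };
        },
    ],
    postLaunchHooks: [
        async (pageId, browserController) => {
            // Runs right after a browser launches.
        },
    ],
    // The pools moved off the handler arguments onto the crawler instance.
    handlePageFunction: async ({ request, page, crawler }) => {
        const pool = crawler.autoscaledPool;   // was args.autoscaledPool
        const browsers = crawler.browserPool;  // was args.puppeteerPool
    },
    // The new defaults are true; opt out explicitly if you need to.
    useSessionPool: false,
    persistCookiesPerSession: false,
};
```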
New deprecations:
DEPRECATED: PuppeteerCrawlerOptions.gotoFunction. Use PuppeteerCrawlerOptions.preNavigationHooks and postNavigationHooks.
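For illustration, a gotoFunction replacement might look roughly like this (argument names are assumptions; the migration guide has the authoritative shapes):

```javascript
const navOptions = {
    preNavigationHooks: [
        async (crawlingContext, gotoOptions) => {
            // Adjust navigation before page.goto() is called.
            gotoOptions.timeout = 60_000;
            gotoOptions.waitUntil = 'domcontentloaded';
        },
    ],
    postNavigationHooks: [
        async (crawlingContext) => {
            // Runs after navigation, before handlePageFunction.
        },
    ],
};
```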
Removals of earlier deprecated functions:
BREAKING: Removed Apify.utils.puppeteer.enqueueLinks(). Deprecated in 01/2019. Use Apify.utils.enqueueLinks().
BREAKING: Removed autoscaledPool.(set|get)MaxConcurrency(). Deprecated in 2019. Use autoscaledPool.maxConcurrency.
BREAKING: Removed CheerioCrawlerOptions.requestOptions. Deprecated in 03/2020. Use CheerioCrawlerOptions.prepareRequestFunction.
New features:
Added Apify.PlaywrightCrawler, which is almost identical to PuppeteerCrawler but crawls with the playwright library.
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `@dependabot badge me` will comment on this PR with code to add a "Dependabot enabled" badge to your readme
Additionally, you can set the following in the `.dependabot/config.yml` file in this repo:
- Update frequency
- Out-of-range updates (receive only lockfile updates, if desired)
- Security updates (receive only security updates, if desired)
Bumps apify from 0.22.4 to 1.0.1.
Commits
c08cbbb Fix missing type definition in PuppeteerCrawler
4f98af2 Fix stealth in PuppeteerCrawler and the tests (#921)
7aee31d Make object validation less strict in pushData (#917)
08792da Increase test timeout
d7db549 Remove deleted docs from v1.0.0
0540808 Update front page example [skip ci]
ceac9b7 Update og image, title and docs index [skip ci]
d022dec Update docusaurus, changelog and version
e8c70cf Build docs v1.0.0
d6798df Do not log stack trace of internal error