Added the new methods Step::keep(), Step::keepAs(), Step::keepFromInput() and Step::keepInputAs() as simpler alternatives to Step::addToResult(), Step::addLaterToResult() and Step::keepInputData(), which are all deprecated now. The new keep methods add data to a keep array in IO objects. Because no Result object is created and potentially shared among a lot of child outputs, the new keep functionality is less complex, and there is no need for something like addLaterToResult(). Kept properties can also be used with useInputKey(), which is pretty handy.
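To illustrate the idea of a keep array that travels with IO objects, here is a minimal sketch in plain PHP (8.1 or later). It does not use the library. SketchOutput and runStep() are hypothetical names invented for this illustration, and the library's real implementation may well differ.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for an IO object: a value plus a "keep" array.
final class SketchOutput
{
    /** @param array<string, mixed> $keep */
    public function __construct(
        public readonly mixed $value,
        public readonly array $keep = [],
    ) {}
}

/**
 * Run one step: $step maps the parent's value to a list of output arrays.
 * Each child inherits the parent's kept data; $keepKeys names the properties
 * of the child's own output that are kept in addition.
 *
 * @param callable(mixed): list<array<string, mixed>> $step
 * @param list<string> $keepKeys
 * @return list<SketchOutput>
 */
function runStep(SketchOutput $input, callable $step, array $keepKeys = []): array
{
    $outputs = [];

    foreach ($step($input->value) as $value) {
        // PHP arrays are copied by value, so siblings never share kept data.
        $kept = $input->keep;

        foreach ($keepKeys as $key) {
            if (array_key_exists($key, $value)) {
                $kept[$key] = $value[$key];
            }
        }

        $outputs[] = new SketchOutput($value, $kept);
    }

    return $outputs;
}

$listing = new SketchOutput('https://www.example.com/authors');

// First step: two authors found on the listing page, keep the name.
$authors = runStep(
    $listing,
    fn (string $url): array => [
        ['name' => 'Jane Doe', 'url' => $url . '/jane-doe'],
        ['name' => 'John Roe', 'url' => $url . '/john-roe'],
    ],
    ['name'],
);

// Second step: one detail output for the first author, keep the birth year.
$details = runStep(
    $authors[0],
    fn (array $author): array => [['born' => 1970]],
    ['born'],
);

// The kept data at the end of the chain plays the role of the final result.
$finalResult = $details[0]->keep;
```

In this sketch every output carries its own copy of the kept data, so nothing has to be written back later to an object shared by many children.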
Sub crawlers are another cool new feature. Any step can now create a sub crawler to fill a property. Example: you have a page about an author with multiple links to detail pages about his books. You can select those links and let a sub crawler fill the author's books property with data from the book detail pages.
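The author and books example can be sketched in plain PHP as follows. This is only a conceptual model, not the library's API: fillWithSubCrawler() and the $fakeBookPages lookup are hypothetical, and the lookup stands in for loading and parsing the detail pages.

```php
<?php
declare(strict_types=1);

/**
 * Replace the URLs stored under $property with whatever $subCrawl
 * yields for each of them.
 *
 * @param array<string, mixed> $item
 * @param callable(string): array<string, mixed> $subCrawl
 * @return array<string, mixed>
 */
function fillWithSubCrawler(array $item, string $property, callable $subCrawl): array
{
    $urls = (array) ($item[$property] ?? []);
    $item[$property] = array_map($subCrawl, array_values($urls));

    return $item;
}

// Stand-in for the book detail pages; a real sub crawler would send
// HTTP requests and extract data from the responses here.
$fakeBookPages = [
    'https://www.example.com/books/1' => ['title' => 'First Book', 'year' => 2001],
    'https://www.example.com/books/2' => ['title' => 'Second Book', 'year' => 2005],
];

// Output of the parent step: the author with links to the book pages.
$author = [
    'name' => 'Jane Doe',
    'books' => array_keys($fakeBookPages),
];

// The "sub crawler" turns each link into the data found on that page.
$author = fillWithSubCrawler(
    $author,
    'books',
    fn (string $url): array => $fakeBookPages[$url],
);
```

After the call, the books property holds one array of book data per link instead of the bare URLs, while the rest of the author output is unchanged.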
Furthermore, there is a new Step::outputType() method that returns whether a certain step yields outputs that are associative arrays (or objects), scalar values, or potentially both (mixed). This helps avoid potentially critical problems during a crawler run by validating the steps before the run and throwing an exception (or logging error messages).
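A check of this kind might look roughly like the following plain PHP sketch (8.1 or later). SketchOutputType and validateStep() are hypothetical names. The concrete rule is also an assumption made for illustration: keeping scalar output without a key is treated as an error, and mixed output only produces a message to log.

```php
<?php
declare(strict_types=1);

// Hypothetical enum mirroring the three kinds of output named above.
enum SketchOutputType
{
    case AssociativeArrayOrObject;
    case Scalar;
    case Mixed;
}

/**
 * Validate one step before the crawler runs.
 *
 * Returns null when everything is fine, a message to log when the step
 * might cause trouble, and throws when it certainly would.
 */
function validateStep(string $name, SketchOutputType $type, bool $keepsWithoutKey): ?string
{
    if (!$keepsWithoutKey) {
        return null; // Nothing to check for this step.
    }

    return match ($type) {
        SketchOutputType::Scalar => throw new InvalidArgumentException(
            "Step {$name} yields scalar values, so kept data needs a key."
        ),
        SketchOutputType::Mixed =>
            "Step {$name} may yield scalar values, so kept data may be incomplete.",
        SketchOutputType::AssociativeArrayOrObject => null,
    };
}

$ok = validateStep('extractData', SketchOutputType::AssociativeArrayOrObject, true);
$warning = validateStep('extractLinks', SketchOutputType::Mixed, true);

try {
    validateStep('getTitle', SketchOutputType::Scalar, true);
    $threw = false;
} catch (InvalidArgumentException $exception) {
    $threw = true;
}
```

The point of the sketch is the timing: the check only needs the declared output type, so it can run before the first request is sent.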