RafaelVidaurre / yakuza

Highly scalable Node.js scraping framework for mobsters

Conditional flow control on job / routine #54

Closed raabbajam closed 8 years ago

raabbajam commented 8 years ago

My use case runs along these lines, sequentially:

Login > get data A > get data B > get data C > logout

If get data A fails, is there any way to do conditional flow, i.e. skip get data B and get data C but always log out?

Another use case is finding data on paginated pages: if the data is found, go on to the next task; otherwise repeat the current task with incremented params.
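To make the pagination case concrete, here is a minimal sketch in plain JavaScript. It does not use Yakuza at all; `findAcrossPages`, `fetchPage`, `isMatch`, and `maxPages` are hypothetical names chosen for illustration, and the pages are stubbed with an in-memory object instead of real requests.

```javascript
// Hypothetical helper, not part of Yakuza. fetchPage(page) is assumed to
// return the items on that page, or an empty array once the pages run out.
function findAcrossPages(fetchPage, isMatch, maxPages) {
  for (let page = 1; page <= maxPages; page += 1) {
    const items = fetchPage(page);
    if (items.length === 0) {
      return null; // ran out of pages without finding a match
    }
    const match = items.find(isMatch);
    if (match !== undefined) {
      return { page, match }; // found: the caller can move on to the next step
    }
    // Not found: loop again with the incremented page number.
  }
  return null; // gave up after maxPages pages
}

// Stubbed pages standing in for real paginated responses.
const pages = { 1: ['a', 'b'], 2: ['c', 'd'], 3: ['e'] };
const result = findAcrossPages(
  (page) => pages[page] || [],
  (item) => item === 'd',
  10
);
console.log(result); // { page: 2, match: 'd' }
```

The `maxPages` bound is only there so the loop cannot run forever if the data never shows up.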

RafaelVidaurre commented 8 years ago

Yes, there is. You can achieve this with a condition inside the builder function.

So:

Imagine dataA fails. You can do the following:

.builder(function (job) {
  // This will return either true or false
  return job.shared('dataA.succeeded');
});

When builders return a falsy value, the task being started is skipped.
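As a rough illustration of that rule applied to the Login > get data A > get data B > get data C > logout sequence, here is a self-contained model in plain JavaScript. It is not Yakuza's implementation and does not call Yakuza; `runPlan`, the task objects, and the `shared` store are assumptions made up for this sketch, and `main` simply returns the values a task would share.

```javascript
// Self-contained model of the skip rule; this is NOT Yakuza's implementation.
// runPlan, the task objects and the `shared` store are hypothetical names.
function runPlan(tasks) {
  const shared = {};   // values published by tasks that already ran
  const executed = []; // names of the tasks that were actually started

  for (const task of tasks) {
    // A task without a builder always runs; a falsy builder result skips it.
    const params = task.builder ? task.builder(shared) : {};
    if (!params) {
      continue;
    }
    executed.push(task.name);
    // main() returns the values this task wants to share with later tasks.
    Object.assign(shared, task.main(params));
  }
  return executed;
}

// Builder used by the tasks that depend on data A.
const ifDataASucceeded = (shared) => shared['dataA.succeeded'];

const executed = runPlan([
  { name: 'login', main: () => ({}) },
  // Simulate a failed scrape of data A.
  { name: 'dataA', main: () => ({ 'dataA.succeeded': false }) },
  { name: 'dataB', builder: ifDataASucceeded, main: () => ({}) },
  { name: 'dataC', builder: ifDataASucceeded, main: () => ({}) },
  // No builder, so logout runs regardless of what happened before.
  { name: 'logout', main: () => ({}) },
]);

console.log(executed); // [ 'login', 'dataA', 'logout' ]
```

In this model, dataB and dataC carry the condition and logout carries none, which is what makes logout unconditional.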

Hope this helps!

raabbajam commented 8 years ago

Yes, that does it.

But what if a more complex process is needed? I think it would be a lot nicer to have a complex conditional flow.

Off the top of my head, something like a promise-style conditional structure.

// current API
Yakuza.agent('scraper', 'techinasia')
  .routine('getAccountData', [ 
    'login', 
    'getAccountData', 
    'logout' 
  ]);

// simple promise-like API
Yakuza.agent('scraper', 'techinasia')
  .routine('getAccountData', function(task){
    return task('login')
      .then('searchQuery')
      .then('processQuery')
      .then('logout');
  });

// complex conditional
Yakuza.agent('scraper', 'techinasia')
  .routine('getAccountData', function(task){
    return task('login')
      .then('searchQuery')
      .then(function (searchQueryResult) {
        if (searchQueryResult) {
          return task('processQueryStep1')
            .then('processQueryStep2')
            .then('processQueryStep3')
            .then(function (processQueryStep3Result) {
              if (processQueryStep3Result === 1) {
                return task('processQueryStep4a');
              } else if (processQueryStep3Result === 2) {
                return task('processQueryStep4b');
              }
              return task('processQueryStep4c');
            });
        }
        // return undefined
      })
      .done('logout'); // always called
  });

Instead of an array of strings, the routine is a series of functions chained with then, fail, and done that return strings.

Each processQueryStep is a separate request, not just a data-formatting step, so I think each one deserves its own execution block / task (as opposed to the current API, which can only do it all in one processQuery task).
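For comparison, the control flow in the "complex conditional" example above can be written with plain promises and try/finally, independent of Yakuza. This is only a sketch of the intended semantics, not a proposed implementation: `runRoutine`, `makeStubTasks`, and the `tasks` map are hypothetical, and each stub task just records that it was called instead of making a request. Unlike the proposal, login sits outside the try block here, so logout is skipped if login itself fails.

```javascript
// Plain JavaScript, independent of Yakuza. runRoutine and the `tasks` map are
// hypothetical; each task stands in for one request and returns a promise.
async function runRoutine(tasks) {
  await tasks.login();
  try {
    const found = await tasks.searchQuery();
    if (!found) {
      return; // nothing to process; the finally block still logs out
    }
    await tasks.processQueryStep1();
    await tasks.processQueryStep2();
    const result = await tasks.processQueryStep3();
    if (result === 1) {
      await tasks.processQueryStep4a();
    } else if (result === 2) {
      await tasks.processQueryStep4b();
    } else {
      await tasks.processQueryStep4c();
    }
  } finally {
    await tasks.logout(); // runs on success, early return, and failure
  }
}

// Stub tasks that only record the order in which they were called.
function makeStubTasks(results, calls) {
  const names = [
    'login', 'searchQuery', 'processQueryStep1', 'processQueryStep2',
    'processQueryStep3', 'processQueryStep4a', 'processQueryStep4b',
    'processQueryStep4c', 'logout',
  ];
  const tasks = {};
  for (const name of names) {
    tasks[name] = async () => {
      calls.push(name);
      return results[name]; // undefined unless a result was configured
    };
  }
  return tasks;
}

const calls = [];
runRoutine(makeStubTasks({ searchQuery: true, processQueryStep3: 2 }, calls))
  .then(() => console.log(calls.join(' > ')));
// login > searchQuery > processQueryStep1 > processQueryStep2 > processQueryStep3 > processQueryStep4b > logout
```

If searchQuery resolves to a falsy value, the recorded order is login > searchQuery > logout.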

What do you think?

RafaelVidaurre commented 8 years ago

I think more complex conditionals to handle the execution of tasks make perfect sense. I don't think a promise-based API is the ideal solution, though; we can make this simpler.

I'll create a generic feature request for this.

RafaelVidaurre commented 8 years ago

See #55.