codeigniter4 / tasks

Task Scheduler for CodeIgniter 4
https://tasks.codeigniter.com/
MIT License

Running multiple tasks asynchronously #38

Open najdanovicivan opened 2 years ago

najdanovicivan commented 2 years ago

I took a look at what has been going on here, and I wonder whether one thing is possible with this library.

I work on a CI4 project that relies heavily on cron to fetch data from APIs. We fetch data from about 30 endpoints every minute, and each request takes a long time to process. To keep up, I need to spawn many processes that work at the same time. So I have a single command that launches multiple instances of another command using something like this:

/**
 * Spark Exec
 *
 * @param string   $sparkCommand Command
 * @param int|null $timeout      Timeout
 */
function spark_exec(string $sparkCommand, ?int $timeout = null): void
{
    $command = '';

    if ($timeout) {
        $command .= 'timeout ' . $timeout . ' ';
    }

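    // Build: [timeout N] php -c <php.ini> <ROOTPATH>spark <command>, discard stdout, run in background.
    // The spark path is quoted so a ROOTPATH containing spaces does not split the command.
    $command .= '"' . PHP_BINARY . '" -c "' . (get_cfg_var('cfg_file_path') ?: PHP_CONFIG_FILE_PATH) . '" "' . ROOTPATH . 'spark" ' . $sparkCommand . ' > /dev/null &';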

    exec($command);
}

The problem is that I cannot have more than one process working with the same endpoint and writing the same data to the DB, so I put a locking mechanism in place that uses files to track the locks:

    /**
     * Creates the Lockfile with the current class name
     *
     * @param string|null $identifier String to append to the filename
     *
     * @return bool|resource Locked file if successful otherwise false
     */
    public static function lock(?string $identifier = null)
    {
        //Add leading dash to identifier if it is set
        if (isset($identifier)) {
            $identifier = '-' . $identifier;
        }

        //Get the locks file directory
        $lockDir = WRITEPATH . 'cron/locks/';

        //Create the locks file directory if it does not exist
        if (! file_exists($lockDir)) {
            mkdir($lockDir, 0755, true);
        }

        //Get the filename
        $filename = $lockDir . $identifier . '.lock';

        //Open the file for writing to lock it
        $lock = fopen($filename, 'wb');

        if ($lock === false) {
            // Unable to open file, check that you have permissions to open/write
            CLI::error('Unable to write the lock file');

            exit;
        }

        if (flock($lock, LOCK_EX | LOCK_NB) === false) {
            // Lock is already in use by another process
            CLI::error('Another instance is already running');

            exit;
        }

        //Return the locked file
        return $lock;
    }

    /**
     * Closes the file removing the lock
     *
     * @param resource $lock Lock
     */
    public static function unlock(&$lock): void
    {
        //Check if lock exist
        if ($lock && get_resource_type($lock) !== 'Unknown') {
            //Get the filename
            $metaData = stream_get_meta_data($lock);
            $filename = $metaData['uri'];

            //Close the file to remove the lock
            fclose($lock);

            //Remove the lock file
            unlink($filename);
        }
    }

    /**
     * Check if lock files with set prefix exist
     *
     * @param string $prefix filename prefix
     *
     * @return bool True if there are lock files, otherwise false
     */
    public static function isWorking(string $prefix): bool
    {
        $result = false;

        //We read the locks directory; if there are lock files it means we're still working
        if ($openDir = opendir(WRITEPATH . 'cron/locks/')) {
            while (($file = readdir($openDir)) !== false) {
                if (substr($file, 0, strlen($prefix)) === $prefix) {
                    $result = true;
                    break;
                }
            }

            //Release the directory handle
            closedir($openDir);
        }

        return $result;
    }
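
The locking pattern above can also be wrapped around a job so that the lock is always released, even when the job throws. The following is a minimal, self-contained sketch of that idea, not the code used in the project: `run_exclusive`, the temp-directory path and the `endpoint-a` identifier are hypothetical names chosen for illustration. Unlike the snippet above, it leaves the lock file in place after releasing it, which is an assumption made here to avoid deleting a file another process may have just opened.

```php
<?php

/**
 * Run $job only if no other holder has the lock for $identifier.
 * Hypothetical helper for illustration; not part of CodeIgniter.
 *
 * @return bool true if the job ran, false if another instance holds the lock
 */
function run_exclusive(string $lockDir, string $identifier, callable $job): bool
{
    if (! is_dir($lockDir) && ! mkdir($lockDir, 0755, true) && ! is_dir($lockDir)) {
        throw new RuntimeException('Unable to create lock directory: ' . $lockDir);
    }

    // Mode 'c' creates the file if missing but never truncates an existing one
    $handle = fopen($lockDir . '/' . $identifier . '.lock', 'cb');

    if ($handle === false) {
        throw new RuntimeException('Unable to open lock file for: ' . $identifier);
    }

    // LOCK_NB makes flock() return immediately instead of waiting for the lock
    if (! flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);

        return false;
    }

    try {
        $job();
    } finally {
        // Runs even if the job throws, so this handle never keeps the lock
        flock($handle, LOCK_UN);
        fclose($handle);
    }

    return true;
}

$ran = run_exclusive(sys_get_temp_dir() . '/cron-locks-demo', 'endpoint-a', static function (): void {
    echo 'fetching endpoint a', PHP_EOL;
});

echo $ran ? 'done' : 'another instance is already running', PHP_EOL;
```

Returning false instead of calling exit leaves the decision to the caller, so the same helper could be used from a command or from a scheduler.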

I wonder if there are plans to support something similar in the scheduler here. From looking at the code, I believe all the scheduled actions run on a single thread. For example, say I have two tasks scheduled: the first runs every 5 minutes and the second runs every minute. If the first task takes more than 1 minute to complete, then on the next cron run the second task will be started, but once the first process finishes task 1 it will continue with the second task as well, so the order of execution is completely broken.

And there are a lot of real-world situations where long-running tasks are needed. One example is generating a sitemap from the database for a huge site.

lonnieezell commented 2 years ago

That wasn't in the immediate timeline, but we might be open to that. At the moment, though, you can get around it by specifying the time each task should run and ensuring none of them run in the same minute (so not one every 5 minutes and one on 10, since they would overlap).

If set up as recommended, the OS's cron system runs the script every minute, which means a new process is used each time. If there's nothing to run, it's immediately freed. If a task takes longer than a minute, a new process is started anyway, which should handle most of what you need. Obviously the locks are not taken care of in this case, though, and would still have to be managed manually.

Rather than managing separate threads/locks, though, it is likely simpler to add a feature to ensure one task finishes before another starts, and it could delay execution until the previous one was executed. Or chaining tasks?

// pre-requisite
$schedule->command('foo')->every('Monday')->named('foo-task');
$schedule->command('bar')->every('Monday')->after('foo-task');

// chaining idea
$schedule->command('foo')->every('Monday')->then()->command('bar');

Would one of those situations satisfy what you need?

najdanovicivan commented 2 years ago

No. It's the complete opposite of what I need. I have to run 30 tasks which can last from a couple of seconds to a couple of minutes.

All of those currently use the same command but with different parameters. Multiple instances of the same command with different parameters can run at the same time, but only one instance can run with a given set of parameters, so it would have to be defined somewhat like this:

$schedule->command('foo --param1 a')->every('Minute')->singleton()->async();
$schedule->command('foo --param1 b')->every('Minute')->singleton()->async();
$schedule->command('foo --param1 c')->every('Minute')->singleton()->async();

Each of these should be spawned as a separate process.

So cron runs every minute, and on the first run all commands are started and run simultaneously.

For example:

first run:
foo --param1 a -> will complete in 25 seconds
foo --param1 b -> will complete in 90 seconds
foo --param1 c -> will complete in 40 seconds

second run:
foo --param1 a -> will complete in 20 seconds
foo --param1 b -> not executed as it's already running
foo --param1 c -> will complete in 300 seconds

third run:
foo --param1 a -> will complete in 20 seconds
foo --param1 b -> will complete in 40 seconds
foo --param1 c -> not executed as it's already running
....
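
The skip-while-running behaviour in these three runs can be checked with a small simulation. This is only a sketch of the intended semantics, assuming a 60-second cron interval; `simulate_ticks` is a hypothetical function, and the durations are the ones from the example (a duration of 0 is a placeholder for a run that is expected to be skipped).

```php
<?php

/**
 * Simulate a periodic cron tick where a task is skipped while a previous
 * instance of the same task is still running.
 *
 * @param array<int, array<string, int>> $durations seconds each task would take, per tick
 *
 * @return array<int, array<string, string>> 'started' or 'skipped' per task, per tick
 */
function simulate_ticks(array $durations, int $interval = 60): array
{
    $busyUntil = []; // task name => second at which its current run ends
    $log       = [];

    foreach ($durations as $tick => $tasks) {
        $now = $tick * $interval;

        foreach ($tasks as $name => $seconds) {
            if (($busyUntil[$name] ?? 0) > $now) {
                // The previous instance has not finished yet
                $log[$tick][$name] = 'skipped';

                continue;
            }

            $busyUntil[$name]  = $now + $seconds;
            $log[$tick][$name] = 'started';
        }
    }

    return $log;
}

$log = simulate_ticks([
    ['a' => 25, 'b' => 90, 'c' => 40],
    ['a' => 20, 'b' => 0, 'c' => 300],
    ['a' => 20, 'b' => 40, 'c' => 0],
]);

foreach ($log as $tick => $tasks) {
    foreach ($tasks as $name => $status) {
        echo 'run ', $tick + 1, ': foo --param1 ', $name, ' -> ', $status, PHP_EOL;
    }
}
```

With these inputs, b is skipped on the second run and c on the third, matching the listing above.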

najdanovicivan commented 2 years ago

Locks are far safer. Instead of locks, it could grep the output of ps aux to check for a running process, but that's not a platform-agnostic solution and would need a lot of code to handle every platform.

Even complete locking is not that much of an issue, as it can be done in the command itself, which does an early return if a process is already running.

What I need this to support is async, fire-and-forget execution which just spawns another process in the background.

Locks are nice to have, but if they are done in the scheduler they'll do nothing to prevent the command from being run manually, so it might be a better option to add native lock support at the command level instead.

najdanovicivan commented 2 years ago

Async should also be added to runShell.

The async is achieved by appending > /dev/null & to the command passed to exec, which discards the output and runs the process in the background.
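
As a rough illustration of that redirect-and-background technique, here is a sketch that only builds the command string. `background_command` is a hypothetical helper, not part of the library. It assumes a POSIX shell and, when a timeout is given, the coreutils `timeout` utility. It also redirects stderr, which the snippet in the first comment does not do.

```php
<?php

/**
 * Build a shell command line that runs detached on a POSIX shell.
 * Hypothetical helper for illustration only.
 *
 * @param string[] $args arguments passed to the executable, escaped one by one
 */
function background_command(string $executable, array $args = [], ?int $timeout = null): string
{
    $parts = [];

    if ($timeout !== null && $timeout > 0) {
        // Assumes the coreutils `timeout` utility is installed
        $parts[] = 'timeout ' . $timeout;
    }

    $parts[] = escapeshellarg($executable);

    foreach ($args as $arg) {
        $parts[] = escapeshellarg($arg);
    }

    // Discard stdout and stderr so exec() has no output to wait for;
    // the trailing & tells the shell to run the command in the background
    return implode(' ', $parts) . ' > /dev/null 2>&1 &';
}

$command = background_command(PHP_BINARY, ['spark', 'foo', '--param1', 'a'], 120);

echo $command, PHP_EOL;

// To actually spawn the process: exec($command);
```

Escaping each argument separately avoids the quoting problems of concatenating one long command string, at the cost of passing the arguments as an array.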

MGatner commented 2 years ago

@najdanovicivan You bring up some crucial features, but in my opinion this is the domain of Queues rather than Tasks. A Queue would handle the distribution and execution of commands whereas Tasks handles the scheduling and definition of code.

It was always our intention to release this library alongside the framework's (upcoming) Queue class; your case would be a good guinea pig for trying those things together.

webkp commented 2 years ago

I have exactly the same need as @najdanovicivan and came across this library while searching for a solution.

"It was always our intention to release this library alongside the framework's (upcoming) Queue class"

@MGatner: I wanted to ask if there is any news regarding the release of Queue class in CI4.x? For CI2 I solved it with Gearman but for a new test project on CI4 I would like to solve it a bit more elegantly and a solution integrated into the framework/CI4 would come just at the right time.

MGatner commented 2 years ago

Queue is mostly all done but we have some integration decisions to make (there are a couple slightly-varying options) and then need to handle the actual merge and subsequent testing. The CI4 team is all pretty busy right now so it's slow going - anyone with queue and CodeIgniter experience would be most welcome to come help!

webkp commented 2 years ago

Thank you. Can you tell me where (repo/branch) I would find the code to see if I could be of any help?

MGatner commented 2 years ago

Sure! The original code is on the main repo in the feature/queue branch: https://github.com/codeigniter4/CodeIgniter4

If you look at the associated Issue in GitHub there is a lot of discussion, including a link to Cole's adaptation.