Queued job: Connection won't close

KarimGeiger commented 7 years ago

Laravel Version: 5.4.16
Adldap2-Laravel Version: v3.0.4
PHP Version: 7.1

Description:

When using this extension on a queued job in conjunction with the Adldap facade the connection will not be closed after the job finishes. This may have to do with the fact that we are connecting to two different AD servers.

We are currently experiencing the problem that if the AD server is shortly unavailable between two jobs and a connection was established before, ADLDAP will fail with the error message "Can't contact LDAP server".

This can be fixed by restarting the queued workers or by writing a destructor for the specific job in which the AD connection will be closed.

Example:

1:00 - Job1 runs, and connects to AD1 and AD2.
2:00 - AD2 becomes unavailable.
2:30 - AD2 is available again.
4:00 - Job2 runs, and connects to AD2. Job2 will now throw a "Can't contact LDAP server" BindException while trying to bind to the AD server. The job therefore fails.
4:30 - The worker gets restarted.
5:00 - Job2 is rescheduled and now works as expected.

Steps To Reproduce:

As I'm not able to force a downtime of our AD servers, I can't guarantee this to reproduce it, but it should:

Build a job that does something like

Adldap::connect('foo')->search()->where('sAMAccountName', '=', 'bar')->firstOrFail();

Run that job once using a worker.
Do not stop the worker, but wait until the job succeeds.
Disconnect and then reconnect the AD server (force the connection to break).
Run the job on the same worker instance once again. It will fail now.

stevebauman commented 7 years ago

Hi @KarimGeiger,

Unfortunately I've never tested stored queued jobs using Adldap2.

I'm surprised that the __destruct() isn't called on each connection after the job has finished:

https://github.com/Adldap2/Adldap2/blob/master/src/Connections/Provider.php#L55-L66

Are you able to suggest a fix?

KarimGeiger commented 7 years ago

Hi @stevebauman,

thanks for the response!

Yes, I saw that method and was surprised, too. It seems like Laravel doesn't destroy the Facade instances after completing a job if the worker continues to run. Sadly I'm not sure how to force this the "right way", since I'm not quite sure if there is a way for you to detect if the job is done at all. Maybe there is an event or something.

The other thing I noticed today was that it seems like Adldap2 doesn't recover if a connection gets destroyed. I tried unsetting all connection by calling this exact destruct method manually after finishing the job. The problem then was that, in the next job, it tried to use that destructed object as a resource for the connection:

ldap_error(): supplied resource is not a valid ldap link resource in vendor/adldap2/adldap2/src/Connections/Ldap.php:160

I could not investigate further, but I think we have two things to deal with:

check if there is a way to call the destructor after a job is done.
make sure Adldap won't pass an invalid resource once destructed and rather creates a new one.

As soon as I'm back in the office on Monday, I'll have a closer look.

stevebauman commented 7 years ago

Thanks for the detailed info!

I'll see if I can test this on my end and reproduce.

stevebauman commented 7 years ago

Also, try using the --daemon flag for the queue and see if you still receive this issue:

https://laravel.com/docs/5.1/queues#daemon-queue-listener

KarimGeiger commented 7 years ago

Yes, in Laravel 5.4 the daemon mode is default for the queue:work command. I think this causes the problem:

Daemon queue workers do not "reboot" the framework before processing each job. Therefore, you should free any heavy resources after each job completes. For example, if you are doing image manipulation with the GD library, you should free the memory with imagedestroy when you are done. - https://laravel.com/docs/5.4/queues#running-the-queue-worker

Of course I could just reboot the framework after each job, but this would slow down the queue and doesn't really fix the problem.

stevebauman commented 7 years ago

This is definitely hard to debug, I'm having trouble determining what is actually going on behind the scenes.

I'm able to reproduce the issue using your steps above, but even if I re-instantiate the connection manually every time a job is executed, it will still fail to reconnect after losing connectivity.

This seems like it may be a combination of the Laravel queue worker and PHP itself (but definitely not leaving Adldap2 out of the question at all).

Going to run raw LDAP methods inside a job and see if I encounter the same issue.

KarimGeiger commented 7 years ago

Thank you for looking into it. Glad you can reproduce it. I'm pretty sure we'll find a solution to the problem eventually.

KarimGeiger commented 7 years ago

So, according to the documentation, one can register events firing after a queued job is executed like this:

Queue::after(function (JobProcessed $event) {
    // ...
});

Maybe it's possible to completely destruct the facade (provider) there, so Adldap has to create a new connection upon next use.

AlexH-HankIT commented 6 years ago

@KarimGeiger I'm facing the same issues as you. Did you find a solution? Thanks.

KarimGeiger commented 6 years ago

I now use the destructor of the job class to close the connection using the close() method. Before first use, I manually connect to the server using Adldap::connect() to make sure I won't get a connection returned that has already been closed.

ndum commented 5 years ago

Hi, i have the same issue... (but with Laravel Horizon)

Unfortunately I can't find a method in the Adldap2 interface to close the connection. Is there really no easy way here to close the connection?

Thanks

gdesoto-lhs commented 5 years ago

I'm going to throw a "me too" on this issue.

Environment: Windows Server & IIS PHP 7.2.7 Adldap2/Adldap2-Laravel 6.0.8

I'm wondering if it has to do with the service provider being registered as a singleton? The queue worker process's connection to AD times out or disconnects and the next time the job runs Adldap thinks it's already connected and never attempts to reconnect. I couldn't figure out how to force a reconnect via Adldap facade. Apparently, there's an App::forgetInstance() method which I'm now attempting to call prior to accessing the Adldap facade from any jobs I need to pass to the queue.

use Adldap\Laravel\Facades\Adldap;
...
app()->forgetInstance('Adldap\AdldapInterface');
$ad = Adldap::class;

I'll report back with any results later.

marcotulioaguilar commented 5 years ago

Facing the same issue here.

Environment: CentOS7 PHP 7.3.9 Adldap2/Adldap2-Laravel 6.0.8

Jobs using Adldap only works after php artisan queue:restart

Anyone was able to solve this problem?

stevebauman commented 5 years ago

@ndum

Unfortunately I can't find a method in the Adldap2 interface to close the connection. Is there really no easy way here to close the connection?

Yes there is, on a provider instance simply call $provider->getConnection()->close().

@artisanaigg

Jobs using Adldap only works after php artisan queue:restart

I know for sure that for every queued job that requires a connection to your LDAP server, you will need to manually connect at the start of every job.

For instance, you cannot have one job that initializes the connection, then another that utilizes it. This is because if the initial job fails, the second will not have any connectivity. Can you post your job code with any sensitive details omitted?

I believe using php artisan queue:listen may be the resolution here - even though this may not be ideal, it will likely be the only way since the queue:work worker has an application state in memory. I will give this a shot tonight.

gdesoto-lhs commented 5 years ago

I'm still coming across the issue, but did just come across a realization. It hit me that I have a couple other jobs that use the Adldap facade without issue. They run on a similar schedule and do not lose the connection.

Looking into it, the only difference I found in the jobs that failed after a second run is that they make use of the paginate() method. The jobs that do not lose connection use either first() or find() to return a single record at a time. Any chance others are also seeing the same as me? I'll look to see if I can find anything that would cause paginating to kill the connection.

Another light-bulb thought that I just had is that I remember experiencing in previous versions that the jobs that use paginate would fetch updated data the first time, but then the next time they ran they didn't, and I would need to manually restart the queue worker to get it to load updated data again. I'm wondering if this is the same symptom manifesting itself differently.

marcotulioaguilar commented 5 years ago

I know for sure that for every queued job that requires a connection to your LDAP server, you will need to manually connect at the start of every job.

I tried today with $provider->connect(); and the result was the same. After manually restarts the queue worker to load the new code, it works like before, without connect(), then some time later starts to failing with "can't contact ldap server".

I believe using php artisan queue:listen may be the resolution here - even though this may not be ideal, it will likely be the only way since the queue:work worker has an application state in memory. I will give this a shot tonight.

I'll try with queue:listen tomorrow but I'm not sure how to make it works with Laravel Horizon and supervisor.

Job


        $provider = Adldap::getProvider('default');
        $provider->connect();
        $GroupHasPermissions = GroupHasPermission::all();

        foreach ($GroupHasPermissions as $GroupHasPermission)
        {
            $group = $provider->search()->groups()->findBy('cn',$GroupHasPermission->id_group);
            foreach ($group->getMembers() as $member) {
                $usernamead = $member->getUserPrincipalName();
                $user = User::where('username', $usernamead)->first();
                if($user){
                    if(!$user->hasPermissionTo($GroupHasPermission->id_permission))
                    {
                        $user->givePermissionTo($GroupHasPermission->id_permission);
                    }
                }
            }
        }

LDAP CONF

'connections' => [

        'default' => [

            'auto_connect' => env('LDAP_AUTO_CONNECT', true),

            'connection' => Adldap\Connections\Ldap::class,

            'settings' => [

                'schema' => Adldap\Schemas\ActiveDirectory::class,

                'account_prefix' => env('LDAP_ACCOUNT_PREFIX', ''),

                'account_suffix' => env('LDAP_ACCOUNT_SUFFIX', '@xxxxx.local'),

                'hosts' => explode(' ', env('LDAP_HOSTS', 'xxxx.local')),

                'port' => env('LDAP_PORT', 389),

                'timeout' => env('LDAP_TIMEOUT', 10),

                'base_dn' => env('LDAP_BASE_DN', 'dc=xxxx,dc=local'),

                'username' => env('LDAP_USERNAME'),
                'password' => env('LDAP_PASSWORD'),

                'follow_referrals' => false,

                'use_ssl' => env('LDAP_USE_SSL', false),
                'use_tls' => env('LDAP_USE_TLS', false),

            ],

        ],

marcotulioaguilar commented 4 years ago

Yesterday I tested with php artisan queue:listen on linux supervisor and worked like expected. About Laravel Horizon, it's working but I'm not sure why. I keep receiving notifications about failed jobs and viewing them on the "Recent" and "Failed Jobs".

I found this topic https://github.com/laravel/horizon/pull/626 about running Horizon with queue:listen but seems that Taylor Otwell doesn't like the idea.

Thanks for the help.

My supervisor conf.

[program:laravel-worker]
process_name=%(program_name)s_%(process_num)02d

command=php /var/www/html/xxxxxxxx/artisan queue:listen redis --memory=256 --queue=default --sleep=3 --timeout=750 --tries=3
autostart=true
autorestart=true
user=xxxxx
numprocs=8
redirect_stderr=true
stdout_logfile=/var/www/html/xxxxxxxxx/laravel-worker.log

KuenzelIT commented 4 years ago

I'm also having issues with queue jobs and losing the ldap connection.

I have a queue worker which runs fine for hours or soemtimes for days, until the ldap connection gets lost and jobs using the Adldap facade fail because users are not found. I'm not doing any manual connect(), I'm just using the Adldap facade like this:

Adldap::search()->users()->where('employeeID', $employeeID);

auto_connect is also set to false in the config file and I always wondered why it works without using connect() before (but never investigated...).

So I have multiple questions:

Should I use a connect() in every job although it works without?
Why is it connecting with auto_connect set to false?
Do you have any idea why the connection gets lost?

This issue is quite hard to debug, so I appreciate every help on this. Thanks!

stevebauman commented 4 years ago

Hi @KuenzelIT,

I'm still looking to resolve this - I haven't found a resolution yet besides running jobs via the Laravel scheduler instead of queued jobs. Though I know this isn't a fix, but a workaround.

In regards to your questions:

Should I use a connect() in every job although it works without?

Why is it connecting with auto_connect set to false?

It works because the connection is being dynamically established if you're making calls on the Adldap facade without asking for a provider first:

https://github.com/Adldap2/Adldap2/blob/f09ca4ae3a65dbf123b56dc7e4ace8682ea16cdb/src/Adldap.php#L153-L164

// Adldap.php

public function __call($method, $parameters)
{
    $provider = $this->getDefaultProvider();

    if (!$provider->getConnection()->isBound()) {
        // We'll make sure we have a bound connection before
        // allowing dynamic calls on the default provider.
        $provider->connect();
    }

    return call_user_func_array([$provider, $method], $parameters);
}

If you call Adldap::getDefaultProvider()->search() - this would fail due to the connection not being established. I built it this way for developer convenience.

Do you have any idea why the connection gets lost?

I'm unsure why. I know it has to do with the connection being lost over time due to Laravel's queue worker being kept in memory, as restarting the queue worker will process the job fine (the LDAP connection is re-established).

I need to run a bunch of tests locally, but my time is limited unfortunately.

stevebauman commented 4 years ago

I'm wondering if we could use queue events to force re-establishing the connection on every loop:

use Adldap\Laravel\Facades\Adldap;
use Illuminate\Support\Facades\Queue;

Queue::looping(function () {
    Adldap::getDefaultProvider()->setConnection()->connect();
});

Still needs to be tested. I'll report back.

stevebauman commented 4 years ago

Okay, I've made progress. There's something going on with the usage the Adldap facade / the providers being setup in the application outside of the job.

Here's the job I'm running that works fine after it has failed and I've pushed it back onto the same queue worker process:

<?php

namespace App\Jobs;

use Adldap\Connections\Provider;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class TestJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function handle()
    {
        $provider = new Provider(config('ldap.connections.default.settings'));

        $provider->connect();

        $user = $provider->search()->users()->first();

        info('Found user: '.$user->getCommonName());
    }
}

The job below however, causes "Cannot connect to LDAP server" after losing connectivity and pushing the job back onto the queue:

public function handle()
{
    Adldap::connect();

    $user = Adldap::search()->users()->first();

    info('Found user: '.$user->getCommonName());
}

stevebauman commented 4 years ago

Getting closer - it's something to do with the singleton registration in the AdldapServiceProvider:

$this->app->singleton(AdldapInterface::class, function (Container $app) {
    // ...
});

If I swap it to a bind, failed jobs get processed without issue:

$this->app->bind(AdldapInterface::class, function (Container $app) {
    // ...
});

stevebauman commented 4 years ago

Apologies for all the comments here. Turns out that Laravel already dynamically handles lost or disconnected database connections in the queue:

https://github.com/laravel/framework/blob/7b074c1ac506c1895f85ec77481b55228a122a05/src/Illuminate/Database/Connection.php#L716-L725

protected function handleQueryException(QueryException $e, $query, $bindings, Closure $callback)
{
    if ($this->transactions >= 1) {
        throw $e;
    }

    return $this->tryAgainIfCausedByLostConnection(
        $e, $query, $bindings, $callback
    );
}

https://github.com/laravel/framework/blob/7b074c1ac506c1895f85ec77481b55228a122a05/src/Illuminate/Database/Connection.php#L619-L644

protected function run($query, $bindings, Closure $callback)
{
    $this->reconnectIfMissingConnection();

    $start = microtime(true);

    // Here we will run this query. If an exception occurs we'll determine if it was
    // caused by a connection that has been lost. If that is the cause, we'll try
    // to re-establish connection and re-run the query with a fresh connection.
    try {
        $result = $this->runQueryCallback($query, $bindings, $callback);
    } catch (QueryException $e) {
        $result = $this->handleQueryException(
            $e, $query, $bindings, $callback
        );
    }

    // ....
}

This needs to be done with Adldap2 as well, but will take time to implement...

KuenzelIT commented 4 years ago

Thanks for your work! It's nice to see how you work on this problem.

Adldap2 / Adldap2-Laravel