DirectoryTree / LdapRecord

A fully-featured LDAP framework.
https://ldaprecord.com
MIT License
517 stars 44 forks source link

[Bug] Getting an exception for "paged results cookie is invalid" when chunking a model #541

Closed cheesegrits closed 1 year ago

cheesegrits commented 1 year ago

Environment:

Describe the bug:

Any attempt to chunk() on a model results in an exception for "paged results cookie is invalid" when it tries to get the second chunk.

CotakUser::chunk(100, function ($ldapUsers) {
   //
});

... where CotakUser is (in part) ...

use LdapRecord\Models\OpenLDAP\Entry;
use LdapRecord\Models\Concerns\CanAuthenticate;
use LdapRecord\Models\Concerns\HasPassword;

class CotakUser extends Entry implements Authenticatable
{
    use HasPassword;
    use CanAuthenticate;
   //
}
stevebauman commented 1 year ago

Hi @cheesegrits,

Does pagination work? I.e.:

$users = CotakUser::paginate();
cheesegrits commented 1 year ago

I shall go check. I was just poking around to make sure my OpenLDAP server supports pagination, and it does ...

supportedControl: 1.2.840.113556.1.4.319

Back in a few ...

cheesegrits commented 1 year ago

Yes, that returns all users. Although it seems to always return all users, regardless of what $pageSize I ask for.

cheesegrits commented 1 year ago

What I'm seeing is, if I put a breakpoint here ...

https://github.com/DirectoryTree/LdapRecord/blob/master/src/Query/Pagination/Paginator.php#L53

... when it executes the CotakUser::paginate(100), it does what I would expect for 330 users - it hits that BP four times, the first three times $cookie gets a value, the fourth time it's empty.

However, when I run the chunk(), the first time through, $cookie is empty.

cheesegrits commented 1 year ago

And of course I realize paginate() returns all users, it just builds the response with pagination.

stevebauman commented 1 year ago

That's strange for pagination to work, as the implementations are the same, except the LazyPaginator uses yield instead of collecting the pages of results:

Lazy Paginator: https://github.com/DirectoryTree/LdapRecord/blob/3557d500b24b331852e4c9ebea1e712b7629cb09/src/Query/Pagination/LazyPaginator.php#L16-L33

Abstract Paginator: https://github.com/DirectoryTree/LdapRecord/blob/3557d500b24b331852e4c9ebea1e712b7629cb09/src/Query/Pagination/AbstractPaginator.php#L58-L79


Run Chunk: https://github.com/DirectoryTree/LdapRecord/blob/3557d500b24b331852e4c9ebea1e712b7629cb09/src/Query/Builder.php#L570-L575

Run Paginate: https://github.com/DirectoryTree/LdapRecord/blob/3557d500b24b331852e4c9ebea1e712b7629cb09/src/Query/Builder.php#L505-L510

cheesegrits commented 1 year ago

I'm stepping thru code, see if I can find anything. As an experiment, I tried setting $this->paginated to true in chunk(), at which point the LDAP server does return a cookie, but things break when it tries to hydrate the response. So looks like something somewhere isn't setting the control to ask for a cookie when chunking.

cheesegrits commented 1 year ago

Ah. It's because I'm running other LDAP queries inside my chunk code ...

        try {
            CotakUser::chunk(100, function ($ldapUsers) use (&$ldapUsersFound, &$ldapUserEmails, $command) {
                /** @var CotakUser $ldapUser */
                foreach ($ldapUsers as $ldapUser) {
                    $email         = $ldapUser->getFirstAttribute('uid');
                    $ldapUserFound = [];

                    /** @var CotakGroup $group */
                    foreach ($ldapUser->groups()->get() as $group) {
                        $ldapUserFound[] = $group->getFirstAttribute('cn');
                    }

                    $ldapUsersFound[$email] = $ldapUserFound;
                    $ldapUserEmails[]       = $email;
                }
            });
        }
        catch (\Exception $e) {
            $command->line('Exception: ' . $e->getMessage());
        }

... which runs the AbstractPaginator, and I assume is somehow stomping on the cookie from the SimplePaginator the chunk() is running. If I remove that $ldapUser->groups()->get() code, the chunking works fine.

I don't know if this should work, or if that's asking too much of LDAP pagination. I can work round it for now, do that group name collection outside the chunk. Although at that point I might as well just use paginate() instead.

It's not a huge deal, I'm just future proofing some maintenance code, which sanity checks my database against the LDAP server. I'm expecting the directory to grow to tens of thousands of users, so wanted to use chunking whenever I fetch all users.

stevebauman commented 1 year ago

Nice find @cheesegrits! I appreciate your help debugging this.

I agree that this is a bug. I’ll see if I can figure out a way to allow this scenario to occur, as I’m sure others will eventually run into this at some point.

I’m not sure if LDAP servers can receive pagination cookies to continue a pagination request on the same connection after another query.

I’ll implement the other feature you’ve posted about first, and then I’ll tackle this 👍

Though of course if you find anything in the meantime, PRs are always welcome.

cheesegrits commented 1 year ago

No desperate hurry on this one, as the project is in limited internal testing atm, and we don't expect significant growth till later this year. So right now I only have 300 or so users, and 50 or so groups. But the expectation is that this will potentially grow within a year or so to tens of thousands of users, so I'm trying to build in chunking where it may be necessary now.

BTW, my interpretation of the RFC for simple pagination is that the intent is to allow multiple concurrent paginated search requests, and that the server associates each cookie with a specific searchRequest. But the language uses "may" rather than "must", so who knows.

https://www.rfc-editor.org/rfc/rfc2696

A client may have any number of outstanding search requests pending, any of which may have used the pagedResultsControl. A server implementation which requires a limit on the number of outstanding paged search requests from a given client MAY either return unwillingToPerform when the client attempts to create a new paged search request, or age out an older result set. If the server implementation ages out an older paged search request, it SHOULD return "unwilling to perform" if the client attempts to resume the paged search that was aged out.

cheesegrits commented 1 year ago

I've spent a couple of hours debugging this, and I don't see LdapRecord doing anything wrong. It is storing the returned cookie from the chunk() in the LazyPaginator's query, and correctly setting it again in applyServerControls() when fetching the next chunk. I had thought that maybe the cookie was being overwritten by the AbstractPaginator doing the paginate() within my chunk() processing when I fetch the user's groups.

I'm beginning to think that maybe OpenLDAP doesn't implement the same "cookie pool" that Active Directory does, and that any given client can only have one search cookie active at a time - I've tried reducing my chunk size down to 3, no luck. And even if it did, it I suspect I would overrun the limits, as Microsoft talk about have a MaxResultSetsPerConn of 10. Whereas I'm iterating over hundreds of users in my chunk, fetching their groups, so would generate hundreds of search sets requiring cookies (although they should run to completion and get cleared, one would have thought).

I think I'm just going to have to refactor how I do this, and avoid doing anything that generates potentially paginated LDAP searches within the chunking.

stevebauman commented 1 year ago

Thanks for your detective work on this one @cheesegrits! I really appreciate it 🙏 .

I wonder if we could dynamically generate a new LDAP connection on any queries executed within a chunk? 🤔

Maybe this would be the solution to maintain pagination state with the current connection, but open and close new ones automatically on queries within the callback...

cheesegrits commented 1 year ago

Yeah, that might work. Some of the discussion I've seen in googling this topic talks about connection pooling on the client side. I'm not sure how much overhead it is setting up and tearing down connections, so a pool might be the way to go. Some mechanism where a connection is unavailable to use if it has a cookie, and only open a new connection and add it to the pool if there aren't any in the pool with no cookie, rather than setting up and tearing down every time a paginated request is made. With a configurable max pool size, which throws an exception if you try and open too many concurrent paginated requests.

I'd also be interested to see if this works as-is against Active Directory, but I don't have an AD server handy. I definitely get the impression AD handles cookies per-search, up to a maximum number of concurrent searches, rather than per-connection.

Here's an example of where I think someone ran in to this same issue, where the code works against AD but not against OpenLDAP.

https://stackoverflow.com/questions/57632585/paginating-openldap-with-apache-directory-ldap-api-over-multiple-connections

cheesegrits commented 1 year ago

BTW, here's where Microsoft document their cookie handling, and the MaxResultSetsPerConnection limits.

I haven't managed to definitely confirm it yet, but it seems like OpenLDAP (at least the server implementation I have) simply doesn't support this, and just has one cookie per connection. But documentation on specific server implementations of RFC 2696 are essentially non existent, apart from Active Directory.

https://learn.microsoft.com/en-us/windows-server/identity/ad-ds/manage/how-ldap-server-cookies-are-handled

stevebauman commented 1 year ago

Hi @cheesegrits! Thanks so much for the additional detail. It's a huge help.

I believe I have a solution that does not involve "pooling" per-se, but will actually run chunked queries inside of their own isolated connection, so that other queries can be executed on the main connection, and subsequent chunked queries can also be executed (also on their own isolated connection).

The PR is open here: https://github.com/DirectoryTree/LdapRecord/pull/546

I tested this on my own dockerized OpenLDAP server. However, I was unable to replicate your error with this distro/version.

Would you perhaps be able to test this PR locally to ensure this resolves your error?

cheesegrits commented 1 year ago

@stevebauman is there a way to test a branch through Composer? Or do I have to clone the repo locally and use a symlink in composer.json?

stevebauman commented 1 year ago

Yes absolutely, insert this into your composer.json:

"directorytree/ldaprecord": "dev-chunk-query-isolation",

Then, run composer update.

Let me know if you run into issues!

cheesegrits commented 1 year ago

Unfortunately that seems to conflict with the requirement in ldaprecord-laravel ...

        "directorytree/ldaprecord-laravel": "^2.7",
        "directorytree/ldaprecord": "dev-chunk-query-isolation",

//

Problem 1
    - directorytree/ldaprecord-laravel[v2.7.0, ..., v2.7.3] require directorytree/ldaprecord ^2.4.4 -> found directorytree/ldaprecord[v2.4.4, ..., v2.19.4] but it conflicts with your root composer.json require (dev-chunk-query-isolation).
    - Root composer.json requires directorytree/ldaprecord-laravel ^2.7 -> satisfiable by directorytree/ldaprecord-laravel[v2.7.0, v2.7.1, v2.7.2, v2.7.3].

I'll try the symlink method.

cheesegrits commented 1 year ago

OK, got it working. Had to clone both repos locally, set up symlinks for both in my composer.json with versions of "*", branch ldaprecord-laravel locally so I could set the composer.json dependency for ldaprecord in that to "*", then switch to the chunk-query-isolation branch on ldaprecord.

And it works. :)

As in, the query isolation works, and I can now happily query members and groups inside my chunk code.

Great work, thank you!

stevebauman commented 1 year ago

Ah I wasn't sure if you were using Laravel as well -- that definitely makes things trickier!

As in, the query isolation works, and I can now happily query members and groups inside my chunk code.

Excellent! That's great to hear!

Let me fix up the remaining tests and I'll have this released (hopefully later today) 👍

cheesegrits commented 1 year ago

@stevebauman just wanted to let you know the fix works in production, with the new named 'isolate' argument. My live project is now updated and running fine.

stevebauman commented 1 year ago

Excellent! That's great to hear @cheesegrits 🙏 . Thanks for following up and letting me know.