tomspencer opened this issue 1 year ago
Hello!
Can you try to change this:
try:
    logging.debug('Search Filter=%s' % searchFilter)
    sc = ldap.SimplePagedResultsControl(size=100)
    ldapConnection.search(searchFilter=searchFilter,
                          attributes=['sAMAccountName', 'pwdLastSet', 'mail', 'lastLogon'],
                          sizeLimit=0, searchControls=[sc], perRecordCallback=self.processRecord)
to
try:
    logging.debug('Search Filter=%s' % searchFilter)
    # Microsoft Active Directory sets a hard limit of 1000 entries returned by any search
    paged_search_control = ldapasn1.SimplePagedResultsControl(criticality=True, size=1000)
    resp = ldapConnection.search(searchFilter=searchFilter,
                                 attributes=['sAMAccountName', 'pwdLastSet', 'mail', 'lastLogon'],
                                 searchControls=[paged_search_control], perRecordCallback=self.processRecord)
within the GetADUsers.py file (line 192)? This is how paged search is implemented in other impacket examples, and it works.
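(For context: RFC 2696 paging is a cookie round-trip. Each search response carries an opaque cookie that the client must copy into the paged-results control of its next request, and an empty cookie marks the final page. A rough sketch of that contract, where make_paged_control and send_paged_search are hypothetical placeholders rather than impacket API:

# Schematic RFC 2696 paged-search loop. make_paged_control() and
# send_paged_search() are hypothetical placeholders, not impacket API.
cookie = b''
while True:
    control = make_paged_control(size=1000, cookie=cookie)      # hypothetical helper
    entries, cookie = send_paged_search(searchFilter, control)  # hypothetical helper
    for entry in entries:
        processRecord(entry)
    if not cookie:  # the server sends an empty cookie with the final page
        break

If the cookie is not threaded through correctly, e.g. a stale cookie is re-sent, the server replays pages it has already returned, which would look exactly like the repeated batches described in this issue.)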
I was optimistic that was going to work, but unfortunately it exacerbated the problem. 😞
Same general behavior as above, but now the duplicates are in groups of 1,000. This resulted in an even larger file. I killed it before it finished, but it was over 400MB and over 4 million lines, and had only gotten ~60k of the ~90k unique users.
$ grep --line-num "^jdoe " console2.out
7022:jdoe j.doe@example.com 2023-06-05 09:09:57.417645 N/A
8022:jdoe j.doe@example.com 2023-06-05 09:09:57.417645 N/A
$ grep --line-num "^kburns " console2.out | head -3
4010463:kburns ken.burns@example.com 2022-12-02 05:47:35.254596 N/A
4011463:kburns ken.burns@example.com 2022-12-02 05:47:35.254596 N/A
4012463:kburns ken.burns@example.com 2022-12-02 05:47:35.254596 N/A
$ grep --line-num "^kburns " console2.out | wc -l
87
Mhh, impacket's LDAP implementation is not really stable and not strictly LDAP-RFC-compliant, which is why a lot of modules use the ldap3 library instead. I recommend switching to another tool such as pywerview (with the get-netuser function), for example.
Full disclosure: I'm the maintainer of pywerview.
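For reference, here is a minimal sketch of an equivalent query with ldap3, whose paged_search helper drives the RFC 2696 cookie internally; the host, credentials, base DN, and page size below are placeholders, not values verified against this thread:

from ldap3 import Server, Connection, NTLM, SUBTREE

# Placeholder host and credentials -- adjust for your environment.
server = Server('dc02.domain.local')
conn = Connection(server, user='DOMAIN\\administrator', password='Password963',
                  authentication=NTLM, auto_bind=True)

# paged_search handles the RFC 2696 cookie round-trip internally and
# yields entries page by page when generator=True.
entries = conn.extend.standard.paged_search(
    search_base='DC=domain,DC=local',  # placeholder base DN
    search_filter='(&(objectCategory=person)(objectClass=user))',
    search_scope=SUBTREE,
    attributes=['sAMAccountName', 'pwdLastSet', 'mail', 'lastLogon'],
    paged_size=500,
    generator=True)

for entry in entries:
    if entry['type'] == 'searchResEntry':
        print(entry['attributes'].get('sAMAccountName'))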
I'm trying to reproduce the issue but I can't:
$ GetADUsers.py domain.local/administrator:'Password963' -all -dc-ip dc02.domain.local > out
$ wc -l out
20021 out
$ cut -f 1 -d ' ' out | sort | uniq -c | sort -rn | head
1 999959454
1 999919655
1 999672882
1 999507452
1 999491188
1 999302173
1 999273730
1 999166230
1 999120571
1 999033491
$ cut -f 1 -d ' ' out | sort | uniq -c | sort -n | head
1 1000027054
1 1000172078
1 1000526385
1 100057581
1 1000606675
1 1000636693
1 1000679411
1 1000765249
1 1000786424
1 1000812028
(for the test, I created ~20k users named after PowerShell's Get-Random)
Strange. The environments I'm seeing this in are on the order of 90k and 200k users, but I'm not sure if that matters.
If there is anything I can do to provide more reproduction/debugging info, please let me know.
When running GetADUsers.py in two separate large domains (80k+ users, although I suspect 10-20k might be enough to trigger this), there are a huge number of duplicate user entries in batches of 100 (e.g. user entry 8000 is repeated at 8100, 8001 is repeated at 8101, etc.). The situation worsens the more users you have.
Configuration
impacket version: current master branch (v0.10.1.dev1+20230628.102844.eb8a3944) and v0.10.0-4
Python version: v3.10.6 and v3.11.2
Target OS: Ubuntu 22.04 and Kali 2023.2
Debug Output With Command String
./GetADUsers.py -k -no-pass '[redacted]/[redacted]@[redacted]' -all -dc-ip [redacted] -debug
The command runs fine but takes a very long time and (when directed to a file) produces a massive file full of duplicate batches of users. In an environment with 90k users, the resulting file was ~381MB and ~3.87 million lines long. When duplicate lines are removed (i.e. 'sort console.out | uniq > users.clean') the resulting file was ~9MB and ~90k lines long (as you would expect for 90k users).
Additional context
I suspect the problem lies with the LDAP paging/cursor, as that paging works in batches of 100, which aligns with these duplicate batches of 100.
In one environment the first duplicated batch of 100 users was at line ~6300 (meaning line 6300 was the same as line 6200, 6301 the same as 6201, etc.), while in another it was around 8100. It doesn't seem to be entirely consistent, but it appears to start within the first 10-20k entries and then continues to get worse as it goes.
Once the duplications begin (that is, batches start appearing 2x), later batches then start appearing 3x, then 4x, etc. One batch toward the end repeated 259 times.
The compounding duplication was shown in a screenshot (omitted here).
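As a rough way to reproduce those repeat counts, here is a minimal sketch (not from the original report); it assumes the raw output was redirected to console.out and that the duplicates align to 100-line batches:

# Count how often each batch of 100 consecutive lines repeats in the output.
# Assumes the duplicated GetADUsers.py output was redirected to console.out.
from collections import Counter

with open('console.out') as fh:
    lines = fh.read().splitlines()

BATCH = 100  # matches the observed duplicate group size
batches = [tuple(lines[i:i + BATCH]) for i in range(0, len(lines), BATCH)]

for batch, count in Counter(batches).most_common(10):
    print('repeated %3d times, first line: %s' % (count, batch[0]))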
This problem compounds to the point that for even larger domains it becomes almost impossible to complete/store the output.
Of particular concern here is that if this issue is in the underlying ldap.py library, it may be impacting a number of other example scripts/components that rely on it as well.