SessionIDs are often duplicated when collecting lots of sessions

This impacts our log hunting capabilities.

Assuming the random number generators behave normally, we should not see many duplicates, given that we combine a random first name (1 out of 5494) with a random ID between 100000 and 999999. That gives the following number of possible combinations:

10^5 * 9 * 5494
4944600000
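To put that number in perspective, here is a rough sketch of how many duplicates we would expect if every one of those combinations were equally likely (the classic birthday problem). The uniformity is an assumption of the sketch, not something we have verified, and the function name is only illustrative.

```python
import math

def expected_duplicates(samples: int, space: int) -> float:
    """Expected number of IDs that repeat an earlier one.

    Assumes every ID in `space` is equally likely (uniform draws).
    """
    # Expected distinct values after `samples` uniform draws:
    #   space * (1 - (1 - 1/space) ** samples)
    # written with log1p/expm1 to stay accurate for a very large space.
    expected_unique = space * -math.expm1(samples * math.log1p(-1 / space))
    return samples - expected_unique

NAMES = 5494               # first names, figure quoted above
NUMBERS = 900_000          # 10^5 * 9, as in the calculation above
SPACE = NAMES * NUMBERS    # 4_944_600_000 combinations

for n in (1_000_000, 10_000_000):
    print(f"{n:>10,} sessions -> ~{expected_duplicates(n, SPACE):,.0f} expected duplicates")
```

Under that assumption, one million sessions should produce on the order of a hundred duplicates, so anything far above that suggests the IDs are not uniformly distributed.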

However, we see hundreds of thousands of duplicates in our logs. We do have millions of sessions, but that is still far too many.

SessionIDs are generated in pyrdp.core.mitm like this:

import random
# [...]
import names

# [...]
sessionID = f"{names.get_first_name()}{random.randrange(100000,999999)}"

The names module's use of randomness seems dubious, as challenged here: https://github.com/treyhunner/names/issues/18#issuecomment-272858252

With 2.0 on the horizon, it's time to re-evaluate how we generate session IDs.

Possible causes:

Something is wrong with the randomness pool on the server
The names library is a poor source of randomness
We use the regular random module instead of something stronger
The <first_name><100000-999999> format does not provide a large enough ID space
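One way to reason about the "names lib" hypothesis: if names are picked with unequal probabilities, two sessions share a name more often than 1 in 5494, and duplicates go up even with a perfect number generator. The sketch below compares a uniform name pool with a skewed one. The Zipf-like weights are purely hypothetical and are not the actual distribution used by the names library.

```python
def pair_collision_probability(name_weights, numbers):
    """Chance that two independent session IDs are identical."""
    total = sum(name_weights)
    # Probability that both draws pick the same name.
    same_name = sum((w / total) ** 2 for w in name_weights)
    # The numeric suffix is assumed uniform over `numbers` values.
    return same_name / numbers

def approx_duplicates(samples, pair_probability):
    """Birthday approximation: number of pairs times per-pair collision chance."""
    return samples * (samples - 1) / 2 * pair_probability

NAMES = 5494         # figure quoted in this issue
NUMBERS = 900_000    # size of the numeric suffix range, approximately
SAMPLES = 1_000_000

uniform_weights = [1.0] * NAMES
# Hypothetical skew: the k-th most common name has weight 1/k.
skewed_weights = [1.0 / rank for rank in range(1, NAMES + 1)]

uniform_dups = approx_duplicates(SAMPLES, pair_collision_probability(uniform_weights, NUMBERS))
skewed_dups = approx_duplicates(SAMPLES, pair_collision_probability(skewed_weights, NUMBERS))

print(f"uniform names: ~{uniform_dups:,.0f} duplicates")
print(f"skewed names : ~{skewed_dups:,.0f} duplicates")
```

The point is only qualitative: a skewed name distribution shrinks the effective ID space, which would look like "too many duplicates" without anything being wrong with the server's randomness.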

Created a test script:

import random
import names
import namesgenerator

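# OS-backed random source (os.urandom) instead of the default Mersenne Twister
newrng = random.SystemRandom()

iterations = 1_000_000

# One list of generated session IDs per variant under test
old_way = []
new_namelib = []
new_random = []
new_len = []
new_combined = []
new_way = []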

for _c in range(iterations):
    session_id = f"{names.get_first_name()}{random.randrange(100000, 999999)}"
    old_way.append(session_id)

    session_id = f"{namesgenerator.get_random_name()}{random.randrange(100000, 999999)}"
    new_namelib.append(session_id)

    session_id = f"{names.get_first_name()}{newrng.randrange(100000, 999999)}"
    new_random.append(session_id)

    session_id = f"{names.get_first_name()}{random.randrange(1000000, 9999999)}"
    new_len.append(session_id)

    session_id = f"{namesgenerator.get_random_name()}{newrng.randrange(1000000, 9999999)}"
    new_combined.append(session_id)

    session_id = f"{namesgenerator.get_random_name()}{random.randrange(1000000, 9999999)}"
    new_way.append(session_id)

    # Progress indicator: one dot every 10,000 iterations
    if _c % 10_000 == 0:
        print(".", end="", flush=True)

print(f"\nGenerated names: {len(old_way)}")

print("\nResults of non duplicates remaining:")
print(f"Old way               : {len(set(old_way))}")
print(f"New namelib           : {len(set(new_namelib))}")
print(f"New random            : {len(set(new_random))}")
print(f"New digit length (+1) : {len(set(new_len))}")
print(f"All Combined          : {len(set(new_combined))}")
print(f"Combined namelib / +1 : {len(set(new_way))}")

Ran tests in the cloud and locally.

Results:

$ python random_names_check.py 
....................................................................................................
Generated names: 1000000

Results of non duplicates remaining:
Old way               : 998005
New namelib           : 999960
New random            : 998005
New digit length (+1) : 999799
All Combined          : 999998
Combined namelib / +1 : 999997
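To make these numbers easier to compare, the sketch below turns the unique counts above into duplicate counts. It also inverts the birthday approximation to estimate how many equally likely names would explain the duplicates seen with the names library. That "effective name pool" is a rough model of my own, not a measurement, and the suffix range sizes are approximations.

```python
TOTAL = 1_000_000

# Unique counts copied from the run above.
unique_counts = {
    "Old way": 998_005,
    "New namelib": 999_960,
    "New random": 998_005,
    "New digit length (+1)": 999_799,
    "All Combined": 999_998,
    "Combined namelib / +1": 999_997,
}

duplicates = {label: TOTAL - unique for label, unique in unique_counts.items()}

for label, count in duplicates.items():
    print(f"{label:<22}: {count:>5} duplicates ({count / TOTAL:.4%})")

def effective_name_pool(samples, duplicate_count, numbers):
    """Invert duplicates ~= samples^2 / (2 * names * numbers) to solve for names."""
    return samples ** 2 / (2 * duplicate_count * numbers)

# Both variants use names.get_first_name(); only the suffix range differs.
old_pool = effective_name_pool(TOTAL, duplicates["Old way"], 900_000)
len_pool = effective_name_pool(TOTAL, duplicates["New digit length (+1)"], 9_000_000)
print(f"Effective name pool, 6-digit suffix: ~{old_pool:.0f}")
print(f"Effective name pool, 7-digit suffix: ~{len_pool:.0f}")
```

If both estimates come out similar and far below 5494, that points at the name distribution rather than at the number range or the random source.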

GoSecure / pyrdp

SessionID are duplicated often when collecting lots of sessions #458