bug(ui): Subdomain import can fail if the suffix is longer than 4 chars

Is there an existing issue for this?

[X] I have searched the existing issues

Current Behavior

During a pentest, I had a VPN connection to an internal network whose AD domain name was similar to mynetwork.testo.

I retrieved about 100 hostnames from the domain (servers, workstations, etc.).

After configuring DNS resolution to query the internal DC (in /etc/resolv.conf) and adding the domain name above as a target, I initiated a scan and supplied the hostnames in the textarea field.

The scan ran, but the subdomains were not imported.

After investigating, I traced the problem to get_domain_from_subdomain, and more precisely to its use of the tldextract function: https://github.com/Security-Tools-Alliance/rengine-ng/blob/55c917984b03dfb105d35084c000b0fae1cd1b48/web/reNgine/common_func.py#L427-L437

Here are the results of a quick test:

With subdomain.mynetwork.testo, the result is ExtractResult(subdomain='subdomain.mynetwork', domain='testo', suffix='') ❎ Problem

With subdomain.mynetwork.test, the result is ExtractResult(subdomain='subdomain', domain='mynetwork', suffix='test') ✅ Correct

So tldextract does not extract the TLD correctly. This is because tldextract relies on a list of known domain suffixes (the Public Suffix List) to determine which part of a URL is the domain and which part is the suffix.

When a URL has an unusual or non-standard suffix that is absent from this list (like .testo in my example), tldextract does not recognize it as a suffix. As a result, it returns an empty suffix and treats the last label as the domain, as in the first result above.
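To illustrate the mechanism, here is a simplified sketch of suffix-list-based splitting. It is not tldextract's actual implementation: KNOWN_SUFFIXES is a tiny hypothetical stand-in for the Public Suffix List, split_with_suffix_list is a made-up helper, and PSL wildcard and exception rules are ignored. It does, however, reproduce both results above.

```python
# Simplified sketch of suffix-list-based hostname splitting (NOT tldextract's code).
# KNOWN_SUFFIXES is a hypothetical, tiny stand-in for the Public Suffix List.
KNOWN_SUFFIXES = {"com", "co.uk", "test"}


def split_with_suffix_list(hostname: str) -> tuple[str, str, str]:
    """Return (subdomain, domain, suffix) using the longest known suffix."""
    labels = hostname.lower().strip(".").split(".")
    # Try the longest candidate suffix first (e.g. "co.uk" before "uk").
    for i in range(1, len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in KNOWN_SUFFIXES:
            return ".".join(labels[: i - 1]), labels[i - 1], candidate
    # No known suffix: the last label becomes the domain and the suffix is empty.
    return ".".join(labels[:-1]), labels[-1], ""


print(split_with_suffix_list("subdomain.mynetwork.testo"))
print(split_with_suffix_list("subdomain.mynetwork.test"))
```

Any suffix missing from the list falls through to the last branch, which is why .testo ends up as the domain with an empty suffix.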

For this project, I modified the code in common_func.py to simply split the URL on "." and keep the last two labels, which achieved my goal. Perhaps this part could be refactored to extract domains more accurately, though:

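def get_domain_from_subdomain(subdomain):
    """Get domain from subdomain.

    Args:
        subdomain (str): Subdomain name.

    Returns:
        str: Domain name, or None if it cannot be determined.
    """
    domain, suffix = extract_domain_and_suffix(subdomain)
    if domain is None:
        # Single-label names (e.g. 'localhost') have no domain/suffix pair.
        return None
    return '.'.join([domain, suffix])

def extract_domain_and_suffix(url):
    """Naively split a hostname into its last two labels.

    Note: this assumes a single-label suffix, so domains under
    multi-label suffixes such as 'co.uk' are not handled correctly.
    """
    # Drop a trailing dot from fully qualified names (e.g. 'host.mynetwork.testo.').
    parts = url.rstrip('.').split('.')

    if len(parts) >= 2:
        domain = parts[-2]
        suffix = parts[-1]
        return domain, suffix
    return None, None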
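Since the user has already supplied the target domain, another option would be to avoid guessing the suffix entirely and check whether each supplied hostname is the target itself or ends with "." plus the target. The helper belongs_to_target below is hypothetical and not part of reNgine; it is only a sketch of the idea, assuming the target domain is available at import time.

```python
def belongs_to_target(hostname: str, target_domain: str) -> bool:
    """Return True if hostname equals target_domain or is a subdomain of it.

    Hypothetical helper: no suffix list is consulted, so exotic suffixes
    such as '.testo' and multi-label ones such as '.co.uk' both work.
    """
    # Normalize case and drop any trailing dot from fully qualified names.
    host = hostname.lower().rstrip(".")
    target = target_domain.lower().rstrip(".")
    # Compare on a label boundary so 'evilmynetwork.testo' does not match.
    return host == target or host.endswith("." + target)


print(belongs_to_target("subdomain.mynetwork.testo", "mynetwork.testo"))
print(belongs_to_target("evilmynetwork.testo", "mynetwork.testo"))
```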

Expected Behavior

The subdomains should have been imported, since they share the target's domain and TLD.

Steps To Reproduce

1. Add a target domain with an exotic suffix (e.g. mynetwork.testo)
2. Initiate a scan and provide a list of subdomains of that target to import
3. The subdomains are not imported

Environment

- reNgine: 2.0.2
- OS: Ubuntu 22.04
- Python: 3.10
- Docker Engine: 
- Docker Compose: 
- Browser: FF 120

Anything else?

No response
