fix(scan): resolve detection of the first endpoint in the initiate scan task

psyray commented 1 week ago

Fixes #237

Replace HTTPx first scan by nmap, then launch HTTPx with discovered port
Create a reusable function to launch nmap on the fly
Add parsing to get ports and services from Nmap output
Add more logs to debug scans while running
Remove the HTTP CRAWL global var, Nmap is the default to retrieve the first endpoint (the starting point for all the others tasks)
Adjust the is_alive parameter for tasks that need alive endpoints
Fix S3 scanner source file not found
Add more checks to prevent errors and scan crash
Refactor Endpoint saving for a better logic and less errors
Improve URLs validation

Summary by Sourcery

Bug Fixes:

Fix the issue where the S3 scanner source file was not found, ensuring the correct file path is used.

Enhancements:

Refactor the endpoint saving logic to improve clarity and reduce errors.
Improve URL validation by adding a new function to check the validity of URLs, including domain:port formats.
Add more logging to provide better insights during scan execution and debugging.

This PR prepared the ground to effectively resolve #208 & #8

Todo

[x] Test initiate scan with Full scan
[x] Test initiation subscan with all the scan type

sourcery-ai[bot] commented 1 week ago

Reviewer's Guide by Sourcery

This PR improves the scan initialization process by replacing HTTPx with Nmap for initial endpoint detection and adds several robustness improvements. The main changes focus on better error handling, improved URL validation, and more structured service detection using Nmap. The code has been refactored to be more maintainable and reliable.

Sequence diagram for scan initiation with Nmap

sequenceDiagram
    actor User
    participant System
    participant Nmap
    participant HTTPx
    User->>System: Initiate scan
    System->>Nmap: Run Nmap to find web services
    alt Web services found
        Nmap-->>System: Return open ports and services
        System->>HTTPx: Launch HTTPx with discovered ports
        HTTPx-->>System: Return HTTP endpoints
    else No web services found
        Nmap-->>System: No open ports
        System-->>User: Scan failed
    end

Updated class diagram for scan initiation

classDiagram
    class ScanHistory {
        +int id
        +DateTime last_scan_date
        +String scan_status
        +String error_message
        +void save()
    }
    class Domain {
        +int id
        +String name
        +DateTime last_scan_date
        +void save()
    }
    class Subdomain {
        +int id
        +String name
        +void save()
    }
    class EndPoint {
        +int id
        +String http_url
        +bool is_default
        +DateTime discovered_date
        +void save()
    }
    class Nmap {
        +dict get_nmap_http_datas(String host, dict ctx)
    }
    ScanHistory --> Domain : belongs to
    Domain --> Subdomain : has
    Subdomain --> EndPoint : has
    EndPoint --> Nmap : uses
    note for Nmap "Nmap is used to detect open ports and services"

File-Level Changes

Change	Details	Files
Replace HTTPx with Nmap for initial endpoint detection	Created new function get_nmap_http_datas() to detect web services Added parsing of Nmap results to identify HTTP/HTTPS services Modified initiate_scan to use Nmap results for endpoint creation Added support for multiple endpoints discovery from Nmap results	`web/reNgine/tasks.py`
Improve URL validation and endpoint handling	Added new is_valid_url() function with comprehensive URL validation Enhanced save_endpoint() with better error checking and duplicate prevention Added validation for domain names, IP addresses, and port numbers Improved handling of default endpoints for subdomains	`web/reNgine/common_func.py` `web/reNgine/tasks.py`
Enhance error handling and logging	Added more detailed error messages and debug logging Improved error handling in subscan initialization Added checks for missing or invalid scan data Enhanced logging to include domain/subdomain context	`web/reNgine/tasks.py` `web/reNgine/celery_custom_task.py`
Refactor Nmap parsing functionality	Split parse_nmap_results() into different parsing types (vulnerabilities, services, ports) Improved Nmap command generation with better parameter handling Added support for different Nmap output formats Enhanced service detection logic	`web/reNgine/tasks.py` `web/reNgine/common_func.py`
Update configuration handling	Created get_http_crawl_value() for centralized HTTP crawl configuration Changed default HTTP crawl setting to false Added support for subscan-specific configurations Improved configuration inheritance logic	`web/reNgine/common_func.py` `web/reNgine/settings.py`

Possibly linked issues

#123: PR enhances HTTP scanning by using Nmap to detect web services, aligning with issue's goal.
#7: PR changes improve alive host detection, directly addressing the issue's scan initiation problem.

Tips and commands

#### Interacting with Sourcery - **Trigger a new review:** Comment `@sourcery-ai review` on the pull request. - **Continue discussions:** Reply directly to Sourcery's review comments. - **Generate a GitHub issue from a review comment:** Ask Sourcery to create an issue from a review comment by replying to it. - **Generate a pull request title:** Write `@sourcery-ai` anywhere in the pull request title to generate a title at any time. - **Generate a pull request summary:** Write `@sourcery-ai summary` anywhere in the pull request body to generate a PR summary at any time. You can also use this command to specify where the summary should be inserted. #### Customizing Your Experience Access your [dashboard](https://app.sourcery.ai) to: - Enable or disable review features such as the Sourcery-generated pull request summary, the reviewer's guide, and others. - Change the review language. - Add, remove or edit custom review instructions. - Adjust other review settings. #### Getting Help - [Contact our support team](mailto:support@sourcery.ai) for questions or feedback. - Visit our [documentation](https://docs.sourcery.ai) for detailed guides and information. - Keep in touch with the Sourcery team by following us on [X/Twitter](https://x.com/SourceryAI), [LinkedIn](https://www.linkedin.com/company/sourcery-ai/) or [GitHub](https://github.com/sourcery-ai).

psyray commented 6 days ago

@AnonymousWP

Ready to merge for me. This one fix an old business logic error that I wanted to fix from a long time ago Replace inital HTTPx scan, that scans only the http port, by nmap which is more accurate in this case. It quickly probes the web service ports, and according to the result it will use the good http scheme by prioritizing https port if it exists. This will be improved in the future, but for the moment, the main goal was to fix screenshot issues (and others by collateral damage) and it prepares the ground to make a more precise initial scan, which is the starting point of all the remaining scan.

So if this one is badly recognized, reconnaissance will fail, and pentester could pass away a critical target

All my tests are OK

Security-Tools-Alliance / rengine-ng