DefectDojo / django-DefectDojo

DevSecOps, ASPM, Vulnerability Management. All on one platform.
https://defectdojo.com
BSD 3-Clause "New" or "Revised" License
3.71k stars 1.55k forks source link

Performance Issue with Dojo - Finding Filtering #6388

Open andresoviedo opened 2 years ago

andresoviedo commented 2 years ago

Dear team,

Issue I am experiencing issues on the SQL performance of DefectDojo Webapplication. I have a 3000 users, 1000 product types, 3000 products and around 5 million Findings registered in MySQL database. The biggest performance issue I notice is at "login time"

workaround I have augmented RAM to 32GB for the database to fit all indexes in memory It seems it had reduced from 5 minutes waiting for queries to 10-20 seconds. However, I think this times should be still improved.

analysis I enabled the slow-query-log and I checked the slow queries I saw that most of the queries uses a strategy of using EXISTS to filter the authorized/viewable Findings for a Dojo User For all those queries, SQL EXPLAIN command, basically shows that it scans the full table (5 million records) :(
The database has all the default Dojo indexes

According to MySQL documentation, EXISTS (subquery) is executed for each and every Finding to determine if it should be included in the SQL results.

https://dev.mysql.com/doc/refman/8.0/en/subquery-optimization-with-exists.html MySQL evaluates queries “from outside to inside.” That is, it first obtains the value of the outer expression outer_expr, and then runs the subquery and captures the rows that it produces.

tests The SQL example below gets all Findings for the specific user_id = 558, besides from filtering the "active" findings in the WHERE clause.

I did refactor the query to use INNER JOIN instead of EXISTS to get the authorized viewable Findings I got splendid results (manual test in MySQL Shell) and EXPLAIN command shows that the new query uses fully the indexes :)

For this "refactored query", at the end, what I understand, is that the INNER JOIN clause gets first all the test_id associated to a user, so it's directly filtered by the indexes and lastly, it performs a SCAN just over a few records for the WHERE clause.

Please.

Many thanks Andrés

`# Time: 2022-06-08T11:38:50.795012Z

Query_time: 18.619581 Lock_time: 0.000442 Rows_sent: 25 Rows_examined: 6029876

SET timestamp=1654688330; SELECT dojo_finding.id, dojo_finding.title, dojo_finding.date, dojo_finding.sla_start_date, dojo_finding.cwe, dojo_finding.cve, dojo_finding.cvssv3, dojo_finding.cvssv3_score, dojo_finding.url, dojo_finding.severity, dojo_finding.description, dojo_finding.mitigation, dojo_finding.impact, dojo_finding.steps_to_reproduce, dojo_finding.severity_justification, dojo_finding.refs, dojo_finding.test_id, dojo_finding.active, dojo_finding.verified, dojo_finding.false_p, dojo_finding.duplicate, dojo_finding.duplicate_finding_id, dojo_finding.out_of_scope, dojo_finding.risk_accepted, dojo_finding.under_review, dojo_finding.last_status_update, dojo_finding.review_requested_by_id, dojo_finding.under_defect_review, dojo_finding.defect_review_requested_by_id, dojo_finding.is_mitigated, dojo_finding.thread_id, dojo_finding.mitigated, dojo_finding.mitigated_by_id, dojo_finding.reporter_id, dojo_finding.numerical_severity, dojo_finding.last_reviewed, dojo_finding.last_reviewed_by_id, dojo_finding.param, dojo_finding.payload, dojo_finding.hash_code, dojo_finding.line, dojo_finding.file_path, dojo_finding.component_name, dojo_finding.component_version, dojo_finding.static_finding, dojo_finding.dynamic_finding, dojo_finding.created, dojo_finding.scanner_confidence, dojo_finding.sonarqube_issue_id, dojo_finding.unique_id_from_tool, dojo_finding.vuln_id_from_tool, dojo_finding.sast_source_object, dojo_finding.sast_sink_object, dojo_finding.sast_source_line, dojo_finding.sast_source_file_path, dojo_finding.nb_occurences, dojo_finding.publish_date, dojo_finding.service, EXISTS(SELECT (1) AS a FROM dojo_product_type_member U0 WHERE (U0.product_type_id = dojo_product.prod_type_id AND U0.role_id IN (1, 2, 3, 4, 5) AND U0.user_id = 558) LIMIT 1) AS test__engagement__product__prod_type__member, EXISTS(SELECT (1) AS a FROM dojo_product_member U0 WHERE (U0.product_id = dojo_engagement.product_id AND U0.role_id IN (1, 2, 3, 4, 5) AND U0.user_id = 558) LIMIT 1) AS test__engagement__product__member, EXISTS(SELECT (1) AS a FROM dojo_product_type_group U0 INNER JOIN dojo_dojo_group U1 ON (U0.group_id = U1.id) INNER JOIN dojo_dojo_group_member U2 ON (U1.id = U2.group_id) WHERE (U2.user_id = 558 AND U0.product_type_id = dojo_product.prod_type_id AND U0.role_id IN (1, 2, 3, 4, 5)) LIMIT 1) AS test__engagement__product__prod_type__authorized_group, EXISTS(SELECT (1) AS a FROM dojo_product_group U0 INNER JOIN dojo_dojo_group U1 ON (U0.group_id = U1.id) INNER JOIN dojo_dojo_group_member U2 ON (U1.id = U2.group_id) WHERE (U2.user_id = 558 AND U0.product_id = dojo_engagement.product_id AND U0.role_id IN (1, 2, 3, 4, 5)) LIMIT 1) AS test__engagement__product__authorized_group, COUNT(CASE WHEN NOT dojo_endpoint_status.mitigated THEN dojo_finding_endpoint_status.endpoint_status_id ELSE NULL END) AS active_endpoint_count, COUNT(CASE WHEN dojo_endpoint_status.mitigated THEN dojo_finding_endpoint_status.endpoint_status_id ELSE NULL END) AS mitigated_endpoint_count FROM dojo_finding INNER JOIN dojo_test ON (dojo_finding.test_id = dojo_test.id) INNER JOIN dojo_engagement ON (dojo_test.engagement_id = dojo_engagement.id) INNER JOIN dojo_product ON (dojo_engagement.product_id = dojo_product.id) LEFT OUTER JOIN dojo_finding_endpoint_status ON (dojo_finding.id = dojo_finding_endpoint_status.finding_id) LEFT OUTER JOIN dojo_endpoint_status ON (dojo_finding_endpoint_status.endpoint_status_id = dojo_endpoint_status.id) WHERE ((EXISTS(SELECT (1) AS a FROM dojo_product_type_member U0 WHERE (U0.product_type_id = dojo_product.prod_type_id AND U0.role_id IN (1, 2, 3, 4, 5) AND U0.user_id = 558) LIMIT 1) OR EXISTS(SELECT (1) AS a FROM dojo_product_member U0 WHERE (U0.product_id = dojo_engagement.product_id AND U0.role_id IN (1, 2, 3, 4, 5) AND U0.user_id = 558) LIMIT 1) OR EXISTS(SELECT (1) AS a FROM dojo_product_type_group U0 INNER JOIN dojo_dojo_group U1 ON (U0.group_id = U1.id) INNER JOIN dojo_dojo_group_member U2 ON (U1.id = U2.group_id) WHERE (U2.user_id = 558 AND U0.product_type_id = dojo_product.prod_type_id AND U0.role_id IN (1, 2, 3, 4, 5)) LIMIT 1) OR EXISTS(SELECT (1) AS a FROM dojo_product_group U0 INNER JOIN dojo_dojo_group U1 ON (U0.group_id = U1.id) INNER JOIN dojo_dojo_group_member U2 ON (U1.id = U2.group_id) WHERE (U2.user_id = 558 AND U0.product_id = dojo_engagement.product_id AND U0.role_id IN (1, 2, 3, 4, 5)) LIMIT 1)) AND dojo_finding.active ) GROUP BY dojo_finding.id ORDER BY dojo_finding.numerical_severity ASC LIMIT 25;`

StefanFl commented 2 years ago

Hi @andresoviedo thanks for your report. Here are some answers for your questions:

Do you make any sense out of this?

I use PosgtreSQL in production and MySQL partially for development and have not had issues, but don't have the amounts of data you have. So it's hard for me to reproduce it but your detailed investigations seems to make it clear where the problem is.

Do you have any advice on this SQL performance issue?

I am afraid nothing the might work for you. Users with global roles will not use these complicated queries and be much faster. But they can read or write every product with their findings. Switching to PostgreSQL will most likely not be an option for you, otherwise you might have done that already.

Does the latest Dojo version has better design/performance?

No, these queries haven't been changed.

Do you have any performance metrics available somewhere, so I can compare with mines?

See above, no.

Should I submit a Pull Request to refactor queries using EXISTS to be INNER JOIN?

I personally wouldn't know how to do it with Django, but if you have an idea it might be worth a try.

andresoviedo commented 2 years ago

Dear @StefanFl Thank you for your feedback

Here is 1 fully optimized indexed query that runs with no delay It is four union queries that gets the 4 possible authorizations 1 user may have for viewing a finding I will do my best to continue contributing on this matter...

`

OPTIMIZED QUERY

SELECT distinct dojo_finding.cwe FROM dojo_finding INNER JOIN dojo_test ON (dojo_finding.test_id = dojo_test.id) INNER JOIN dojo_engagement ON (dojo_test.engagement_id = dojo_engagement.id) INNER JOIN dojo_product ON (dojo_engagement.product_id = dojo_product.id)

INNER JOIN dojo_product_type_member ON (dojo_product_type_member.product_type_id = dojo_product.prod_type_id AND dojo_product_type_member.role_id IN (1, 2, 3, 4, 5) AND dojo_product_type_member.user_id = 542)

UNION

SELECT distinct dojo_finding.cwe FROM dojo_finding INNER JOIN dojo_test ON (dojo_finding.test_id = dojo_test.id) INNER JOIN dojo_engagement ON (dojo_test.engagement_id = dojo_engagement.id) INNER JOIN dojo_product ON (dojo_engagement.product_id = dojo_product.id)

INNER JOIN dojo_product_member ON (dojo_product_member .product_id = dojo_engagement.product_id AND dojo_product_member.role_id IN (1, 2, 3, 4, 5) AND dojo_product_member.user_id = 542)

UNION

SELECT distinct dojo_finding.cwe FROM dojo_finding INNER JOIN dojo_test ON (dojo_finding.test_id = dojo_test.id) INNER JOIN dojo_engagement ON (dojo_test.engagement_id = dojo_engagement.id) INNER JOIN dojo_product ON (dojo_engagement.product_id = dojo_product.id)

INNER JOIN dojo_product_type_group ON (dojo_product_type_group.product_type_id = dojo_product.prod_type_id AND dojo_product_type_group.role_id IN (1, 2, 3, 4, 5))

INNER JOIN dojo_dojo_group ON (dojo_product_type_group.group_id = dojo_dojo_group .id)

INNER JOIN dojo_dojo_group_member ON (dojo_dojo_group .id = dojo_dojo_group_member .group_id AND dojo_dojo_group_member .user_id = 542)

UNION

SELECT distinct dojo_finding.cwe FROM dojo_finding INNER JOIN dojo_test ON (dojo_finding.test_id = dojo_test.id) INNER JOIN dojo_engagement ON (dojo_test.engagement_id = dojo_engagement.id) INNER JOIN dojo_product ON (dojo_engagement.product_id = dojo_product.id)

INNER JOIN dojo_product_group ON (dojo_product_group.product_id = dojo_engagement.product_id AND dojo_product_group.role_id IN (1, 2, 3, 4, 5))

INNER JOIN dojo_dojo_group ON (dojo_product_group.group_id = dojo_dojo_group .id)

INNER JOIN dojo_dojo_group_member ON (dojo_dojo_group .id = dojo_dojo_group_member .group_id AND dojo_dojo_group_member .user_id = 542)

WHERE dojo_finding.active `

StefanFl commented 2 years ago

Now we would need to find out, how to write the Python code so that Django generates a SQL statement like this. I currently have no idea how to do this to be honest.

farsheedify commented 1 month ago

Hey @andresoviedo , We are having performance issues (specially for search/filtering operations) with much less data in our DB (250 products and 50000 findings). Could you share the configuration and the version you used for deploying Defectdojo? We want to adjust CPU and Memory to meet our requirements.

andresoviedo commented 1 month ago

Hi @farsheedify Version 2.8.0 Cpu/ram 8 cores/32gb

farsheedify commented 1 month ago

Hi @farsheedify Version 2.8.0 Cpu/ram 8 cores/32gb

Thanks Andre. Have you deployed databse on a separate VM, and a single worker for DefectDojo?