elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana
Other
19.48k stars 8.04k forks source link

[Cloud Security] Fix Installation Stats Null values in Gold Feature Adoption Table #171404

Open Omolola-Akinleye opened 7 months ago

Omolola-Akinleye commented 7 months ago

When I perform query SELECT * FROM `elastic-security-prod.gold_cloud_security.gold_cloud_security_feature_adoption` WHERE TIMESTAMP_TRUNC(timestamp, DAY) < TIMESTAMP("2023-11-16") AND sfdc_account_name IS NOT NULL LIMIT 1000, the installation stats fields are null.

CohenIdo commented 5 months ago

@Omolola-Akinleye I can see a bunch of records when I add filter package_policy_id is NOT NULL

SELECT * FROM `elastic-security-prod.gold_cloud_security.gold_cloud_security_feature_adoption` WHERE 
TIMESTAMP_TRUNC(timestamp, DAY) > TIMESTAMP("2023-11-16")
AND sfdc_account_name IS NOT NULL 
AND package_policy_id is NOT NULL
LIMIT 1000

Am I missing something?

Omolola-Akinleye commented 5 months ago

@CohenIdo You are correct! I need to investigate this bug more. For context, when I created a new feature adoption table, I was experiencing Buildkite query performance failures when trying to join installation stats, indices table, cloud account stats table, fleet table, and gold cloud customer tables. I didn't see these null values after the 8.11 release. It's likely I need to re-create the table and optimize the SQL query. There are many joins from the different tables to create gold-feature-adoption table which caused the build to fail with a timeout error.

There were a couple of times the build passed. Tables were created but didn't populate installation stats fields or the customer's stats fields, hence why you see null values.

To fix this bug we have to re-evaluate and fix the gold-feature-adoption-table query work at build-time.

@kfirpeled @CohenIdo This bug will need to fixed to create the Cloud Security Feature Adoption Dashboard. Should we add this to Sprint 34?

CohenIdo commented 5 months ago

@Omolola-Akinleye, I agree that we need to fix the bug, but I am not sure I understand why it's a blocker if we can see some data in the table.

Omolola-Akinleye commented 5 months ago

@CohenIdo Yes we have some data but we are missing installation stats that help create charts so Credentials, Single vs Multi-account, automatic vs manual setup from the Wireframe)

Omolola-Akinleye commented 5 months ago

@cohenido I added the sql below so we can see package policy with INNER JOIN on package_policy_id



select distinct
    cloud_account_stats.timestamp,
    cloud_account_stats.channel,
    cloud_account_stats.version,
    cloud_account_stats.cluster_uuid,
    cloud_account_stats.cluster_name,
    cloud_account_stats.doc_hash,
    cloud_account_stats.product,
    cloud_account_stats.cloud_provider,
    cloud_account_stats.cloud_env_account_id,
    cloud_account_stats.doc_count,
    cloud_account_stats.latest_doc_updated_timestamp,
    cloud_account_stats.package_policy_id,

    cloud_security_installation_stats.package_version,
    cloud_security_installation_stats.feature,
    cloud_security_installation_stats.deployment_mode,
    cloud_security_installation_stats.deployed_agent_count,
    cloud_security_installation_stats.installed_at,
    cloud_security_installation_stats.agent_policy_id,
    cloud_security_installation_stats.account_type,
    cloud_security_installation_stats.is_setup_automatic,
    cloud_security_installation_stats.setup_access_option,

    cloud_security_fleet_metadata.license_issue_date,
    cloud_security_fleet_metadata.license_issued_to,
    cloud_security_fleet_metadata.license_issuer,
    cloud_security_fleet_metadata.license_status,
    cloud_security_fleet_metadata.license_type,
    cloud_security_fleet_metadata.license_uid,
    cloud_security_fleet_metadata.ess_account_id,
    cloud_security_fleet_metadata.es_uuid,
    cloud_security_fleet_metadata.kb_uuid,
    cloud_security_fleet_metadata.is_elastic_staff_owned,
    cloud_security_fleet_metadata.in_trial,
    cloud_security_fleet_metadata.referer,
    cloud_security_fleet_metadata.is_csp_installed,
    cloud_security_fleet_metadata.csp_package_version,

    gold_security_solution_signals.cloud_org_uuid,
    gold_security_solution_signals.cloud_org_email_domain,
    gold_security_solution_signals.sfdc_account_name,
    gold_security_solution_signals.sfdc_account_owner,
    gold_security_solution_signals.sfdc_account_avp_email,
    gold_security_solution_signals.sfdc_mkt_account_industry,
    gold_security_solution_signals.sfdc_mkt_account_segment,
    gold_security_solution_signals.sfdc_mkt_account_sector,
    gold_security_solution_signals.customer_account_is_internal,
    gold_security_solution_signals.cloud_org_saas_billing_estimate_monthly_recurring_revenue,
    gold_security_solution_signals.customer_account_subscription_type,

from `elastic-security-staging`.`silver_cleansed_entities`.`silver_kbn_cloud_security_cloud_account_stats` as cloud_account_stats

inner join
    `elastic-security-staging`.`silver_cleansed_entities`.`silver_kbn_cloud_security_installation_stats` as cloud_security_installation_stats
    on cloud_account_stats.cluster_uuid = cloud_security_installation_stats.cluster_uuid
    and cloud_account_stats.timestamp = cloud_security_installation_stats.timestamp
    and cloud_account_stats.package_policy_id = cloud_security_installation_stats.package_policy_id
    and date(cloud_security_installation_stats.timestamp) > 

        -- For full refresh, default is date when backfill occured
        date("2023-11-16T00:00:00Z")

left outer join
    `elastic-security-staging`.`silver_cleansed_entities`.`silver_kbn_cloud_security_fleet_metadata` as cloud_security_fleet_metadata
    on cloud_account_stats.cluster_uuid = cloud_security_fleet_metadata.cluster_uuid
    and cloud_account_stats.timestamp = cloud_security_fleet_metadata.timestamp
    and date(cloud_security_fleet_metadata.timestamp) > 

        -- For full refresh, default is date when backfill occured
        date("2023-11-16T00:00:00Z")

left outer join
    `elastic-security-staging`.`gold_adoption`.`gold_security_solution_signals` as gold_security_solution_signals
    on cloud_account_stats.cluster_uuid = gold_security_solution_signals.cluster_uuid
    and date(cloud_account_stats.timestamp) = gold_security_solution_signals.partition_date
    and gold_security_solution_signals.partition_date > 

        -- For full refresh, default is date when backfill occured
        date("2023-11-16T00:00:00Z")

where
    date(cloud_account_stats.timestamp) > 

        -- For full refresh, default is date when backfill occured
        date("2023-11-16T00:00:00Z") 
        and sfdc_account_name is not null and customer_account_is_internal = false```
Omolola-Akinleye commented 3 months ago

Hey @kfirpeled This ticket is more complicated and will take more time to resolve. Should I move this ticket TODO?

Known issues