Open pilipyukaaa opened 1 month ago
Hi @pilipyukaaa, which version of DataHub are you using?
Neo4j is certainly a bottleneck here. The following PRs improve Neo4j query performance; check that your version includes these changes.
https://github.com/datahub-project/datahub/pull/10598/files https://github.com/datahub-project/datahub/pull/10593
Also create indexes for entities in Neo4j if they have not been created already. By default they are not created.
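As a rough sketch of what creating those indexes could look like, the snippet below only generates Cypher statements; it does not connect to Neo4j. It assumes that each entity is stored as a node labelled with its entity type and looked up by a `urn` property, and that your Neo4j version supports `CREATE INDEX ... IF NOT EXISTS`. The label list and index names are illustrative, so check them against your own graph before running anything.

```python
# Illustrative labels only: this assumes each entity is stored as a node
# labelled with its entity type and looked up by a `urn` property.
ENTITY_LABELS = ["dataset", "dataJob", "dataFlow", "chart", "dashboard"]


def index_statements(labels, prop="urn"):
    """Build one idempotent CREATE INDEX statement per node label."""
    return [
        f"CREATE INDEX {label}_{prop}_idx IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{prop})"
        for label in labels
    ]


if __name__ == "__main__":
    # Print the statements so they can be pasted into cypher-shell
    # or Neo4j Browser after review.
    for stmt in index_statements(ENTITY_LABELS):
        print(stmt + ";")
```

Running `SHOW INDEXES` in Neo4j first should tell you whether any of these already exist.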
I'm not interested.
On Sun, 20 Oct 2024 at 3:39 PM, deepgarg-visa @.***> wrote:
Hi @pilipyukaaa, which version of DataHub are you using?
Hello @deepgarg-visa, I am using DataHub version 0.13.3.
I updated DataHub to version 0.14.1 and performance is still not good.
2024-10-21 13:24:50,425 [ThreadPoolTaskExecutor-1] INFO c.l.metadata.kafka.MCLKafkaListener:79 - Invoking MCL hook IngestionSchedulerHook for urn: urn:li:dataset:(urn:li:dataPlatform:clickhouse,_dds.dist_5_dds_crm_issues,PROD)
2024-10-21 13:24:50,425 [ThreadPoolTaskExecutor-1] INFO c.l.metadata.kafka.MCLKafkaListener:79 - Invoking MCL hook EntityChangeEventGeneratorHook for urn: urn:li:dataset:(urn:li:dataPlatform:clickhouse,_dds.dist_5_dds_crm_issues,PROD)
2024-10-21 13:24:50,425 [ThreadPoolTaskExecutor-1] INFO c.l.metadata.kafka.MCLKafkaListener:79 - Invoking MCL hook SiblingAssociationHook for urn: urn:li:dataset:(urn:li:dataPlatform:clickhouse,_dds.dist_5_dds_crm_issues,PROD)
2024-10-21 13:24:50,425 [ThreadPoolTaskExecutor-1] INFO c.l.m.k.h.s.SiblingAssociationHook:121 - Urn urn:li:dataset:(urn:li:dataPlatform:clickhouse,_dds.dist_5_dds_crm_issues,PROD) with aspect datasetKey received by Sibling Hook.
2024-10-21 13:24:50,433 [ThreadPoolTaskExecutor-1] INFO c.l.m.k.h.s.SiblingAssociationHook:256 - Associating urn:li:dataset:(urn:li:dataPlatform:dbt,_dds.dist_5_dds_CRM_issues,PROD) and urn:li:dataset:(urn:li:dataPlatform:clickhouse,_dds.dist_5_dds_crm_issues,PROD) as siblings.
2024-10-21 13:24:50,438 [ThreadPoolTaskExecutor-1] INFO c.l.metadata.kafka.MCLKafkaListener:97 - Successfully completed MCL hooks for consumer: generic-mae-consumer-job-client urn: urn:li:dataset:(urn:li:dataPlatform:clickhouse,_dds.dist_5_dds_crm_issues,PROD)
2024-10-21 13:24:50,439 [ThreadPoolTaskExecutor-1] INFO c.l.metadata.kafka.MCLKafkaListener:69 - Invoking MCL hooks for consumer: generic-mae-consumer-job-client urn: urn:li:dataset:(urn:li:dataPlatform:clickhouse,_dds.dist_5_dds_crm_issues,PROD), aspect name: siblings, entity type: dataset, change type: RESTATE
2024-10-21 13:24:50,439 [ThreadPoolTaskExecutor-1] INFO c.l.metadata.kafka.MCLKafkaListener:79 - Invoking MCL hook FormAssignmentHook for urn: urn:li:dataset:(urn:li:dataPlatform:clickhouse,_dds.dist_5_dds_crm_issues,PROD)
2024-10-21 13:24:50,439 [ThreadPoolTaskExecutor-1] INFO c.l.metadata.kafka.MCLKafkaListener:79 - Invoking MCL hook UpdateIndicesHook for urn: urn:li:dataset:(urn:li:dataPlatform:clickhouse,_dds.dist_5_dds_crm_issues,PROD)
2024-10-21 13:24:50,440 [ThreadPoolTaskExecutor-1] INFO c.l.m.s.e.update.ESBulkProcessor:82 - Added request id: BXh3SoWBZKt7JlYWQUbs+w==, operation type: UPDATE, index: system_metadata_service_v1
2024-10-21 13:24:50,441 [ThreadPoolTaskExecutor-1] INFO c.l.m.s.e.update.ESBulkProcessor:82 - Added request id: urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aclickhouse%2C_dds.dist_5_dds_crm_issues%2CPROD%29, operation type: UPDATE, index: datasetindex_v2
2024-10-21 13:24:50,473 [ThreadPoolTaskExecutor-1] INFO c.l.metadata.kafka.MCLKafkaListener:79 - Invoking MCL hook IncidentsSummaryHook for urn: urn:li:dataset:(urn:li:dataPlatform:clickhouse,_dds.dist_5_dds_crm_issues,PROD)
2024-10-21 13:24:50,474 [ThreadPoolTaskExecutor-1] INFO c.l.metadata.kafka.MCLKafkaListener:79 - Invoking MCL hook IngestionSchedulerHook for urn: urn:li:dataset:(urn:li:dataPlatform:clickhouse,_dds.dist_5_dds_crm_issues,PROD)
2024-10-21 13:24:50,474 [ThreadPoolTaskExecutor-1] INFO c.l.metadata.kafka.MCLKafkaListener:79 - Invoking MCL hook EntityChangeEventGeneratorHook for urn: urn:li:dataset:(urn:li:dataPlatform:clickhouse,_dds.dist_5_dds_crm_issues,PROD)
2024-10-21 13:24:50,474 [ThreadPoolTaskExecutor-1] INFO c.l.metadata.kafka.MCLKafkaListener:79 - Invoking MCL hook SiblingAssociationHook for urn: urn:li:dataset:(urn:li:dataPlatform:clickhouse,_dds.dist_5_dds_crm_issues,PROD)
2024-10-21 13:24:50,474 [ThreadPoolTaskExecutor-1] INFO c.l.metadata.kafka.MCLKafkaListener:97 - Successfully completed MCL hooks for consumer: generic-mae-consumer-job-client urn: urn:li:dataset:(urn:li:dataPlatform:clickhouse,_dds.dist_5_dds_crm_issues,PROD)
2024-10-21 13:24:50,474 [ThreadPoolTaskExecutor-1] INFO c.l.metadata.kafka.MCLKafkaListener:69 - Invoking MCL hooks for consumer: generic-mae-consumer-job-client urn: urn:li:dataset:(urn:li:dataPlatform:clickhouse,_dds.dist_5_dds_crm_issues,PROD), aspect name: upstreamLineage, entity type: dataset, change type: RESTATE
2024-10-21 13:24:50,474 [ThreadPoolTaskExecutor-1] INFO c.l.metadata.kafka.MCLKafkaListener:79 - Invoking MCL hook FormAssignmentHook for urn: urn:li:dataset:(urn:li:dataPlatform:clickhouse,_dds.dist_5_dds_crm_issues,PROD)
2024-10-21 13:24:50,474 [ThreadPoolTaskExecutor-1] INFO c.l.metadata.kafka.MCLKafkaListener:79 - Invoking MCL hook UpdateIndicesHook for urn: urn:li:dataset:(urn:li:dataPlatform:clickhouse,_dds.dist_5_dds_crm_issues,PROD)
2024-10-21 13:24:50,478 [ThreadPoolTaskExecutor-1] INFO c.l.m.s.e.update.ESBulkProcessor:82 - Added request id: YxtQxbiPZDnFzc4S31sl0A==, operation type: UPDATE, index: system_metadata_service_v1
2024-10-21 13:24:50,478 [ThreadPoolTaskExecutor-1] INFO c.l.m.s.e.update.ESBulkProcessor:82 - Added request id: urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aclickhouse%2C_dds.dist_5_dds_crm_issues%2CPROD%29, operation type: UPDATE, index: datasetindex_v2
2024-10-21 13:24:51,778 [I/O dispatcher 1] INFO c.l.m.s.e.update.BulkListener:61 - Successfully fed bulk request 198. Number of events: 10 Took time ms: 10
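One way to look for slow steps in a log like this is to measure the time between consecutive lines. The sketch below is a hypothetical helper, not part of DataHub; it assumes each line starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp, as in the excerpt above, and skips lines that do not.

```python
import re
from datetime import datetime

# Matches the leading timestamp, e.g. "2024-10-21 13:24:50,425 ".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def slowest_gaps(lines, top=3):
    """Return the largest (gap_ms, line) pairs between consecutive log lines.

    Each gap is attributed to the line that follows it.
    """
    gaps = []
    prev = None
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # skip continuation lines and stack traces
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if prev is not None:
            gap_ms = round((ts - prev).total_seconds() * 1000)
            gaps.append((gap_ms, line.rstrip()))
        prev = ts
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]


if __name__ == "__main__":
    import sys

    # Usage: python slowest_gaps.py < gms.log
    for gap_ms, line in slowest_gaps(sys.stdin):
        print(f"{gap_ms:>8} ms before: {line[:120]}")
```

In the excerpt above, the largest gap is the 1300 ms between the last `ESBulkProcessor` line (13:24:50,478) and the `BulkListener` line (13:24:51,778). A gap like that may simply reflect a bulk flush interval rather than slow processing, so it is a starting point for investigation, not a diagnosis.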
Hello, I have a performance problem with the process that consumes messages from Kafka and pushes changes to Elasticsearch and Neo4j. I added these environment variables to my GMS,
but performance is very low. Can you help me find the bottleneck?