Azure-Samples / modern-data-warehouse-dataops

DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo
MIT License
590 stars 462 forks source link

[D1] - Extract and Utilize System Data #723

Open ydaponte opened 2 months ago

ydaponte commented 2 months ago

As a Developer, I want to gather and utilize data from System Views in Microsoft Fabric to monitor system performance and query activity so that we can effectively score and tag stale content.

Resources

Queries:

Acceptance Criteria:

andreiafig-msft commented 2 months ago

STUDYING OUR DATA SOURCE - SYSTEM VIEWS IN DATAWAREHOUSE

What are we considering as a stale content in terms of Data warehouse query and performance? The system views will provide information about:

Query Performance Analysis

Query Optimization and Tuning

User Activity Monitoring

With the above information, customer will benefit from creating reports for analysis and statistics but not for archival.

In the Fabric Warehouse, we have the Query Insights that is automatically created. Query insights - Microsoft Fabric | Microsoft Learn

Comparing the feature Query Insights with the system views, there are some similarities (or one is complementary to the other). From what I could understand, the Query Insights is using the DMV's and returns a more complete information. The same if happening if we consider the Query Activity.

Query insights - Microsoft Fabric | Microsoft Learn Monitor connections, sessions, and requests using DMVs - Microsoft Fabric | Microsoft Learn Monitor your running and completed queries using Query activity - Microsoft Fabric | Microsoft Learn

The fact that you are able to find the same information in these different tools is highlighted in the Query Activity documentation.

As per my experience, the customer will always want to have access to this data for monitoring and tracking purposed. There is no benefit in archiving this data specially because we only save the data for the past 30 days (this is for the Query Insights, Query Activity).

For the DMV's, I am not sure about the limits but I would say it is same as the ones in Synapse. Please refer to the following documentation: System views supported in Synapse SQL - Azure Synapse Analytics | Microsoft Learn Capacity limits for dedicated SQL pool - Azure Synapse Analytics | Microsoft Learn

SUMMARY: • We need to evaluate what we would consider as a stale content for the Data Warehouse system views and query execution. • From my perspective, there is no need to identify any content here as stale or with the need of archival because this will be done automatically (the telemetry will expire after 30 days and will have a limit for 10.000 records, for example).

NEXT STEPS: Try to correlate the metadata information with the query text to verify if a table has not been used. (For example) - This will give us the information about idle resources.