Open xgraulap opened 1 week ago
Hello, @xgraulap! I'm here to help you with any bugs, questions, or contributions you have for Superset. Let's tackle this issue together!
To address the issue of performing calculations with duplicate dimensions in Superset, you can consider the following approach:
Enable Adhoc Subqueries: If you want to use subqueries in your metrics, you need to enable the 'ALLOW_ADHOC_SUBQUERY' feature in Superset. This can be done by adding the following configuration to your superset_config.py
file:
FEATURE_FLAGS = {
"ALLOW_ADHOC_SUBQUERY": True
}
This will allow you to use subqueries in your custom SQL fields, which can help in handling duplicate dimensions by allowing more complex SQL operations [1].
Create a View in the Database: Another approach is to create a view in your database that pre-aggregates or filters the data to remove duplicates. You can then use this view as a data source in Superset. This way, you can ensure that the calculations are performed on the correct set of unique data.
Use SQL Lab for Custom Queries: You can use SQL Lab in Superset to write custom SQL queries that handle duplicates as needed. Once you have the desired result, you can save the query as a virtual dataset and use it for your visualizations.
These methods should help you perform calculations over the total number of distinct Procedure IDs that meet your filters, ensuring accurate KPI results.
To continue talking to Dosu, mention @dosu.
@dosu could I add this command
FEATURE_FLAGS = { "ALLOW_ADHOC_SUBQUERY": True }
directly to the dashboard Advanced config?
@dosu and after adding this to my Superset config, how can I do subqueries on the metrics without getting an error when creating the charts?
After enabling the ALLOW_ADHOC_SUBQUERY
feature in your superset_config.py
, you can use subqueries in your metrics. However, you need to ensure that your SQL syntax is correct and that the subqueries are properly structured. Here are some steps to help you avoid errors:
Write the Subquery in SQL Lab: Start by writing and testing your subquery in SQL Lab to ensure it returns the expected results. This helps in debugging any SQL syntax issues before using it in a metric.
Use the Subquery in a Virtual Dataset: Once your subquery works in SQL Lab, you can save it as a virtual dataset. This allows you to use the results of the subquery as a data source for your charts.
Reference the Virtual Dataset in Metrics: When creating metrics, reference the virtual dataset instead of writing the subquery directly in the metric. This approach simplifies the metric definition and reduces the chance of errors.
By following these steps, you should be able to use subqueries in your metrics without encountering errors when creating charts.
To continue talking to Dosu, mention @dosu.
The chart is shown like this
SELECT CAST(SUM(CAST(numerator AS DECIMAL(10, 3))) / SUM(denominator) AS DECIMAL(10, 3)) FROM (SELECT DISTINCT procedure) AS "Is Joint Procurement" FROM indicatorfourancillary LIMIT 50000;
when I put this metric CAST( SUM(CAST(numerator AS DECIMAL(10,3))) / SUM(denominator) AS DECIMAL(10,3)) FROM (SELECT DISTINCT procedure)
but it's not correct @dosu
Could you provide more details about the error or issue you're encountering with this query? Are there any specific error messages or unexpected results? Additionally, could you clarify what you mean by "it's not correct"? This information will help in diagnosing the problem more effectively.
To continue talking to Dosu, mention @dosu.
If I use the metric wrote before, I get this error
Unexpected error Error: syntax error at or near "FROM" LINE 4: FROM indicatorfourancillary ^
So I want to know how to add the subquery @dosu
The error you're encountering is due to incorrect SQL syntax. In Superset, when defining a metric, you cannot directly include a FROM
clause within the metric definition itself. Metrics should be simple expressions that operate on the columns of the dataset.
To achieve your goal, you should first create a virtual dataset using SQL Lab with the subquery you want. Here's how you can do it:
Create a Virtual Dataset:
SELECT DISTINCT procedure, numerator, denominator
FROM indicatorfourancillary
Use the Virtual Dataset in a Chart:
SUM(numerator) / SUM(denominator)
without any subquery, since the virtual dataset already handles the distinct selection.By following these steps, you can effectively use subqueries in your analysis without encountering syntax errors in the metric definition.
To continue talking to Dosu, mention @dosu.
If I do this, I am not able to apply filters with my dimensions. This is not a good approach for my case @dosu I need to work with my actual dataset but using only DISTINCT procedures
If I do this, I am not able to apply filters with my dimensions. This is not a good approach for my case @dosu I need to work with my actual dataset but using only DISTINCT procedures
To work with your actual dataset while using only distinct procedures and still be able to apply filters on your dimensions, you can try the following approach:
Create a Derived Table in SQL Lab: Use SQL Lab to create a derived table that selects distinct procedures along with the necessary dimensions and metrics. For example:
SELECT DISTINCT procedure, dimension1, dimension2, numerator, denominator
FROM indicatorfourancillary
Ensure that this query includes all the dimensions you need for filtering.
Save as a Virtual Dataset: Save this query as a virtual dataset. This dataset will now have distinct procedures and include the dimensions you want to filter on.
Use the Virtual Dataset in a Chart: When creating a chart, use this virtual dataset as your data source. You can now apply filters on the dimensions as needed and define your metric as SUM(numerator) / SUM(denominator)
.
This approach allows you to maintain the distinct procedures while still being able to filter on your dimensions.
To continue talking to Dosu, mention @dosu.
This is what I want to evitate in fact @dosu
I put this command in my config
FEATURE_FLAGS = { "ALLOW_ADHOC_SUBQUERY": True }
in order to be able to do SUBSELECT in my metrics. Could you help me on how to add this unicity directly in my metric?
In Superset, even with the ALLOW_ADHOC_SUBQUERY
feature enabled, you cannot directly include a FROM
clause or subquery within a metric definition. Metrics are designed to be simple expressions that operate on the columns of the dataset.
To achieve the desired unicity directly in your metric, you can use a workaround by leveraging SQL Lab to create a virtual dataset that incorporates the distinct logic. Here's a refined approach:
Create a Virtual Dataset with Distinct Logic:
SELECT DISTINCT procedure, dimension1, dimension2, numerator, denominator
FROM indicatorfourancillary
Save as a Virtual Dataset:
Use the Virtual Dataset in a Chart:
SUM(numerator) / SUM(denominator)
without any subquery, since the virtual dataset already handles the distinct selection.This method allows you to maintain distinct procedures while still being able to filter on your dimensions, effectively achieving the unicity you need in your metrics.
To continue talking to Dosu, mention @dosu.
@dosu can I use the GROUP BY in a BIG NUMBER chart?
Yes, you can use the GROUP BY
clause in a Big Number chart in Superset. However, keep in mind that Big Number charts are typically used to display a single aggregated value. If you use GROUP BY
, it will generate multiple rows, which might not be suitable for a Big Number chart.
If you need to aggregate data with GROUP BY
and still want to display a single value, consider using a subquery or a virtual dataset to pre-aggregate the data. Then, select the specific aggregated value you want to display in the Big Number chart.
To continue talking to Dosu, mention @dosu.
@dosu how can I use this GROUP BY?
To use GROUP BY
in a Big Number chart, you need to pre-aggregate your data before using it in the chart. Here's how you can do it:
Use SQL Lab:
GROUP BY
clause to aggregate your data as needed.SELECT procedure, SUM(numerator) AS total_numerator, SUM(denominator) AS total_denominator
FROM indicatorfourancillary
GROUP BY procedure
Save as a Virtual Dataset:
Create a Big Number Chart:
SUM(total_numerator) / SUM(total_denominator)
.By pre-aggregating your data with GROUP BY
in SQL Lab and using a virtual dataset, you can effectively use the results in a Big Number chart.
To continue talking to Dosu, mention @dosu.
I'm not getting a response fitting my need. Could you help @betodealmeida @rusackas @villebro ?
Bug description
Good afternoon, I have a problem when using Superset to perform calculations. I have my data in a unique table, with Procedure ID as Surrogate Key and some dimensions adding information of the procedures. Also I have numerator and denominator calculated for every Procedure ID to get the result of the KPI using the following metric:
sum(numerator)/sum(denominator).
The thing is that if I do the calculation like this, the calculation is done without taking into account duiplications in the dimensions, so the results become incorrect.
The issue is that Superset doesen't allow me to add SUBSELECT to the metrics, so I can't perform my calculations over the total number of different Procedure ID that accomplish my filters.
I proposed a metric like this:
SUM(numerator)/SUM(denominator) FROM ( SELECT DISTINCT procedure, numerator, denominator FROM indicatorfourancillary ) AS unique_procedures
But it doesn't meet my expectations.
So I would like to know if in Superset, there's a way of doing first a SELECT of the unique procedure ID that meet my filters, and then perform the calculations over them.
Thank you so much in advance, @villebro, @rusakas y @betodealmeida
Screenshots/recordings
This is my actual state, this LOT should give me 0/1 and now I'm getting 0/4.
Superset version
master / latest-dev
Python version
I don't know
Node version
16
Browser
Chrome
Additional context
No response
Checklist